Geneious User Manual
Contents
1. [Figure 3.11: A view of a phylogenetic tree in Geneious]
There are a number of options for the tree viewer.
3.7.1 Current Tree
If you are viewing a tree set, this option will be displayed. Select the tree you want to view from the list.
3.7.2 General
General has 3 buttons showing the different possible tree views: rooted, circular and unrooted. The Zoom slider controls the zoom level of the tree, while the Expansion slider expands the tree vertically in the rooted layout.
[p. 84, Chapter 3: Document Viewers]
3.7.3 Info
For a consensus tree, the info box displays the consensus method used to build the tree. For a topology, it also shows what percentage of the original trees have the topology of the displayed tree.
3.7.4 Layout
This has different options depending on the layout that you
2. [Figure 10.3: Insert into Vector options dialog]
[p. 168, Chapter 10: Cloning]
10.3.3 Other Options
The Product section of the options displays a diagram showing the ligation points in the insertion. The parts of the ligation points belonging to the vector appear in bold in this diagram. Below this is a checkbox where you can choose whether to Keep fragments which are not part of the product. If this box is checked, a document will be created representing the fragment removed from the vector (if any). If the insert fragment was produced from a sequence with two restriction site annotations, the fragments on either side of the restriction site annotations will also be kept.
10.4 Gateway Cloning
Geneious contains three operations to assist with Gateway cloning. (Gateway is a registered trademark of Invitrogen Corporation.)
10.4.1 Add AttB Sites
This operation allows you to ad
3. in that alignment column.
4.4.1 Pairwise sequence alignments
There are two types of pairwise alignments: local and global alignments.
A Local Alignment. A local alignment is an alignment of two sub-regions of a pair of sequences [21]. This type of alignment is appropriate when aligning two segments of genomic DNA that may have local regions of similarity embedded in a background of non-homologous sequence.
A Global Alignment. A global alignment is a sequence alignment over the entire length of two or more nucleic acid or protein sequences. In a global alignment the sequences are assumed to be homologous along their entire length [16].
Scoring systems in pairwise alignments
In order to align a pair of sequences, a scoring system is required to score matches and mismatches. The scoring system can be as simple as +1 for a match and -1 for a mismatch between the pair of sequences at any given site of comparison. However, substitutions, insertions and deletions occur at different rates over evolutionary time. This variation in rates is the result of a large number of factors, including the mutation process, genetic drift and natural selection. For protein sequences, the relative rates of different substitutions can be empirically determined by comparing a large number of related sequences. These empirical measurements can then form the basis of a scoring system for aligning subsequent sequences. Many scoring [p. 98, Chapter 4]
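The simple match/mismatch scoring described in excerpt 3 can be illustrated with a short sketch. This is not Geneious code: the function name `global_alignment_score` and the gap penalty of -1 are assumptions made for illustration (the excerpt gives no gap cost), and the recurrence is the textbook Needleman-Wunsch global alignment.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch).

    match/mismatch follow the +1/-1 example in the text; the gap
    penalty is an assumed value, not taken from the manual.
    """
    # Row 0: aligning an empty prefix of a against b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Only the score is returned here; recovering the alignment itself would need a traceback over the full matrix, and empirical protein scoring would replace the fixed match/mismatch values with a substitution matrix lookup.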
4. There are two requirements for a FASTA file to be suitable for creating a database from:
• The FASTA file must contain only the same types of sequence (i.e. Nucleotide or Amino Acid).
• The sequences in the FASTA file must all have unique names.
If the file meets these requirements it will be added as a database, otherwise you will be informed of the problem.
Creating a database from local documents
To create a BLAST database from sequences in your local documents folders, first select the documents that you want. Then go to Tools → Add/Remove Databases → Add Sequence Database and select Custom BLAST from the Service drop-down box. Enter a name for the database and click OK.
[p. 142, Chapter 5: Custom BLAST]
5.1.4 Using Custom BLAST
Once you have added one or more databases, they will appear under Custom BLAST in the Sequence Search database drop-down. These can be used in exactly the same way as the NCBI BLAST ones.
[Screenshot residue: Sequence Search database drop-down, listing My Database under Custom BLAST followed by the NCBI databases]
5. [Figure 5.3: Searching a Custom BLAST database]
Chapter 6 COGs BLAST
COGs BLAST allows you to BLAST against the COGs database (http://www.ncbi.nlm.nih.gov/COG). Geneious will BLAST your sequence against the COGs database, identify which COG the sequence is most likely to reside in, and give you information about the COG.
6.1 Setting Up
To set up the COGs database you first need to set up Custom BLAST on your computer (see the section on Custom BLAST). Once you have set up Custom BLAST, you need to set up the COGs database files.
6.1.1 Downloading the COGs BLAST files yourself
If you want, you can download or otherwise acquire the COGs BLAST database files outside of Geneious. You can download them from here: ftp://ftp.ncbi.nih.g
6. [pp. 95-96, Chapter 4: Analysing Data]
inary sequence analysis: dotplots, sequence alignment and phylogenetic tree building.
4.3 Dotplots
A dotplot compares two sequences against each other and helps identify similar regions [14]. Using this tool, it can be determined whether a similarity between the two sequences is global (present from start to end) or local (present in patches). The Geneious dotplot offers two different comparison engines based on the EMBOSS dottup and dotmatcher programs. The former is much faster but less sensitive than the latter. More information on these programs can be found by going to http://emboss.sourceforge.net. When viewing a pairwise alignment, you can activate the path which shows where the pairwise alignment runs through the dotplot. Also, for nucleotide comparisons you can show the reverse complement.
4.3.1 Viewing Dotplots
To view a dotplot in Geneious, select two nucleotide or protein sequences in the Document Table and select Dotplot Viewer in the Document Viewer Panel (Figure 3.8). The Dotplot Viewer allows you to zoom in and out, and to customize the sensitivity of the comparison. If a single nucleotide or protein sequence is selected, the dotplot is also available. In this case it shows a comparison of the sequence to itself. The dotplot comparison of two sequences is drawn from top left to bottom right and offers a selection of different color schemes. There is also a minimap available whic
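As a rough illustration of the faster kind of comparison mentioned in excerpt 6, the sketch below finds exact word matches between two sequences, which is the general idea behind a word-based dotplot. It is not the EMBOSS or Geneious implementation; the function name `word_match_dots` and the default word size of 3 are assumptions chosen for the example.

```python
def word_match_dots(seq_a: str, seq_b: str, word_size: int = 3):
    """Return (i, j) positions where a word of seq_a exactly matches seq_b.

    Each returned pair marks one dot: seq_a[i:i+word_size] is identical
    to seq_b[j:j+word_size]. Exact matching is fast but misses similar,
    non-identical regions.
    """
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word_size + 1):
        index.setdefault(seq_b[j:j + word_size], []).append(j)

    # Look up each word of seq_a in that index.
    dots = []
    for i in range(len(seq_a) - word_size + 1):
        for j in index.get(seq_a[i:i + word_size], []):
            dots.append((i, j))
    return dots


# Comparing a sequence with itself gives the main diagonal.
print(word_match_dots("ACGTAC", "ACGTAC"))
```

A global similarity shows up as one long diagonal run of dots, while local similarity appears as short diagonal patches; a more sensitive comparison would score each window with a substitution matrix instead of requiring identity.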
7. Back and Forward options help you move between previous views in Geneious and are analogous to the back and forward buttons in a web browser. The V option shows a list of previous views.
[p. 18, Chapter 2: Retrieving and Storing Data]
The other features that can be accessed from the toolbar are described in later sections. The toolbar can be customized by right-clicking (Ctrl+click on Mac OS X) on it. This gives a popup menu with the following options:
• Show Labels: Turn the text labels on or off.
• Large Icons: Switch between large and small icons.
• Customize, which lists all available toolbar buttons. Selecting/deselecting buttons will show/hide the buttons in the toolbar.
2.1.6 Status bar
Below the Toolbar there is a grey status bar. This bar displays the status of the currently selected service. For example, when you are running a search, it displays the number of matches and the time remaining for the search to finish.
2.1.7 The Menu bar
File Menu: This contains some standard File menu items, including printing and Exit (on Windows). It also contains options to create, rename, delete, share and move folders, and Import/Export options.
Edit Menu: Here you will find common editing functions, including Cut, Copy, Paste, Delete and Select All. These are useful when transferring information from within documents to other locations, or exporting them. This menu also cont
8. Assembly from sequences which have been annotated in this way, select Use Existing Trim Regions.
[p. 129, Section 4.7: Contig Assembly]
Trimmed annotations can also be created manually using the annotation editing in the sequence viewer. If you create annotations of type trimmed and save them, then Geneious will treat them the same as ones generated automatically, and they will be ignored during assembly. Trimmed annotations can also be modified in this way before or after assembly.
[Figure 4.14: Trimming options]
• Annotate new trimmed regions: Calculate new trimmed regions and annotate them; the trimmed regions will be ignored when performing assembly and calculating the consensus sequence.
• Remove new trimmed regions from sequences: Calculate new trimmed regions and remove them from
9. Reads separated by more than 3 times their expected distance are not linked by default, unless the Link distant reads setting is turned on.
The horizontal line between paired reads is colored according to how close the separation between the reads is to their expected separation. Green indicates they are correct, orange and blue indicate under or over their expected separation, and red indicates the reads are incorrectly orientated. The reads themselves can also be colored in this way if you use the Paired Distance color scheme from the General (top) section of the controls on the right. The colors used, and the sensitivity for deciding if reads are close enough to their expected distance, can be configured from the Options link when the Paired Distance color scheme is selected. You can hover the mouse over any read in a contig and the status bar will indicate the actual separation and expected separation between the reads.
4.7.7 Editing Contigs
Editing a contig is exactly the same as editing an alignment in Geneious. After selecting the contig, click the Edit button in the sequence viewer and you can modify, insert and delete characters as in a standard text editor. Editing of contigs is done to resolve conflicts between fragments before saving the final consensus. The normal procedure for this is to look through the disagreements in the contig as described above and change bases which
10. Status: This indicates what the Agent is currently doing. The status will be one of the following:
• Next search in x time (e.g. 18 hours): The agent is waiting until its next scheduled search, and it will search when this time is reached.
• Searching (these are shown in bold): The agent is currently searching.
• Disabled: The agent will not perform any searches.
• Service unavailable: The agent cannot find the database it is scheduled to search. This will happen if the database plugin has been uninstalled or if, for example, the Collaboration contact is currently offline.
• No search scheduled: The agent is enabled but doesn't have a search scheduled. To correct this, click the Run now button in the agent dialog to have it search immediately and schedule a new search.
Deliver To: This names the destination folder for the downloaded documents. This is usually your Local Documents or one of your local folders.
Note: If you close Geneious while an agent is running, it will stop in mid-search. It will resume searching when Geneious is restarted. Also, all downloaded files are stored in the destination folder and are marked unread until viewed for the first time.
2.6.3 Manipulating an agent
Once an agent has been set up, it can be disabled, enabled, edited, deleted and run. All these options are available from within the Agents dialog.
• Enable or disable an agent by clicking the check box in the Enabl
11. portant that Geneious is not accessing the database when a backup is taken. For example, Mac users with Time Machine will have backups taken during the day, but if Geneious is running when those backups are taken they will not be suitable for restoring from, and Geneious likely wouldn't start if you did. In that case, backups taken overnight when Geneious isn't running would be fine, though.
There is a backup button (Figure 15.3) which will cause Geneious to cease working on the local database and make a zip archive. You should use this regularly, and the backups should be stored on another drive, or can be left to general system backups safely, since these are made when Geneious is in a non-running state. These backups can also be safely moved around, including to other machines.
15.1.6 Moving to another computer
It is normal for IT people to move users from one computer to another while having little knowledge of the applications and data that they're moving. Before you hand over your machine you should make a backup of your data. IT may just use Explorer on Windows to move
[p. 192, Chapter 15: Troubleshooting]
[Screenshot residue: Preferences window showing the data storage location and version-check settings]
12. • Use browser connection settings: This allows Geneious to automatically import the proxy settings. This may not work with all web browsers.
• Use HTTP proxy server: This enables two text fields, Proxy host and Proxy port. This information is in your browser's connection settings. Use this if your proxy server is an HTTP proxy server. Please see step 3.
• Use SOCKS proxy server (Autodetect Type): This enables two text fields, Proxy host and Proxy port. This information is in your browser's connection settings. Use this if your proxy server is a SOCKS proxy server. Please see step 3.
• Use auto config file: This enables one text field called Config file location. These details can also be found in your browser's settings.
6. Set the proxy host and port settings under the General tab to match those in your browser.
7. If your proxy server requires a username and password, you can specify these by clicking the Proxy Password button directly below.
Note: If you are using any other browser and cannot find the proxy settings, please use the Support Button in the Geneious toolbar to contact Geneious support.
[Figure 15.4: Checking browser settings]
15.2.2 Web links inside Geneious don't work under Linux
Se
13. [Figure 4.4: Tree building options in Geneious]
[p. 109, Section 4.5: Building Phylogenetic Trees]
Tree building from an alignment
If you are building a tree from an alignment, the following options are seen in the tree window. If you select a tree document which contains an alignment, then the alignment will simply be extracted from the tree and used in the tree building process.
Genetic distance model: This lets the user choose the kind of substitution model used to estimate branch lengths. If you are building a tree from DNA sequences you have the choices Jukes-Cantor, HKY and Tamura-Nei. If you are building a tree from amino acid sequences you only have the option of Jukes-Cantor distance correction.
Tree building method: There are two methods under this option, Neighbor-joining [20] and UPGMA [15].
Create consensus via resampling: Check this box to build a consensus tree using resampling of sequence alignment data.
Resample tree: Check this to perform resampling.
Resampling method: Either bootstrapping or jackknifing can be performed when resampling columns of the sequence alignment.
Number of samples: The number of alignments and trees to generate while resampling. A value of at least 100 is recommended. C
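The column resampling mentioned in excerpt 13 can be sketched as follows. This is only an illustration of the two general techniques, not the Geneious implementation: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings, and the jackknife `keep_fraction` of 0.5 is an assumed value (the excerpt does not state how many columns are kept).

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same width as input)."""
    n_cols = len(alignment[0])
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in picked) for row in alignment]


def jackknife_alignment(alignment, rng, keep_fraction=0.5):
    """Keep a random subset of columns, each used at most once.

    keep_fraction is an assumption made for this sketch.
    """
    n_cols = len(alignment[0])
    n_keep = max(1, int(n_cols * keep_fraction))
    kept = sorted(rng.sample(range(n_cols), n_keep))
    return ["".join(row[c] for c in kept) for row in alignment]


rng = random.Random(42)  # fixed seed so a run can be repeated
alignment = ["ACGT", "TGCA"]
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

Each replicate alignment would then be passed to the tree builder, and the resulting set of trees summarized as a consensus tree.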
14. Chapter 14 Administration
14.1 Default data location
By default the data location will be in the user's home directory. You can change this by setting an environment variable which will be used by the Geneious launcher, such as setting a $HOME$ variable to be where you want a user to store their data.
On Windows and Linux, edit the Geneious.in.use.vmoptions file in the installation directory and add -DdataDirectoryRoot=$HOME$/Geneious on a new line after the other settings.
On Mac OS X, edit /Applications/Geneious.app/Contents/Info.plist and find the <key>Arguments</key> section to match the following:
<key>Arguments</key> <string>distributionVersion -DdataDirectoryRoot=$HOME$/Geneious</string>
A special JAVA_USER_HOME variable is normally used, which resolves to user.home and is what Geneious uses by default. The program will create a Geneious 7.0 Data folder inside the directory you specify.
14.2 Change default preferences
14.2.1 Change preferences within Geneious
Start a fresh copy of Geneious and set it up the way you want. Shut down, and then copy Geneious 7.0 Data/user_preferences.xml to the Geneious install directory (e.g. C:\Program Files\Geneious on Windows XP) and rename it to default_user_preferences.xml. Now when users start Geneious for the first time they will get the configuration yo
15. S, T, C, Y, N, Q
Red: Polar, acidic: D, E
Blue: Polar, basic: K, R, H
3.2.5 General Options
Contains the color options (see above), check boxes to turn on and off main aspects of the sequence view, and options for what to display as the name of each sequence.
3.2.6 Display Options
Consensus: These options are available when viewing alignments. When checked, the viewer displays the consensus sequence with the aligned sequences. The consensus sequence has the same length, including only untrimmed bases, and shows which residues are conserved (are always the same) and which residues are variable. A consensus is constructed from the most frequent residues at each site (alignment column), so that the total fraction of rows represented by the selected residues in that column reaches at least a specified threshold. IUPAC ambiguity codes (such as R for an A or G nucleotide) are counted as fractional support for each nucleotide in the ambiguity set (A and G in this case); thus two rows with R are counted the same as one row with A and one row with G. When more than one nucleotide is necessary to reach the desired threshold, this is represented by the best-fit ambiguity symbol in the consensus; for protein sequences this will always be an X. In the case of ties, either all or none of the involved residues will be selected. Hence an alignment column with only A's and G's in equal number will be represented as an R in the consensus sequen
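The consensus rule in excerpt 15 can be sketched for a single nucleotide column as below. This is an interpretation of the description, not the Geneious source: the function name `consensus_base` and the default threshold of 0.5 are assumptions, gaps and unrecognized symbols are simply skipped (also an assumption), and protein columns are not handled.

```python
from fractions import Fraction

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def consensus_base(column: str, threshold: float = 0.5) -> str:
    """Consensus symbol for one alignment column.

    Ambiguity codes give fractional support to each base they cover.
    Bases are added from most to least frequent until the chosen set
    represents at least `threshold` of the rows; tied bases are added
    together, so ties are taken all-or-none.
    """
    counts = {}
    for symbol in column:
        bases = IUPAC.get(symbol.upper())
        if bases is None:      # gap or unknown symbol: skipped (assumption)
            continue
        for base in bases:
            counts[base] = counts.get(base, Fraction(0)) + Fraction(1, len(bases))
    total = sum(counts.values())
    if total == 0:
        return "-"
    chosen, covered = set(), Fraction(0)
    for value in sorted(set(counts.values()), reverse=True):
        if chosen and covered / total >= threshold:
            break
        tied = {b for b, c in counts.items() if c == value}
        chosen |= tied
        covered += value * len(tied)
    return CODE_FOR[frozenset(chosen)]


print(consensus_base("AAGG"))       # equal A's and G's
print(consensus_base("AAAG", 1.0))  # both bases needed to reach 100%
```

Exact fractions are used so that tie detection is not disturbed by floating-point rounding when ambiguity codes split their support three or four ways.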
16. When viewing alignments or assemblies, this gives the average percent identity over the alignment. This is computed by looking at all pairs of bases at the same column and scoring a hit (one) when they are identical, divided by the total number of pairs. Ambiguity characters are interpreted, meaning a nucleotide A vs. a nucleotide R is considered to have 50% identity.
Confidence mean: When viewing chromatograms, this gives the mean of the confidence scores for the currently selected base calls. Confidence scores are provided by the base calling program (not Geneious) and give a measure of quality (higher means a base call is more likely to be correct). An untrimmed value is also displayed if the selected region contains trims.
Expected Errors: When viewing chromatograms, this gives the approximate number of errors that are statistically expected in the currently selected region. This is calculated by converting the confidence score for each base call into the error probability, and summing across the region. This also has a value for the untrimmed selection if the region contains trims.
Ungapped Lengths of Sequences: Displays the mean, standard deviation, minimum and maximum of the lengths of the sequences.
Coverage of Bases: When viewing a contig assembly, this gives the mean, standard deviation, minimum and maximum of the coverage of each base in the consensus sequence. If your contig has a reference sequence, then the percentage of the unga
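The Expected Errors statistic in excerpt 16 can be illustrated with a small sketch. It assumes the confidence scores are Phred-scaled, so that a score Q corresponds to an error probability of 10^(-Q/10); the excerpt does not name the scale, and the function name `expected_errors` is hypothetical.

```python
def expected_errors(qualities):
    """Sum of per-base error probabilities for a run of quality scores.

    Assumes Phred scaling: Q = -10 * log10(p), so p = 10 ** (-Q / 10).
    """
    return sum(10 ** (-q / 10) for q in qualities)


# Three base calls with scores 10, 20 and 30.
print(expected_errors([10, 20, 30]))
```

Under this assumption a score of 20 means a 1% chance that the call is wrong, so a region of one hundred such bases would be expected to contain about one error.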
17. and place them together in a folder. If you make a page called index.html it will be treated as the main page. Geneious will follow all hyperlinks between the pages, and external hyperlinks (beginning with http://) will be opened in the user's browser. If you want to include figures and diagrams in the pages, just put the image files in the folder and reference them with <img> tags like a normal HTML document (supported image formats are GIF, JPG and PNG).
If you want to include Geneious documents in your tutorial, simply place them in the folder as above and they will automatically be imported into Geneious with the tutorial. If you want to link to them from the tutorial pages, create a hyperlink pointing to the file in the HTML document. For example, to create a link to the file sequence.fasta in your tutorial folder, use the HTML <a href="sequence.fasta">click here</a>. To open more than one document from a link, separate the filenames with the pipe character (|), for example <a href="sequence.fasta|sequence2.fasta">click here</a>. Note that .geneious files must contain only one document to be imported automatically with the tutorial.
You can add a short one-line summary by writing your summary in a file called summary.txt (case sensitive) and putting it in the tutorial folder. Make sure that the entire summary is on the first line of the file, as all other lines will be ignored.
Once you ha
18. be shown.
[p. 55, Section 2.9: Preferences]
[Figure 2.17: The Edit Constraints window]
Sorting: Any meta-data fields added to documents will also appear as columns in the Document Table. These new columns can be used to order the table.
2.9 Preferences
You can access the preferences screen in two ways:
1. Shortcut keys: Ctrl+Shift+P (Windows/Linux), Shift+P (Mac OS X).
2. Select the Tools Menu and click Preferences.
There are several sections in the preferences window, which are presented as tabs. The most important of these are described below.
2.9.1 General
This contains connection settings, data storage details for your local documents, automatic new version checking and a Search History.
Check for new versions of Geneious: Enable this to have Geneious check for the release of new versions every time it is started. If a new version has been released, Geneious will tell you and give you a link to download it.
Also check for beta versions of Geneious: Enable this to also have Geneious alert you when new beta versions are released. A beta version is a version that is released before the official release for the purposes of testing. It may therefore be less stable than official releases.
Max memory available to Geneious: allo
19. [Screenshot residue: Geneious main window showing a nucleotide alignment (Consensus, Adam, Harry, Sally, Bob, Jane) with the Help panel open]
Alignment View Help
The sequence view is a highly customizable viewer for protein and nucleotide sequence.
Zoom: The sequence view lets you zoom in to view individual residues or zoom out to view an entire sequence and all its annotations. Buttons for controlling zoom are positioned at the top of the options panel on the right of the sequence view. You can also hold Alt or Ctrl and turn the mouse wheel up/down to zoom in/out, or Alt+click to zoom in or Alt+Shift+click to zoom out.
Selecting and Editing: Selection and editing in the sequence viewer is very similar to standard text editing and word processing programs. Click and drag to select a region. You can drag up and down to select and edit across mu
20. or Cancel to abort.
10.2 Digest into fragments
The option Digest into fragments from the Tools → Cloning menu or the context menu allows you to generate the nucleotide sequences that would result from a digestion experiment. You can digest multiple nucleotide sequences at a time. If the digestion results in overhangs, these will be recorded as annotations on the fragments.
• If you have selected only one nucleotide sequence document and it has annotated restriction sites, you can select Digest using Annotated cut positions to cut the document on these sites. When this option is selected, the options to filter the enzymes by their effective recognition sequence length or number of hits are disabled. However, if you select a subset of the enzymes under More Options, only the cut sites from these enzymes will
[Screenshot residue: Find Restriction Sites dialog with its table of candidate enzymes]
21. server is encrypted and that we do not log or share your data.
If you wish to set up and run your own Jabber server, we recommend using Openfire from Ignite Realtime (http://www.igniterealtime.org/projects/openfire/index.jsp), which is available for free under the Apache 2.0 Open Source License (http://www.apache.org/licenses/LICENSE-2.0.html). Install and start the server on one computer and then enter that computer's name or address in the Server field under More Options when creating a new account. Please note that Biomatters cannot provide any further support for setting up and managing your Jabber server, except possibly under a contracting agreement.
Chapter 10 Cloning
Restriction Enzymes cut a nucleotide sequence at specific positions relative to the occurrences of the enzyme's recognition sequence in the sequence. For example, the enzyme EcoRI has the recognition sequence GAATTC and cuts both the strand and the antistrand sequence after the G inside the recognition sequence, leaving a single-stranded overhang (sticky end overhang):
GAATTC
CTTAAG
The cloning features in Geneious allow you to identify candidate Restriction Enzymes for your experiments and to determine in silico where they would cut your nucleotide sequences and which fragments they would produce. It also lets you ligate fragments and insert a fragment into a vector. If you select a nucleotide sequence, restriction analysis i
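The EcoRI example in excerpt 21 can be turned into a small in-silico digest. This is a minimal sketch, not the Geneious cloning code: the function name `digest_linear` is hypothetical, only the top strand of a linear sequence is modeled, and the cut offset of 1 (after the G of GAATTC) follows the description in the text.

```python
def digest_linear(seq: str, site: str = "GAATTC", cut_offset: int = 1):
    """Cut the top strand of a linear sequence at every recognition site.

    cut_offset is the number of bases from the start of the site to the
    cut; 1 means the cut falls after the G of GAATTC, as for EcoRI.
    """
    seq = seq.upper()
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


print(digest_linear("AAGAATTCTT"))
```

Because the antistrand is cut after its own G, each fragment end produced this way carries a four-base AATT single-stranded overhang, which this top-strand sketch does not represent explicitly.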
22. Relative substitution rates define the rate at which each of the transitions (A↔G, C↔T) and transversions (A↔C, A↔T, C↔G, G↔T) occur in an evolving sequence. It is represented as a 4x4 matrix with rates for substitutions from every base to every other base.
Additionally, gaps are not penalized when using the Geneious Tree Builder. Comparisons involving any gaps are ignored when calculating the distance matrix.
Jukes-Cantor: This is the simplest substitution model [11]. It assumes that all bases have the same equilibrium base frequency, i.e. each nucleotide base occurs with a frequency of 25% in DNA sequences, and each amino acid occurs with a frequency of 5% in protein sequences. This model also assumes that all nucleotide substitutions occur at equal rates, and all amino acid replacements occur at equal rates.
HKY: The HKY model [9] assumes every base has a different equilibrium base frequency, and also assumes that transitions evolve at a different rate to the transversions.
Tamura-Nei: This model also assumes different equilibrium base frequencies. In addition to distinguishing between transitions and transversions, it also allows the two types of transitions (A↔G and C↔T) to have different rates [22].
4.5.5 Resampling: Bootstrapping and jackknifing
Resampling is a statistical technique where a procedure (such as phylogenetic tree building) is repeated on a series o
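To make the Jukes-Cantor model in excerpt 22 concrete, the sketch below applies the standard Jukes-Cantor distance correction, d = -b ln(1 - p/b) with b = (k - 1)/k, where p is the proportion of differing sites and k is the number of states (4 for nucleotides, 20 for amino acids). The manual excerpt does not print this formula, so treat it as the textbook form rather than the exact Geneious calculation; the function name is hypothetical, and columns containing a gap are skipped in line with the excerpt.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str, states: int = 4) -> float:
    """Jukes-Cantor corrected distance between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)  # observed differences
    b = (states - 1) / states
    if p >= b:
        return math.inf  # saturated: the correction is undefined
    return -b * math.log(1 - p / b)


# One difference in four sites: p = 0.25, corrected distance about 0.304.
print(jukes_cantor_distance("ACGT", "ACGA"))
```

The corrected distance is larger than the raw proportion of differences because the model allows for multiple substitutions at the same site.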
23. [Figure 3.4: Translating a CDS]
Translations can be synchronized between sequences in an alignment with reference to the individual sequences, the alignment, the consensus or a specific reference sequence. Figure 3.5 shows an example of a DNA alignment coloured by the amino acid translation.
3.2.7 Graphs
This option is visible when viewing protein sequences, chromatogram traces, multiple sequences or sequence alignments. Turn this option on by clicking the Graph checkbox and the graph(s) will be displayed below the sequence(s). The number control to the right of each graph controls the height of that graph (in pixels). A number of graphs are available:
Protein Coding Prediction: This is available with nucleotide sequences. It runs the EMBOSS
[p. 69, Section 3.2: The Sequence and Alignment Viewer]
[Screenshot residue: alignment view toolbar and Display options]
24. 10.5 Gibson Assembly
10.6 TOPO Cloning
11 Shared Databases
11.1 Supported Database Systems
11.2 Setting up
11.3 Removing a Shared Database
11.4 Administration
12 Licensing
12.1 Activate License
12.2 Install FLEXnet
12.3 Borrow Floating License
12.4 Release License
12.5 [title illegible]
13 Geneious Server
13.1 Introduction to Geneious Server
13.2 Accessing Geneious Server
13.3 Running jobs and retrieving results
13.4 Geneious Server enabled plugins
14 Administration
14.1 Default data location
14.2 Change default preferences
14.3 Specify license server location
14.4 Deleting plugins
14.5 [title illegible]
15 Troubleshooting
15.1 Local database issues
15.2 Network issues
15.3 Geneious is slow
15.4 Importing and exporting data
15.5 BLAST issues
15.6 [title illegible]
15.7 Assembler
15.8 Installation and Licensing
Chapter 1 Gett
25. tcode tool and tests DNA sequences for protein-coding regions using an algorithm which looks for simple and universal differences between protein-coding and noncoding DNA. The program slides a window of user-selectable size over the DNA sequence. For each window, the TESTCODE statistic is applied. The output graph indicates coding regions (green) and noncoding regions (red).
Chromatogram: This is available with chromatogram traces. It displays the four traces above the sequence, where the peak as detected by the base-calling program is at the middle of the base letter. When viewing more than one chromatogram, or an alignment made from chromatograms, each chromatogram can be turned on or off individually using the checkbox(es) below. Note that since the distance between bases as inferred from the trace varies, the trace may be either contracted or expanded compared with the raw data. The vertical scale of the chromatogram can be adjusted by clicking and dragging on the graph itself. The total height of the graph can be adjusted by increasing the number displayed next to the graph, on the right of the Sequence View.
Coverage: This is available on sequence alignments and contigs. The height of the graph at each position represents the number of sequences which have a non-gap character at that position. If the selected contig was created using Geneious and it contains sequences in both directions, then color coding is used to
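The Coverage graph described above can be illustrated with a minimal sketch. This is not Geneious code: the function name `coverage` and the use of "-" as the only gap character are assumptions made for the example. It simply counts, for each alignment column, how many rows carry a non-gap character.

```python
def coverage(alignment, gap_chars="-"):
    """Return, for each column, the number of rows with a non-gap character.

    `alignment` is a list of equal-length strings (hypothetical input format).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    return [
        sum(1 for row in alignment if row[col] not in gap_chars)
        for col in range(length)
    ]


rows = ["ACG-T", "A-GGT", "--GGT"]
print(coverage(rows))  # [2, 1, 3, 2, 3]
```

Each value corresponds to the height of the graph at that column; a real viewer would also need to treat the unaligned ends of reads in a contig as gaps.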
26. Figure 3.1: A view of an annotated nucleotide sequence in Geneious.
3.2.1 Zoom level
The plus and minus buttons increase and decrease the magnification of the sequence by 50%, or by 30% if the magnification is already above 50%. Three further buttons zoom in to fit the selected region in the available viewing area, zoom to 100%, and zoom out so as to fit the entire sequence in the available viewing area. The 100% zoom level allows for comfortable reading of the sequence.
Zooming can also be quickly achieved by holding down the zoom modifier key, which is the Ctrl key on Windows/Linux or the Alt O
27. EMBL + DDBJ + PDB sequences (no EST, STS, GSS or HTGS sequences)
  genome      Genomic entries from NCBI's Reference Sequence project
  est         Database of GenBank + EMBL + DDBJ sequences from EST Divisions
  est_human   Human subset of est
  est_mouse   Mouse subset of est
  est_others  Non-human, non-mouse subset of est
  gss         Genome Survey Sequence; includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences
  htgs        Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2 (finished, phase 3 HTG sequences are in nr)
  pat         Nucleotide sequences derived from the Patent division of GenBank
  PDB         Sequences derived from the 3D structures of proteins from PDB
  month       All new/updated GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days
  RefSeq      NCBI-curated, non-redundant sets of sequences
  dbsts       Database of GenBank + EMBL + DDBJ sequences from STS Divisions
  chromosome  A database with complete genomes and chromosomes from the NCBI Reference Sequence project
  wgs         A database for whole genome shotgun sequence entries
  env_nt      This contains DNA sequences from the environment, i.e. all organisms put together
Table 2.3: Protein sequence searches in the BLAST databases
  Database    Protein searches
  env_nr      Translations of sequences in env_nt
  month       All new/updated GenBank coding region (CDS) translations + PDB + SwissProt + PIR released in last 30 days
  nr          All non-redundant GenBank coding region (CDS) translations + PDB + SwissProt + PIR + PRF
  pat         Protein sequence
28. Newick format is commonly used to represent phylogenetic trees, such as those inferred from multiple sequence alignments. Newick trees use pairs of parentheses to group related taxa, separated by a comma. Some trees include numbers (branch lengths) that indicate the distance on the evolutionary tree from that taxon to its most recent ancestor. If these branch lengths are present, they are prefixed with a colon. The Newick format is produced by programs such as PHYLIP, PAUP, ClustalW [24], ClustalX [23], Tree-Puzzle [8] and PROTML. Geneious is also able to read trees in Newick format and display them in the visualization window. It also gives you a number of display options, including tree types, branch lengths and labels.
Nexus format
The Nexus format [13] was designed to standardize the exchange of phylogenetic data, including sequences, trees, distance matrices and so on. The format is composed of a number of blocks, such as TAXA, TREES and CHARACTERS. Each block contains pre-defined fields. Geneious imports and exports files in Nexus format and can process the information stored in them for analysis.
If you want to export a tree in a format that preserves bootstrap values, for example, Nexus is the choice. Make sure you export with metacomments enabled, though, otherwise the bootstraps will be lost.
PDB format
Protein Databank files contain a list of XYZ co-ordinates that describe the position of atoms in a protein. These are the
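To make the Newick conventions described above concrete (parentheses group taxa, commas separate them, a colon prefixes a branch length), here is a minimal sketch of a parser. It is an illustration only, not how Geneious reads trees: the function names `parse_newick` and `leaf_names` and the example taxa are hypothetical, and quoted labels, comments and names containing spaces are not handled.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Each node is {"name": str, "length": float or None, "children": list}.
    """
    text = "".join(text.split())  # drop all whitespace (a simplification)
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if peek() == ",":
                    pos += 1  # another sibling follows
                elif peek() == ")":
                    pos += 1  # end of this group
                    break
                else:
                    raise ValueError(f"unexpected input at position {pos}")
        # Optional node name, read up to the next structural character.
        start = pos
        while peek() and peek() not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length, prefixed with a colon.
        length = None
        if peek() == ":":
            pos += 1
            start = pos
            while peek() and peek() not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaf_names(node):
    """Collect tip names from left to right."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


tree = parse_newick("((Human:0.1,Chimp:0.2):0.05,Gorilla:0.3);")
print(leaf_names(tree))               # ['Human', 'Chimp', 'Gorilla']
print(tree["children"][0]["length"])  # 0.05
```

The recursive structure mirrors the format itself: every pair of parentheses becomes one internal node whose children are the comma-separated items inside it.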
29. akin to Vector NTI.
Figure 3.18: The Lineage View.
You can also choose to view only active links by unchecking the Show Inactive Links checkbox. This will hide all inactively linked documents, as well as those documents' parents or descendants. This means that you will only be viewing documents that are directly affected by the one currently being viewed.
You can reactivate temporarily deactivated links from the view by right-clicking (Windows/Linux) or control-clicking (MacOS) on a document and choosing Activate link to parent from the context menu. Alternatively, you can reactivate links to all children at once by choosing Show Operations and right- or control-clicking, then selecting Reactivate all links for this operation. You may also manually deactivate links in this fashion (Figure 3.19).
30. to the left of the search dialog, select Nucleotide similarity search or Protein similarity search and enter the sequence text. Geneious will try to guess the type of search based on the text, so that simply entering or pasting a sequence fragment may change the search type automatically.
Figure 2.11: Searching the local documents on a user-defined field.
The search locates documents containing a similar string of residues and orders them in decreasing order of similarity to the string. The ordering is based on calculating an E value for e
31. The number of reads that cover the SNP region in the contig. The coverage includes both the reads containing the SNP and other reads at that position.
• Reference Frequency: The percentage of reads that agree with the reference sequence at that position. This field will only be present if at least 1 read agrees with the reference sequence.
• Variant Frequency: The percentage of reads that have the variation at that position. For variations that span more than a single nucleotide, the variant frequency may appear as a range (e.g. 47.8% -> 51.7%) to indicate the minimum and maximum variant frequency over that range.
• Polymorphism Type: This may be one of the following:
  SNP (Transition): a single nucleotide transition change from the reference sequence.
  SNP (Transversion): a single nucleotide transversion change from the reference sequence.
  SNP: At a single position there are multiple variations from the reference sequence.
  Substitution: A change of 2 or more adjacent nucleotides from the reference sequence.
  Insertion: 1 or more nucleotides inserted relative to the reference sequence.
  Deletion: 1 or more nucleotides deleted relative to the reference sequence.
  Mixture: multiple variations from the reference sequence which are not all the same length.
• Change: Indicates the reference sequence nucleotides followed by the variant nucleotides. For example, C -> A.
For variations inside coding regions (CDS annotations) the following fields may be prese
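The coverage, frequency and polymorphism-type fields described above can be sketched for the single-nucleotide case as follows. This is a hedged illustration rather than the algorithm Geneious uses: the function names `snp_type` and `frequencies` and the list-of-bases input are assumptions. The transition/transversion rule itself is standard: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, anything else is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snp_type(ref, alt):
    """Classify a single-nucleotide change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def frequencies(read_bases, ref, alt):
    """Return (coverage, reference %, variant %) for one position.

    `read_bases` holds the base each covering read shows at the position
    (hypothetical input format).
    """
    coverage = len(read_bases)
    if coverage == 0:
        raise ValueError("no reads cover this position")
    ref_pct = 100 * read_bases.count(ref) / coverage
    var_pct = 100 * read_bases.count(alt) / coverage
    return coverage, ref_pct, var_pct


reads = ["C"] * 6 + ["A"] * 4
print(snp_type("C", "A"))            # transversion
print(frequencies(reads, "C", "A"))  # (10, 60.0, 40.0)
```

Note that coverage counts every read at the position, so the reference and variant percentages need not sum to 100 when a third base is present.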
32. a floating license, you can release it, allowing another user to access it without you having to shut Geneious down. Once you've released the license, Geneious will enter restricted use mode.
12.5 Buy Online
This item will open the Geneious store in your browser.
Chapter 13 Geneious Server
13.1 Introduction to Geneious Server
If your site has a Geneious Server installed, you can use it to offload many of the tasks that Geneious would normally run locally on to the server, taking the processing load off your own computer. Once a job is sent to Geneious Server, it will either be processed on the server itself (a so-called standalone installation) or be handed off to a cluster running Oracle Grid Engine, LSF or PBS schedulers.
To use Geneious Server, a server-side user account is required. The server-side user account will have a server access license associated with it. Another possible configuration is that your server may have a queue licensing system, which allows a certain number of users to run jobs on Geneious Server simultaneously. If your user account has its own access license (GSAL), then you can connect to the server and execute jobs immediately without having to wait for a queue license to become available. If your account doesn't have an access license, then you can log in and submit the job to the server, where it will join the queue and execute when a queue license becomes available.
13.2 Accessing Geneious Server
Assuming
33. also do a multiple alignment via translation and back, as with pairwise alignment.
4.4.3 Sequence alignment using ClustalW
ClustalW is a widely used program for performing sequence alignment [24, 23]. Geneious allows you to run ClustalW directly from inside the program without having to export or import your sequences.
If you do not have ClustalW, or are unsure if you do, you should attempt to perform a ClustalW alignment without specifying a location. Geneious will then present you with options, including details on how to download ClustalW, and will offer to automatically search for ClustalW on your hard drive.
To perform an alignment using ClustalW, select the sequences or alignment you wish to align, select the Align/Assemble button from the Toolbar, and choose Multiple Alignment. At the top of the alignment options window there are buttons allowing you to select the type of alignment you wish to do. Choose ClustalW here and the options available for a ClustalW alignment will be displayed. The options are:
• ClustalW Location: This should be set to the location of the ClustalW program on your computer. Enter the path to it in the text field, or click the Browse button to browse for the location. If the location is invalid and you attempt to perform an alignment, Geneious will tell you and offer the options detailed above for getting or finding ClustalW.
• Cost Matrix: Use this to
34. and you will then see primer-specific options and Characteristics, as in Figure 4.11. Changing the primer binding site position in the Add annotation window will automatically update the primer sequence and characteristics. A 5' extension can also be added directly onto a primer in this step by clicking the button next to Extension. See section 4.6.5 for more information on adding 5' extensions.
Figure 4.10: Primer annotation with 5' extension.
Figure 4.11: Create a primer by adding a primer annotation.
4.6.7 Importing primers from a spreadsheet
You can import primers and probes directly into Geneious from Co
35. consider downloading all databases except Pfam-A full.
7.2 Pfam Document Types
There are three special document types used for Pfam data:
• Pfam sequence documents are based on UniProt sequences. They contain all the information from the UniProt sequence, plus information on the Pfam domains in the sequence. You can view the domains as annotations in the sequence view, or on their own from the domain view.
• Domain documents contain information about Pfam-A full, Pfam-A seed and Pfam-B domains. This includes general information about the domain, references (visible in the reference view) and the alignment for the domain.
• Clan documents contain information about a clan, including general information, references (visible in the reference view) and a list of the domains which are members of this clan.
7.3 Pfam Operations
There are a number of special operations available to Pfam documents and UniProt sequences. To take advantage of these operations you will need to have the Pfam databases set up. The following Pfam operations are available:
• Create Pfam Sequence creates a Pfam sequence document from a UniProt sequence. You can view the domain information in a Pfam sequence document using the Domain Viewer. This operation can take a long time.
• With Find Similar Sequences you can search and create documents for sequences in UniProt which match the domain architecture of your Pfam sequenc
36. creates a new alignment document with some columns (for example, all identical columns or all columns containing only gaps) stripped.
Concatenate Sequences or Alignments: Joins the selected sequences or alignments end-on-end, creating a single sequence or alignment document from several. After selecting this operation, you are given the option to choose the order in which the sequences or alignments are joined. You can also choose whether the resulting document is linear or circular, and if one or more of the component sequences was an extraction from over the origin of a circular sequence, you can choose to use the numbering from that sequence, thus producing a circular sequence with its origin in the same place as the original circular sequence. Overhangs will be taken into account when concatenating.
Generate Consensus Sequence: Generates a consensus sequence for the selected sequence alignment and saves it to a separate sequence document. After selecting this operation, you are given options for choosing what type of consensus sequence you wish to generate; see section 3.2.6 for more details on the options.
Plugins: Jump directly to the plugins preferences.
Preferences: see section 2.9.
2.1.8 Sequence Menu
This contains several operations that can be performed on Protein and Nucleotide sequences, as well as Sequence Alignments in some cases.
New Sequence: create a new nucleotide or protein sequence from residues
37. displaying N for sites that contain gaps and non-gaps.
Go to next disagreement / agreement / transition / transversion / ambiguity goes to the next highlighted feature, as described in the previous section on highlighting. Highlighting can be applied with reference to the consensus or a selected reference sequence.
Reverse Complement and Translation
When viewing nucleotide sequences, Geneious offers reverse complement and protein translation options. Translations can be selected per reading frame, using a range of genetic codes. They can also be created relative to selection or annotations such as CDS (Figure 3.4).
38. for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365-386. Source code available at http://sourceforge.net/projects/primer3
Further information on the functionality of the primer design feature can be found in the primer3 documentation, available here: http://primer3.ut.ee/primer3web_help.htm. Please note that some controls have been changed, renamed or removed from Geneious, but most of the primer3 functionality is available.
4.7 Contig Assembly
Contig assembly, or sequence assembly, is normally used to merge overlapping fragments of a DNA sequence into a contig, which can be used to determine the original sequence. The contig essentially appears as a multiple sequence alignment of the fragments. After some manual editing of the contig to resolve disagreements between fragments which result from read errors, the consensus sequence of the contig is extracted as the sequence being reconstructed.
Contig assembly is also used to align a large number of reads of the same sequence from different individuals. This is done to find small differences between reads, or SNPs (Single Nucleotide Polymorphisms). In this type of analysis, the consensus sequence of the contig is not the interesting part; the differences between fragments are. This can also be done against a known reference sequence when differences between eac
39. format, you can install it by clicking this button or by dragging the plugin file in to Geneious.
• Check for plugin updates now: Checks if there are any new versions available for the plugins you have installed.
• Automatically check for updates to installed plugins: If checked, Geneious will check for new versions of your installed plugins each time the program is started.
• Tell me when new plugins are released: Changes the way the program notifies you about new plugin releases.
• Also check for beta releases of plugins: Plugins are sometimes initially released as a beta for the purposes of testing before the official release. Check this to be notified about the release of beta plugins.
• Customize feature set: Click this to see a list of all features in Geneious. Any number of these can be turned off by unchecking the Enabled box next to each feature. You might like to turn off the Tree Builder and Tree Viewer plugins if you don't do phylogenetics, for example.
40. in a non-standard location. If you want to access your data from multiple computers, these are not the way to do it:
• Don't store the local database on a network drive.
• Don't use a tool like DropBox to sync the database.
Storing data on a network drive can lead to very poor performance because Geneious accesses the database frequently, so we do not recommend this. Typical problems are documents that don't show up in the document table immediately, or changes to documents that don't persist. Windows Vista and 7 have also had issues where they change ownership of documents when accessed from other machines, and this prevents the user from changing them from a different login.
Storing data on a synchronising service is not recommended because the changes to the Geneious database need to be completely copied to the remote service for it to remain intact. Since outgoing connections can be quite slow, it is too easy for the sync to be cut short, and then when the other computer tries to sync with the remote service the local database is corrupted.
Users who must access data from multiple places should use one of the following:
• A USB drive that they can put documents on in .geneious format, which can then be dragged into another local database on another machine. In theory you could put your entire local database on the drive, but this could result in the permissions issues mentioned earlier, so isn't recommended.
• Put the .geneious files
41. key while the cursor is in the search text field.
To initiate a search, enter the desired search term(s) in the text field and press enter or click the adjacent Search button. Once a search starts, the results will appear in the document table as they are found. The Search button changes to a Cancel button while a search is in progress, and this may be clicked at any time to terminate the search. Feedback on a search's progress is presented in the status bar directly below the toolbar (see Figure 2.4).
2.3.1 Advanced Search options
To access advanced search, click the More Options button inside the basic search panel. To return to basic search, click the Fewer Options button. Switching between advanced and basic will not clear the search results table.
Figure 2.4: The Search tab of the Document Table.
This feature provides more search optio
42. Figure 3.20: Export Dialog.
3.10 The Chromatogram viewer
The Chromatogram viewer provides a graphical view of the output of a DNA sequencing machine, such as the Applied Biosystems 3730 DNA analyzer. The raw output of a sequencing machine is known as a trace: a graph showing the concentration of each nucleotide against sequence positions. The raw trace is processed by base-calling software, which detects peaks in the four traces and assigns the most probable base at more or less even intervals. Base calling may also assign a quality measure for each such call, typically in terms of the expected probability of making an erroneous call.
Sequence Logo: When checked, base letters are drawn in size proportional to call quality, where larger implies better quality, or a smaller chance of error. Note that the scale is logarithmic: the largest base represents a one in a million (10^-6) or smaller probability of calling error, while half of that represents a probability of only one in a thousand (10^-3).
Mark calls: Draw a vertical line showing the exact location of the call made by the base-calling software.
Layout: Options controlling layout and view. Those include X and Y axis scaling, the size of the largest base letter when Sequence Logo is on, and the minimum size of base letter, to prevent bases of low quality becoming
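The logarithmic Sequence Logo scale described above can be sketched as follows. The two anchor points (full size at an error probability of 10^-6 or smaller, half size at 10^-3) come from the text; everything else is an assumption for illustration, including the function names, the straight-line interpolation in log space between those points, and the absence of a minimum letter size. The `phred_quality` helper uses the standard Phred convention Q = -10 log10(p), which the manual text itself does not spell out.

```python
import math


def logo_height(error_probability, max_height=1.0):
    """Relative letter height for a base call with the given error probability.

    Assumed mapping: height grows linearly with -log10(p) and is capped
    once p reaches one in a million.
    """
    if not 0 < error_probability <= 1:
        raise ValueError("error probability must be in (0, 1]")
    decades = -math.log10(error_probability)  # 6 for p = 10^-6, 3 for p = 10^-3
    return max_height * min(decades / 6.0, 1.0)


def phred_quality(error_probability):
    """Standard Phred score, Q = -10 * log10(p)."""
    return -10 * math.log10(error_probability)


for p in (1e-6, 1e-3, 1e-1):
    print(p, round(logo_height(p), 3), round(phred_quality(p)))
```

Under these assumptions a call with a one in ten chance of error is drawn at about one sixth of the maximum letter size.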
43. more than Y times. If you set X to be 0, when this operation is complete it will report which candidate enzymes matched 0 times. Exclude enzymes cutting between residues lets you annotate only enzymes which do not cut within a certain range.
• If you select to show More Options, a table of all enzymes in your candidate set (filtered by the effective recognition sequence length constraint, when active) will be displayed. Only the enzymes selected in this table will be considered in the analysis; initially all rows are selected. You can click on the column headers to sort the table ascending or descending by that column, and you can Shift-click and Ctrl-click to select a range of rows and to toggle the selection of a row, respectively. If not all candidate enzymes are currently selected, because of a recognition sequence length constraint or because you have selected a subset of the table rows yourself, you can save the currently selected enzymes into a separate document by clicking Save Selected Enzymes. The document will be created in the current folder in your local database, and this set will then be available in the Candidate Enzymes option in this and all future analyses, until the document is deleted. You can choose a custom name for the document, such as "Lab Fridge" or "Enzymes in pBlueScript II SK multiple cloning site".
After configuring your options, click OK to start the analysis and annotate the restriction sites on the sequence
44. not activate licenses for users, as this will prevent the user from activating the license themselves.
12.3 Borrow Floating License
This item is only available to users of a floating license administered through a FLEXnet license server. Borrowing a license allows you to borrow one of the seats of a floating license so you can use it even when disconnected from the network. Since this decreases the number of seats available for other users, borrowing can only occur with the authorization of the system administrator. If your borrowing is approved, the system administrator will provide you with a borrow file authorizing the borrow. To borrow a license, check Borrow in the menu and navigate to this file when prompted by Geneious. Borrowed licenses have an expiry date, when they will automatically be returned to the server, but if you are finished with the license before the expiry date, please uncheck Borrow in the menu while connected to the network in which the license server resides, so that the license is returned to the server and is available to other users again.
12.4 Release License
Personal licenses can only be activated on a maximum of three computers simultaneously. If you no longer need to have Geneious available on a computer where you have activated it, you can release the license so it is available for use on another computer. Licenses cannot be released too often, so do not do it unnecessarily. If you're using
45. of features controlling the use of your Geneious license(s).
12.1 Activate License
This item lets you activate a license or choose to connect to a license server. The options are as follows:
• Use license key: If you have purchased a personal license, you can enter the details here to activate it. Make sure you enter the licensee name exactly as it appears in the email in which you received your activation ID / registration key. An internet connection is required to activate personal licenses.
• Use license server: If your organization has purchased a floating license administered through a FLEXnet license server, this is where you enter the details required to connect to the license. Ask your system administrator for the host name and port of the license server.
• Use Sassafras KeyServer: If your organization has purchased a floating license administered through Sassafras KeyServer, select this option. Your system administrator needs to configure KeyAccess to point to the KeyServer license server.
12.2 Install FLEXnet
This installs the FLEXnet license manager, which is necessary for activating a personal license. When you try to activate your license, Geneious will tell you if this is necessary. Only an administrator on your computer can do this, but it only needs to be done once, from one user account. Once this has been done, any non-admin user can activate their license on the machine. The admin should
46. on DropBox or similar, but definitely not the entire local database.
• Access a Shared Database, which will handle the transactions correctly and is the best solution all round for accessing data from multiple sources.
15.1.3 Sharing files or the local database
It isn't unusual to want to share files with other users. Geneious has a simple Jabber client which can do this, but users all need to be running Geneious at the same time for the files to be accessed. To get around this, we have seen examples where users have shared a single local database. This is a very bad idea, as there is no file locking and users can harm each other's data. Permissions on Windows Vista and 7 can also cause unpredictable behaviour, such as an inability to modify files.
The solution is for users to have their own local database and to access shared content via a Shared Database, or to export documents in .geneious format to a shared drive for others to access.
15.1.4 Lost data
This can happen when you have upgraded multiple times: you may have had issues finding your data, and so could have ended up loading older databases. In these cases, data for Geneious 7.0 may actually have been stored in the Geneious 4.8 Data folder, for example. The trick is to identify which of potentially multiple local database folders your most recent data was in. Date stamps on the folder should help in this respect (Figure 15.1). In Preferences, General tab (Figure 15.2), you can brows
47. on the fly. Filtering can be used while searching for documents via public databases, filtering data as it is being downloaded. Type in the appropriate text in the Filter Box, and only those documents that match both the original criteria, as specified by the search terms, and the Filter text will be displayed. This is an effective way of filtering within your search results.
2.8 Meta Data
Meta data allow you to add arbitrary information to any of your local documents, and any meta data that you add can be treated as user-defined fields for use in sorting, searching and filtering your documents.
Where can I add Meta Data?
You can add meta data to any of your local documents, including molecular sequences, phylogenetic trees and journal articles. You cannot add meta data to search results from NCBI or EMBL etc. until the documents are copied into one of your local folders.
The Properties View
All documents have an Info tab in the document viewer panel, which contains a Properties tab. This is where standard properties of documents, such as name and description, are displayed, along with any meta data.
To add meta data to your document, select the Add a Meta Data button on the toolbar and then choose from the available types. Selecting a meta data type will create an empty instance of that type. To fill meta data values, just start typing into the fields.
48. own sequence, a restriction site and/or Gateway cloning site. Multiple extensions can be added in one go, and the preview window in the 5' extension dialog box shows how these extensions will be arranged on the primer. The order of 5' additions can be edited by dragging and dropping them in this window. [Figure 4.9: Adding 5' extensions to a primer] Once added, the 5' extension is shown in bold on the primer sequence and is not covered by the primer annotation, as shown in Figure 4.10. These extensions will not change the binding region of the primer and will be ignored when primer testing is conducted against potential target sequences. If the primer is annotated onto a sequence following testing, the extension sequence is shown in the list of the annotation's qualifiers. If the primer or a PCR product is extracted from this annotation, the result will include the extension. 4.6.6 Manual primer design It is possible to create PCR primers by adding a primer annotation directly onto a sequence. This is especially useful for cloning applications, as generally the primers must bind to a specified set of bases at the beginning and end of the gene to be cloned. To manually add a primer, select the region of sequence where you wish the primer to bind and click Add Annotation. Make the annotation type primer bind
49. product can be extracted from a sequence that has been annotated with both a forward and a reverse primer. 5' extensions consisting of restriction enzymes or arbitrary sequence may also be added to primer documents. In addition, Geneious can determine the primer characteristics for a primer-sized sequence and convert it into a primer. Characteristics can also be determined for any number of primer-sized selections made in the Sequence View. To use any one of these primer operations, simply select the appropriate nucleotide sequences and either select Primers from the Tools menu or right-click (Ctrl-click on Mac OS X) on the document(s) and select Primers. A popup menu will appear showing the operations valid for your current selection. 4.6.1 Design New Primers Geneious uses Primer3 to design PCR primers. The Primer Design dialog allows you to set options for where your PCR primers should sit, what size product to return, and characteristics such as primer length and melting temperature. Task Two tasks are available: Design New or Design with Existing. Design New designs a pair of forward and reverse primers. You can specify whether you wish to design with or without a matching probe. Design with Existing can design a partner primer to match an existing one, for example a reverse primer for a forward primer or vice versa. It also allows you to design a probe to match a pair of primers. If any documents were selected w
50. score such as GC content. These weights are used when looking at primers whose value for this option falls below and above the optimum, respectively. The other weights are applied no matter in which direction they vary. For details on individual options in the Primer Picking Weights dialog, again hover your mouse over the option to see a short description. Degenerate Primer Design A degenerate primer contains a mix of bases at one or more sites. Degenerate primers are useful when you only have the protein sequence of your gene of interest and so want to allow for the degeneracy in the genetic code, or when you want to isolate similar genes from a variety of species where the primer binding sites may not be identical. You can design degenerate primers in Geneious by using either a sequence containing ambiguous bases or an alignment as the template, and checking the Allow degeneracy box. The degeneracy value that you specify is the maximum number of primers that any primer sequence is allowed to represent. For example, a primer which contains the nucleotide character N once and no other ambiguities has a degeneracy of 4, because N represents the four bases A, C, G and T. A primer that contains an N and an R has degeneracy 4 x 2 = 8, because R represents the two bases A and G. Advanced Options In the Advanced panel there are options to add 5' extensions to primers and to specify a mispriming library. A 5' extension can be your own sequence, a restriction enzyme o
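The degeneracy arithmetic described in the Degenerate Primer Design section can be sketched in a few lines of Python. This is only an illustration and not how Geneious or Primer3 computes the value: the function name `primer_degeneracy` is hypothetical, and the lookup table assumes the standard IUPAC nucleotide ambiguity codes.

```python
# Number of concrete bases that each IUPAC nucleotide code stands for.
# (Assumption: standard IUPAC codes; the manual itself only mentions N and R.)
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1, "U": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents.

    The degeneracy is the product of the number of bases each position
    can stand for, so one N (4 bases) and one R (2 bases) give 4 x 2 = 8.
    """
    degeneracy = 1
    for base in primer.upper():
        degeneracy *= IUPAC_COUNTS[base]  # KeyError on non-IUPAC characters
    return degeneracy


print(primer_degeneracy("ACGTNACGT"))  # a single N -> 4
print(primer_degeneracy("ACNGTRAC"))   # one N and one R -> 8
```

Comparing this product against the maximum degeneracy you enter in the dialog is one way to predict whether a candidate primer would be allowed.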
51. select the desired cost matrix for the alignment. The available options here will change according to the type of the sequences you wish to align. You can also click the Custom File button to use a cost matrix that you have on your computer (the format of these is the same as for the program BLAST).
• Gap open cost and Gap extend cost: Enter the desired gap costs for the alignment.
• Free end gaps: Select this option to avoid penalizing gaps at either end of the alignment. See details in the Pairwise Alignment section above.
• Preserve original sequence order: Select this option to have the order of the sequences in the table preserved, so that the alignment contains the sequences in the same order.
• Additional options: Any additional parameters accepted by the ClustalW command line program can be entered here. Refer to the ClustalW manual for a description of the available parameters.
You can also do a Clustal alignment via translation and back, as with pairwise alignment. After entering the desired options, click OK and ClustalW will be called to align the selected sequences or alignment. Once complete, a new alignment document will be generated with the result, as detailed previously. 4.4.4 Sequence alignment using MUSCLE MUSCLE is public domain multiple alignment software for protein and nucleotide sequences. MUSCLE stands for multiple sequence comparison by log-expectation. See http://www.drive5.com/muscle. To perform
52. still with threshold 3, then 70% of the column is now similar, so those 7 K's would be colored the lighter grey (60-80% range). Alternatively, going back to the default threshold value of 1 and with a column consisting of 7 K's, 2 R's and 1 Y: since the 7 K's and 2 R's have similarity exceeding the threshold, whereas the Y is not that similar to K and R, the K's and R's will be colored dark grey since they make up 90% of the column. Hydrophobicity color scheme This colors amino acids from red through to blue according to their hydrophobicity value, where red is the most hydrophobic and blue is the most hydrophilic. The values the color scale is based on are given in Figure 3.3. These values are taken from http://biochem.ncsu.edu/faculty/mattos/CrystallographyTutorial/AminoAcids.htm

Amino acid    Hydrophobicity
Phe (F)       1.000   (Red)
Leu (L)       0.943
Ile (I)       0.943
Tyr (Y)       0.880
Trp (W)       0.878
Val (V)       0.825
Met (M)       0.738
Pro (P)       0.711
Cys (C)       0.680
Ala (A)       0.616
Gly (G)       0.501   (Purple)
Thr (T)       0.450
Ser (S)       0.359
Lys (K)       0.283
Gln (Q)       0.251
Asn (N)       0.236
His (H)       0.165
Glu (E)       0.043
Asp (D)       0.028
Arg (R)       0.000   (Blue)

Figure 3.3: Hydrophobicity values for amino acids and corresponding color scale

Polarity color scheme This colors amino acids according to their polarity, as follows: Yellow: Non-polar (G, A, V, L, I, F, W, M, P). Green: Polar, uncharged
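To make the hydrophobicity color scale concrete, here is a minimal Python sketch that maps a hydrophobicity value between 0 and 1 onto a red-to-blue color. The manual does not say how Geneious interpolates between the end points, so the straight linear blend used here is an assumption, and the function name `hydrophobicity_color` is hypothetical. The blend is at least consistent with Figure 3.3, where the mid-scale value (Gly, 0.501) is labelled purple.

```python
def hydrophobicity_color(value: float) -> tuple:
    """Map a hydrophobicity value in [0, 1] to an (R, G, B) triple.

    Assumption: a linear blend from pure blue (0.0, most hydrophilic)
    to pure red (1.0, most hydrophobic). Geneious may use another ramp.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("hydrophobicity must be between 0 and 1")
    red = round(255 * value)
    blue = round(255 * (1.0 - value))
    return (red, 0, blue)


print(hydrophobicity_color(1.000))  # Phe -> (255, 0, 0), red
print(hydrophobicity_color(0.501))  # Gly -> (128, 0, 127), purple
print(hydrophobicity_color(0.000))  # Arg -> (0, 0, 255), blue
```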
53. text in UTF-8. UTF-16 won't work, so check the encoding your text file is saved in. There are many good choices. Just not Word. 15.4.5 Exporting data Any export will likely lose some information. For an annotated sequence, GenBank format does a decent job of preserving the information in a form that many other programs will handle. However, if you want to preserve the look of the document, you have to export the data as a graphic using File → Save As Image File. Probably the most compatible format is JPG, but this is an image made up of dots, so it is important to know that it won't scale well. The default resolution is also quite low, so you should probably increase the resolution to about 400 to make the image look good when printed. For scalable graphics, PDF or EPS are good choices. SVG is also scalable and can be edited in tools such as Adobe Illustrator (expensive) or Inkscape (free). Using SVG will allow users to tweak the graphic and add annotations and still have the image scale nicely, because it is a vector graphic. 15.5 BLAST issues Geneious allows users to run sequence searches using the NCBI BLAST service, or to install a local copy of BLAST to use with their own databases (CustomBLAST). Here are some issues you'll likely run into. 15.5.1 Can't connect to BLAST service This is likely a problem with the proxy configuration. Geneious sends BLAST jobs via a URL on port 80 b
54. the file partially downloaded, you will need to start downloading it again from the beginning. [Figure 5.1: Setting Up Custom BLAST] 5.1.3 Adding Databases Now that you have set up the executables, it is time to add databases to your BLAST. Adding FASTA databases [Figure 5.2: Adding a FASTA database] To create a database from the sequences in a FASTA file, go to Tools → Add/Remove Databases → Add Sequence Database and select Custom BLAST from the Service drop-down box. Choose to Create from file on disk and then click Browse to navigate to the FASTA file that contains the sequences you want to BLAST. Enter a name for the database and click OK
55. the sequence(s) completely. This can be undone in the Sequence View before the sequences are saved.
• Remove existing trimmed regions from sequences: This is only available when there are already trimmed regions on some of the sequences. This will remove the existing trimmed regions from the sequences permanently; no new trimmed regions are calculated.
• Trim vectors: Screens the sequences against UniVec to locate any vector contamination and trim it. This uses an implementation similar to NCBI's VecScreen to detect contamination (http://www.ncbi.nlm.nih.gov/projects/VecScreen).
• Trim primers: Screens the sequences against primers in your local database.
• Error Probability Limit: Available for chromatogram documents which have quality (confidence) values. The ends are trimmed using the modified Mott algorithm based on these quality values (Richard Mott, personal communication). This trims bases up until the point where trimming further bases will only improve the error rate by less than the limit.
• Maximum low quality bases: Specifies the maximum number of low quality bases that can be in the untrimmed region. Low Quality is normally defined as confidence of 20 or less. This can be adjusted on the Sequencing and Assembly tab of Preferences.
• Maximum Ambiguities: Finds the longest region in the sequence with no more N's than the maximum ambiguous bases value and trims what is not in this region. Th
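The Maximum Ambiguities option above describes a classic "longest window with at most k bad positions" search. The following Python sketch shows one way to find such a region with a sliding window. It is an illustration of the idea only, not Geneious's trimming code; the function name and the tie-breaking rule (the leftmost longest region wins) are assumptions.

```python
from typing import Tuple


def longest_region_with_max_ambiguities(seq: str, max_ambiguities: int) -> Tuple[int, int]:
    """Return (start, end) such that seq[start:end] is the longest region
    containing at most max_ambiguities N characters.

    Everything outside the returned region would be trimmed. If several
    regions tie for longest, the leftmost one is returned (an assumption).
    """
    best_start, best_end = 0, 0
    start = 0
    n_count = 0
    for end, base in enumerate(seq):
        if base.upper() == "N":
            n_count += 1
        # Shrink the window from the left until it is valid again.
        while n_count > max_ambiguities:
            if seq[start].upper() == "N":
                n_count -= 1
            start += 1
        if end + 1 - start > best_end - best_start:
            best_start, best_end = start, end + 1
    return best_start, best_end


read = "NNACGTNACGTACNN"
start, end = longest_region_with_max_ambiguities(read, 1)
print(read[start:end])  # ACGTNACGTAC
```

Each base enters and leaves the window at most once, so the search runs in time linear in the read length.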
56. to use. This button appears in the bottom left corner of any options window where profiles can be saved and loaded. Click on this button to reset to defaults, load a profile, save a new profile or manage your existing profiles. 4.8.1 Saving a profile To create a profile, set the options up the way you want, then choose Save Current Settings. You can then enter a name for your profile and choose whether it is shared. For a description of shared profiles, see the section on sharing profiles. When you save a profile, it is attached to the particular analysis window that you have open. E.g. if you save a profile for Alignment, it can only be loaded for Alignment, not for Assembly. 4.8.2 Loading a profile To load a profile, choose Load Profile and click on the name of the profile you want to load. The settings for the operation will immediately update to reflect the profile. Note: Sometimes when you load a profile, the settings may not exactly match what was saved. This is because the available settings can change depending on what type of documents you have selected. 4.8.3 Managing profiles Click on Manage Profiles under Load Profile to see a list of profiles, with options for deleting, editing, importing and exporting profiles. See the sharing profiles section below for more on import and export. 4.8.4 Sharing profiles There are two ways to share option profiles: Import and ex
57. to reliably allocate more than 1GB of RAM you need a 64-bit machine. If you have a 64-bit machine with a 64-bit OS installed and at least 4GB of RAM, you can safely allocate 2GB. If you have more RAM then you can allocate more to Geneious. It isn't advisable to allocate much more than half the available memory because, again, you'll starve the operating system of resources. Users will often complain that Geneious is using a huge amount of memory because they've looked at Task Manager on Windows or Activity Monitor on a Mac. Linux users may well be more savvy in this respect, but the best way to see how much memory Geneious is really using is to use the memory usage bar. The JVM itself will use memory, and it is the total RAM allocated to the JVM that users will see from the various monitors. 15.3.2 Indexer issues Geneious uses the Lucene indexer as the basis of the searching function. The indexer can be paused, so if you see the indexer running continuously, click the indexing indicator under the Sources panel (which shows it is indexing) and it will pause (Figure 15.7). This may take pressure off the hard drive, which can badly affect performance: if you have multiple applications thrashing the drive, everything suffers. Pausing the indexer can help get those other tasks finished, and once they're done the indexer can be restarted. If you don't restart the indexer, features such as enzyme lists or test with sa
58. unreadable. 3.11 The PDF document viewer To view a PDF document, either double-click on the document in the Documents Table or click on the View Document button. This opens the document in an external PDF viewer such as Adobe Acrobat Reader or Preview (Mac OS X). On Linux you can set an environment variable named PDFViewer to the name of your external PDF viewer. The default viewers on Linux are kpdf and evince. 3.12 The Journal Article Viewer This viewer provides two tabs: Text View and BibTeX. Text View displays the journal article details, including the abstract. The text contains a link to the original article through Google Scholar below the title and authors (Figure 3.21). BibTeX is the standard LaTeX bibliography reference and publication management data format. LaTeX is a common program used to create formatted documents, including this one. The information in the BibTeX screen can be exported for use in LaTeX documents. [Figure: the Journal Article Viewer (Text View, BibTeX and Info tabs) showing the article "Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data", Drummond AJ, Nicholls GK, Rodrigo AG & Solomon W, Genetics 2002, 161: 1307-20]
59. using the Partition Function. The Compute Options will rerun RNAfold when you change their settings, so depending on the size of the sequence there may be a noticeable recompute time. [Figure 3.9: A view of an mRNA secondary structure prediction in Geneious. View Options shown: Color By, Show Bases, Show Sequence Selection, Flip View, Highlight Ends, Rotate. Compute Options shown: Partition Function, No GU pairs, Avoid isolated base pairs, Assume circular molecule, Dangling Ends, Energy Model, Temperature (C), Reset Defaults.] 3.6 3D structure viewer For molecular structure documents such as PDB documents, this displays an interactive three-dimensional view of the structure. To rotate, click and drag. To zoom, hold the shift key while dragging. To pan, hold shift, then double-click and drag. Figure 3.10: A view of a 3D protein structure
60. values (CSV) file. The values displayed in the document table can be exported to a CSV file, which can be loaded by most spreadsheet programs. When choosing to export in CSV format, Geneious will also present a list of the available columns in the table, including hidden ones, so you can choose which to export. Data can be exported to TSV (tab-separated values) format too. There is also a CSV importer. It is often useful to export your data to a spreadsheet, do bulk modifications to fields, and then reimport. 2.2.6 Batch Export Batch export takes the selected documents and exports each to its own file, e.g. select several chromatograms to export them all to .ab1 format files. The options for batch export let you specify the format and folder to export to, as well as the extension to use. Each file will be named according to the Name column in Geneious. 2.3 Searching Searching is designed to be as user-friendly as possible, and the process is the same whether you are searching your local documents or a public database such as NCBI. To search the selected database or folder, click the Search button in the toolbar. For non-local folders, search will be on by default and cannot be closed. This applies to NCBI and EMBL databases. For local folders, search is off by default. When search is first activated, the document table will be emptied to indicate that no results have been found. To return to browsing, click the Search button again or press the Escape
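For the CSV export, bulk-edit and reimport workflow mentioned earlier in this excerpt, a spreadsheet is not the only option. The sketch below uses Python's standard csv module to rewrite one field across an exported table. The column names (Name, Organism) and the function name `bulk_update_column` are hypothetical examples; the real columns depend on what you chose to export from the document table.

```python
import csv
import io


def bulk_update_column(csv_text: str, column: str, old: str, new: str) -> str:
    """Return csv_text with every `old` value in `column` replaced by `new`.

    All other columns and rows are written back unchanged, so the result
    can be fed to a CSV importer.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] == old:
            row[column] = new
        writer.writerow(row)
    return out.getvalue()


# Hypothetical exported table with two columns.
exported = "Name,Organism\nseq1,E. coli\nseq2,Yeast\n"
print(bulk_update_column(exported, "Organism", "E. coli", "Escherichia coli"))
```

Working on the text in memory keeps the example self-contained; for real files, read the exported file into a string first and write the returned string to a new file before reimporting.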
61. you believe are bad calls to be the base which you believe is the correct call. This is often decided by looking at the quality for each of the bases and choosing the higher quality one. Geneious can do this automatically for you if you use the Highest Quality consensus. Bases in the consensus sequence can also be edited, which will update every sequence at the corresponding position to match what is set in the consensus. [Figure 4.16: Highlight disagreements and edit to resolve them] 4.7.8 Saving the Consensus Once you are satisfied with a contig, you can save the consensus as a new sequence by clicking on the name of the consensus sequence in your contig and clicking the Extract button. 4.8 Saving operation settings (option profiles) Profiles allow you to save the settings for almost any analysis operation in Geneious so they can be loaded later or shared with others. E.g. the recommended trimming parameters for your organization can be saved as a profile and then shared on the Shared Database for everyone
62. you reset the position of the structure, reset the appearance of the structure to the default, or reset the appearance of the structure to its appearance when it was last saved.
• Color lets you change the color scheme of the selected region of the molecule.
• Style lets you change the style of the selected region of the molecule, e.g. to spacefill or cartoon view.
• Atoms lets you hide atoms or change their size in the selected region of the molecule. You can also choose whether to show hydrogen atoms and atom symbols.
• Bonds lets you hide bonds or change their size in the selected region of the molecule. Covalent/ionic bonds, hydrogen bonds and disulfide bonds can be affected separately.
• Effects lets you toggle spin, antialiasing, stereo and slabbing effects for the whole molecule.
• Save saves the current appearance of the molecule.
3.7 Tree viewer The tree viewer provides a graphical view of a phylogenetic tree (Figure 3.11). When viewing a tree, a number of other view tabs may be available depending on the information at hand. The Sequence View tab will be visible if the tree was built from a sequence alignment using Geneious. The Text View shows the tree in text format (Newick).
63. 1.2.4 The Help Panel The Help Panel has two sections: Tutorial and Help. The tutorial gives you hands-on experience with some of the most popular features of Geneious. The Help section displays a short description of the currently selected service or document viewer. This panel can be closed at any time by clicking the button in its top corner or by toggling the Help button in the Toolbar. If you are new to Geneious, working through the tutorial is a great way to familiarize yourself with it. [Figure 1.3: The Help Panel] 1.2.5 The Toolbar The toolba
64. 4 The Toolbar 1.2.6 The Menu Bar The Menu Bar has seven main menus: File, Edit, View, Tools, Sequence, Annotate & Predict, and Help. For details on the menu bar see section 2.1.7. 1.2.7 Popup Menus Many actions can be quickly accessed for data items, services and sometimes selections in a viewer via popup menus (also known as context menus). To invoke a popup menu for an item, simply right-click (Ctrl-click on Mac OS X). The popup menu will contain the actions which are relevant to the item you clicked. Chapter 2 Retrieving and Storing data Geneious is a one-stop shop for handling and managing your bioinformatic data. This chapter summarizes the different ways you can use Geneious to acquire, update, organize and store your data. By the end of this chapter you should be able to:
• Know the purpose of each panel in Geneious
• Import/Export data from various sources
• Organize your data into easily accessible folders
• Automatically update your data
• Know about the advantages of the Meta Data functionality
• Customize Geneious to meet your needs
• Export and print images from Geneious
• Back up your data
2.1 The main window This section provides more information on each of the panels in Geneious (Figure 2.1). 2.1.1 The Sources Panel
65. [Figure 13.4: Operations table showing Geneious Server and local jobs] 13.4 Geneious Server enabled plugins This table details plugins which work with Geneious Server. Note that some of these plugins only run on Geneious Server, so if you try to run them locally you will get a warning that this is the case.

Plugin                                  Local   Server
Geneious Alignment                      Yes     Yes
MUSCLE Alignment                        Yes     Yes
ClustalW Alignment                      Yes     Yes
Realign Region                          Yes     Yes
Translation Align                       Yes     Yes
MAFFT Alignment                         Yes     Yes
Consensus Align                         Yes     Yes
Profile Align                           Yes     Yes
Mauve Genome                            Yes     Yes
LASTZ Alignment                         No      Yes
Geneious Tree Builder                   Yes     Yes
Consensus Tree Builder                  Yes     Yes
MrBayes                                 Yes     Yes
PHYML                                   Yes     Yes
PAUP                                    Yes     Yes
Geneious Assembler/Mapper               Yes     Yes
Bowtie short read mapper                No      Yes
BWA short read mapper                   No      Yes
Maq short read mapper                   No      Yes
SOAP2 short read mapper                 No      Yes
Tophat RNAseq aligner                   No      Yes
Velvet short read assembler             No      Yes
Find Variations/SNPs with SAMtools      No      Yes
CustomBLAST                             Yes     Yes
66. 5 Max memory On Windows and Linux, edit the vmoptions current defaults file in the installation directory and change the Xmx value to your preferred setting. On Mac OS X, edit the /Applications/Geneious.app/Contents/Info.plist file, find the VMOptions section and modify the Xmx setting. It is important on Mac OS X to ensure that this value is set appropriately after an upgrade, because users with many large files in their local database can find that Geneious fails to start if this value is reset to the normal default (700M on 32-bit, 1000M on 64-bit). This is an issue because the Info.plist file is stored in the Geneious.app bundle, so it gets replaced when upgrading. Chapter 15 Troubleshooting 15.1 Local database issues This section will help you deal with typical issues with the local database. 15.1.1 The local database Geneious stores the user's data in a folder called Geneious 7.0 Data, which will be located in the user's home directory by default. When you upgrade, Geneious offers to create a copy of this folder with the upgrade's version number in the name and update the format. Geneious databases are not backwards compatible, so if you upgraded and haven't accepted the offer to keep a backup you will not be able to downgrade. If you downgrade to an earlier version you won't be able to see documents you created in the newer version. 15.1.2 Storing the database
67. [Figure 3.2: The minimap and sequence view of a chromosome with gene and variation annotations under the genome viewer configuration] The genome viewer provides the genome viewer selection controls, allowing for the efficient navigation of large sequences. These controls grant the ability to select individual sequences from the sequence list, as well as an extended set of zoom controls. The Go to Position button allows for instant navigation to a particular nucleotide coordinate for any sequence in the current document selection, using UCSC genome browser notation. Additionally, the genome viewing configuration will display a minimap representing the currently selected sequence and its underlying annotations. The minimap will always show a representation of the entire sequence visible in the sequence viewer. The portion of the sequence currently visible in the viewing window is highlighted on the minimap, showing the relative position of the visible section to the overall sequence. The minimap can also be used to quickly navigate around the visible sequence. Clicking on a section of the minimap will jump the sequence viewer to center on that po
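UCSC genome browser notation, as used by the Go to Position button described above, writes a location as a sequence name, a colon, and a coordinate or coordinate range, for example chrM:250,000-254,000. The Python sketch below parses that form. The function name `parse_position` is hypothetical, and exactly which variants Geneious accepts is not stated in this excerpt, so the accepted forms here (optional thousands separators, optional end coordinate) are assumptions.

```python
import re

# name:start or name:start-end, with optional commas inside the numbers.
_POSITION = re.compile(r"^(?P<name>[^:\s]+):(?P<start>\d[\d,]*)(?:-(?P<end>\d[\d,]*))?$")


def parse_position(text: str):
    """Parse 'chrM:250,000-254,000' into ('chrM', 250000, 254000).

    A single coordinate such as 'chr1:1500' yields start == end.
    """
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a UCSC-style position: {text!r}")
    start = int(match.group("start").replace(",", ""))
    end_text = match.group("end")
    end = int(end_text.replace(",", "")) if end_text else start
    if end < start:
        raise ValueError("end coordinate is before start coordinate")
    return match.group("name"), start, end


print(parse_position("chrM:250,000-254,000"))  # ('chrM', 250000, 254000)
print(parse_position("chr1:1500"))             # ('chr1', 1500, 1500)
```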
68. 3. Pfam-B (59 MB) contains records for the automatically generated domains in Pfam-B, taken from PRODOM.
4. Pfam-C (69 KB) contains records for Pfam clans (families of similar domains).
5. swisspfam (132 MB) contains data on the domain architecture of UniProt sequences.
7.1.1 Downloading the Pfam databases yourself If you want, you can download or otherwise acquire the Pfam databases outside of Geneious. You will need to let Geneious know where to look for the files once you have done this. To do this, select the Pfam service, click Change Database Location and browse to the location of the databases. 7.1.2 Downloading the Pfam databases through Geneious Geneious provides a download manager to help you download the Pfam files. To use it, select the Pfam service, click the Let Geneious do it button, then click the Start button. After a few seconds the first database will start downloading. You can click Pause to pause the download. You can search a database as soon as it has finished downloading and its contents have been verified. If you shut down Geneious with a file partially downloaded, you will need to start downloading it again from the beginning. The Pfam databases total around 4 GB in size, most of which comes from Pfam-A.full. If your internet connection is slow or you have a low data cap, you may want to download the databases elsewhere and then transfer them to your computer. You may also
69. 3.12 The Journal Article Viewer
4 Analysing Data
4.1 ...
4.2 Sequence data
4.3 ...
4.4 Sequence Alignments
4.5 Building Phylogenetic trees
4.6 PCR Primers
4.7 ...
4.8 Saving operation settings (option profiles)
4.9 Results of analysis
5 Custom BLAST
5.1 Setting up
6 COGs BLAST
6.1 ...
6.2 ...
7 Pfam
7.1 Setting up
7.2 Pfam Document Types
7.3 Pfam Operations
8 Geneious Education
8.1 Creating a tutorial
8.2 Answering a tutorial
9 Collaboration
9.1 Managing Your Accounts
9.2 Managing Your Contacts
9.3 ...
9.4 Browsing, Searching and Viewing Shared Documents
9.5 Chat
10 Cloning
10.1 Find Restriction Sites
10.2 Digest ...
10.3 Insert into Vector
10.4 Gateway Cloning
70. A03.2.ab1, B05.1.ab1, B05.2.ab1, etc., where A03 and B05 are the identifiers, you would choose Assemble by 1st part of name, separated by . (full stop).
• Assembly method: Specifies a trade-off between the time it takes to assemble and the accuracy of the assembly. Higher sensitivity is likely to result in more reads being assembled.
• Trim Sequences: Select how to trim the ends of the sequences being assembled. See section 4.7.3.
• Save assembly report: Instead of displaying the results of the assembly in a dialog, the results are saved in a separate report document alongside the contig(s). This lists which fragments were successfully assembled and which contig they went into, along with a list of unassembled fragments.
Advanced Options (click More Options)
• Save results in a new subfolder named: If selected, all results of the assembly will be saved to a new subfolder inside the one containing the fragments. This folder will only ever contain the results of a single assembly; a new folder is created each time the assembly is run.
• Alignment Options: Penalties and scores used when aligning the fragments; these normally don't need to be changed.
Other advanced options depend on the assembly method selected. These are fully documented if you hover the mouse over them in Geneious. High Sensitivity (slowest assembly) advanced options include: Minimum Overlap: The minimum overlap in nucleotides betwe
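The "Assemble by 1st part of name, separated by full stop" setting at the start of this excerpt amounts to grouping file names by one delimited field. A minimal Python sketch of that grouping follows. It only illustrates the naming logic and is not Geneious code; the function name `group_by_name_part` is hypothetical, and the file names assume the dots that the identifiers A03 and B05 imply.

```python
from collections import defaultdict


def group_by_name_part(names, separator=".", part=0):
    """Group names by the field at index `part` after splitting on `separator`.

    With the defaults, 'A03.1.ab1' and 'A03.2.ab1' land in the same group
    'A03', which is the set of reads that would be assembled together.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.split(separator)[part]].append(name)
    return dict(groups)


reads = ["A03.1.ab1", "A03.2.ab1", "B05.1.ab1", "B05.2.ab1"]
for identifier, members in group_by_name_part(reads).items():
    print(identifier, members)
```

Running the same function with a different `part` or `separator` shows quickly whether a naming scheme will split reads into the groups you expect before you start an assembly.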
71. AM | .sam, .bam | Contigs | SAMtools
Sequence Chromatograms | .ab1, .scf | Raw sequencing trace & sequence | Sequencing machines
VCF | .vcf | Annotations | 1000 Genomes Project
Vector NTI sequence | .gb, .gp | Nucleotide & protein sequences | Vector NTI
Vector NTI/AlignX alignment | .apr | Alignments | Vector NTI, AlignX
Vector NTI Archive | .ma4, .pa4, .oa4, .ea4, .ca6 | Nucleotide & protein sequences, enzyme sets and publications | Vector NTI
Vector NTI/ContigExpress | .cep | Nucleotide sequence assemblies | Vector NTI
Vector NTI database | VNTI Database | Nucleotide & protein sequences, enzyme sets and publications | Vector NTI

BED format The BED format contains sequence annotation information. You can use a BED file to annotate existing sequences in your local database, import entirely new sequences, or import the annotations onto blank sequences. CLUSTAL format The Clustal format is used by ClustalW [24] and ClustalX [23], two well-known multiple sequence alignment programs. Clustal format files are used to store multiple sequence alignments and contain the word clustal at the beginning. An example Clustal file:

CLUSTAL W (1.74) multiple sequence alignment

seq1 KSKERYKDENGGNYFOLREDWWDANRETVWKAITCNA
seq2 YEGLTTANGXKEY YODKNGGNFFKLREDWWTANRETVWKAITCGA
seq3 KRIYKKIFKEIHSGLSTKNGVKDRYOQ
72. systems have been developed in this way. These matrices incorporate the evolutionary preferences for certain substitutions over other kinds of substitutions in the form of log-odds scores. Popular matrices used for protein alignments are the BLOSUM [10] and PAM [2] matrices.
Note: The BLOSUM and PAM matrices are substitution matrices. The number of a BLOSUM matrix indicates the threshold (%) similarity between the sequences originally used to create the matrix. BLOSUM matrices with higher numbers are more suitable for aligning closely related sequences. For PAM, the lower-numbered tables are for closely related sequences and the higher-numbered PAMs are for more distant groups.
When aligning protein sequences in Geneious, a number of BLOSUM and PAM matrices are available.
Algorithms for pairwise alignments
Once a scoring system has been chosen, we need an algorithm to find the optimal alignment of two sequences. This is done by inserting gaps in order to maximize the alignment score. If the sequences are related along their entire length, a global alignment is appropriate. However, if the relatedness of the sequences is unknown, or they are expected to share only small regions of similarity (such as a common domain), then a local alignment is more appropriate.
An efficient algorithm for global alignment was described by Needleman and Wunsch [16], and their algorithm was later extended by Gotoh to model gaps more accurately [6]. For loca
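The global alignment idea described in this excerpt can be sketched with a small dynamic-programming example in the style of Needleman and Wunsch. This is an illustration only, not the Geneious implementation: it uses a simple match/mismatch score instead of a BLOSUM or PAM matrix and a linear gap penalty rather than Gotoh's affine gaps, and the default scores (1, -1, -2) are assumptions chosen for readability.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative sketch)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, aligned_a, aligned_b = needleman_wunsch("ACGT", "AGT")
print(best_score)
print(aligned_a)
print(aligned_b)
```

Swapping the match/mismatch rule for a lookup into a substitution matrix is the step that turns this into a protein alignment scorer of the kind the text describes.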
73. [Screenshot residue: the Geneious main window, showing the Sources panel (local folders, Shared Databases, Operations, Collaboration, and the NCBI, Pfam and UniProt services), the Document Table, the toolbar, and the sequence viewer with its Help panel.] The Help panel text reads: The sequence view is a highly customizable viewer for protein and nucleotide sequences. The sequence view lets you zoom in to view individual residues or zoom out to view an entire sequen
74. [Figure 3.19: Context Menu. The menu shown includes "Go to" and "Activate link to child (reruns operation)", which activates the link between this document and its child so that changes to the parent will be propagated to the child.]
Note: reactivating links immediately reruns the operation. Depending on the size and type of the operation, this can be time consuming. Also note that reactivating will cause any unsaved changes to any direct or indirect descendants to be overwritten, since this involves a complete recompute from the parent documents. You will be warned about this before Geneious allows you to reactivate.
Finally, you may export the currently selected document (highlighted in blue in the view) directly via the export button. Doing so will bring up a dialog (Figure 3.20). From here you can choose to export parents or descendants only, or both, as well as choose to export only those documents that are actively linked in the hierarchy. As with unchecking the Show Inactive Links checkbox, unchecking Inactively linked documents here means that the export stops as soon as it finds an inactively linked parent or descendant (depending on the relevant direction) and does not continue down that branch of the lineage.
75. [Figure 2.16: The Edit Meta Data Types window. Field types offered: Text, True/False, Whole Number, Decimal, Date, Drop-down list.]
Constraints: These are limiting factors on the data and are specific to each field type. For example, numbers have numerical constraints: is greater than, is less than, is greater or equal to, and is less or equal to. These can be changed to suit. The constraints for each field can be viewed by clicking the View Constraints button next to the field. This will show a pop-up menu with the constraints you have chosen (see figure 2.17).
Using Meta Data
The main purpose of meta data is to add user-defined information to Geneious documents. However, meta data can be searched for and filtered as well. Also, documents can be sorted according to meta data values.
Searching: Once meta data is added to a document, it is automatically added to the standard search fields. These are listed under the Advanced Search options in the Document Table. From then on you can use them to search your Local Documents. If you have more than one Field in a meta data type, they will all appear as searchable fields in the search criteria.
Filtering: Meta data values can be used to filter the documents being viewed. To do so, type a value into the Filter Box on the right-hand side of the Toolbar. Only matching documents will
76. DNA Probe is being designed or tested. These two sections are quite similar: the DNA probe section has a subset of the options available in the primer section. This is because primers are usually chosen in pairs, and so several options can be set for how pairs are chosen.
[Figure 4.6: Primer characteristics options. The panel shows Min, Optimal and Max settings for Size, Tm, %GC and Product Tm, together with Max Tm Difference, GC Clamp, Max Dimer Tm, Max Poly-X, Max 3' Stability, Allow primers inside target with penalty, Allow Degeneracy, and the Primer Picking Weights button.]
Primer Picking Weights
At the bottom of the Characteristics panel there is a Primer Picking Weights button. Clicking this brings up a second dialog containing many more options. The purpose of all of these options is to allow you to assign penalty weights to each of the parameters you can set in the options. The weight specified here determines how much of a penalty primers and probes get when they do not match the optimal options. The higher the value, the less likely a primer or probe will be chosen if it does not meet the optimal value.
Some of the weights allow you to specify a Less Than and Greater Than. This is for options which allow you to specify an optimum
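The role of the picking weights can be illustrated with a small sketch. The excerpt does not state the exact formula Geneious uses, so the linear form below (weight multiplied by the distance from the optimum, with separate Less Than and Greater Than weights) is an assumption modelled on how Primer3 describes its penalties. The function names, the weights of 1.0, and the candidate primers are all hypothetical; the optima of 60 for Tm and 20 for size are the ones visible in Figure 4.6.

```python
def weighted_penalty(value, optimal, weight_less_than, weight_greater_than):
    """Penalty for one characteristic: weight times distance from the optimum."""
    if value < optimal:
        return weight_less_than * (optimal - value)
    return weight_greater_than * (value - optimal)


def total_penalty(candidate, settings):
    """Sum the penalties over every characteristic listed in settings."""
    return sum(
        weighted_penalty(candidate[name], optimal, w_lt, w_gt)
        for name, (optimal, w_lt, w_gt) in settings.items()
    )


# name -> (optimal value, Less Than weight, Greater Than weight); weights are assumed
settings = {"tm": (60.0, 1.0, 1.0), "size": (20, 1.0, 1.0)}

candidates = [
    {"id": "p1", "tm": 57.0, "size": 20},
    {"id": "p2", "tm": 60.5, "size": 21},
]

# The candidate with the lowest total penalty is preferred.
best = min(candidates, key=lambda c: total_penalty(c, settings))
print(best["id"], total_penalty(best, settings))
```

Raising a weight makes deviations in that characteristic more costly, which is the behaviour the text describes: the higher the value, the less likely a non-optimal primer is to be chosen.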
77. [Screenshot residue: a table of restriction enzymes giving, for each enzyme, its recognition sequence, effective length and overhang, with the checkbox "Only consider enzymes with palindromic recognition sequence" and the buttons Save Selected Enzymes, Fewer Options and Restore Defaults. The dialog notes that restriction enzyme information was obtained from REBASE, a free database.]
Figure 10.1: Find Restriction Sites options dialog with extended options showing
[Screenshot residue: the Digest into fragments dialog, with options to digest using annotated cut positions or an enzyme set, and a table with Recognition Sequence, Effective Length and Overhang columns.]
78. • Hide bases and residues: Hides the residues/bases of the sequence and leaves just the annotations visible.
• Show Name: Shows or hides sequence and graph names inside the sequence viewer panel.
• Show residue numbers: Toggles the display of the residue position numbers above the sequence residues.
• Show original base numbers: Toggles the display of the residue position numbers for the original sequence, on a per-sequence basis. It is only available for alignment documents and sequences that were extracted from other sequences.
• Outline residues when zoomed out: Adds a fine line around the sequence, which can help with clarity and printing.
You can also adjust the appearance of annotations:
• Labels: Changes how labels are displayed: Inside, Outside, Inside or Outside, and None.
• Overlay on bases when zoomed out: When only a single annotation covers a region, it will be placed on top of the sequence.
• Compress annotations: Reduces the vertical height of the annotations on display. This reduces the space occupied by annotations by allowing them to overlap, and increases the amount of the sequence displayed on the screen.
• Hide excessive labels: Reduces screen clutter by removing annotation labels which are too frequent.
Finally, you can control the size of fonts for bases, labels, names and numbering.
3.2.12 Statistics
This di
79. [Figure 3.7: The annotations options in the sequence viewer. Annotation types listed include Exon, Five_prime_UTR, mRNA, microRNA and rRNA.]
3.2.9 Live Annotate & Predict
This section contains any real-time annotation generators, such as Find ORFs and Predict Protein Secondary Structure. Others may be available if plugins are installed. To use one of these, turn on the check box at the top of the generator you want to use and annotations will immediately be added to the sequence. You can then change settings for the generator, and the annotations on the sequence will update in real time as you do. You don't need to click the Apply button unless you want to save the annotations to the sequence permanently.
3.2.10 Restriction Analysis
This behaves similarly to the Live Annotate & Predict section above. Please refer to chapter 10 for full details.
3.2.11 Advanced
Has various options controlling the look of the sequence:
• Wrap sequence: Wraps the sequences in the viewing area.
• Linear view on circular sequences: Forces circular sequences to be shown linearly.
• Spaces every 10 residues: If you are zoomed in far enough to see individual residues, an extra white space is shown every 10 residues when this option is selected.
80. [Screenshot residue: the Mauve genome alignment viewer, with controls including Home, Shift Left, Shift Right, Zoom In, Zoom Out, DCJ Analysis, GRIMM Analysis, Find Feature, and an LCB weight slider ("Use this slider to change the minimum weight for Locally Collinear Blocks"), showing three Shigella flexneri genomes.]
Figure 15.8: Alignment of genomes with Mauve
Another operation that users attempt, and that can be very slow, is aligning many primers against a set of sequences. The right tool is Test with Saved Primers, but this can also be really slow if the primers have high levels of degeneracy and there are lots of sequences. The section on primers will offer potential solutions.
15.4 Importing and exporting data
Getting data into Geneious from other programs, and out for publication or use with other programs, is generally easy, but there are a few frequent issues.
15.4.1 FASTA file format
FASTA is simple and ubiquitous. It is also confusing to users and misused. The structure of a FASTA file is like this:
>Name Description
ATGTCGATGCAT
Users oft
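The FASTA structure shown in this excerpt (a header line starting with ">", then the sequence) can be read with a few lines of code. The sketch below is illustrative rather than a description of the Geneious importer: `parse_fasta` is a hypothetical helper, and it assumes the common convention that the first whitespace-delimited token of the header is the name and the remainder is the description.

```python
def parse_fasta(text):
    """Parse FASTA text into (name, description, sequence) tuples."""
    records = []
    name = description = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, description, "".join(chunks)))
            # Split the header once: first token is the name, the rest the description.
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence data may span several lines
    if name is not None:
        records.append((name, description, "".join(chunks)))
    return records


example = ">Name Description\nATGTCG\nATGCAT\n"
for rec_name, rec_description, rec_sequence in parse_fasta(example):
    print(rec_name, rec_description, rec_sequence)
```

Because only the first space separates name from description, a header such as ">my sample 1" yields the name "my", which is one way names can end up looking truncated after import.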
81. N DGDNYFQLREDWWTANRSTVWKALTCSD
seq4 SORHYKD DGGNYFOLREDWWTANRHTVWEAITCSA
seq5 NVAALKTRYEK DGONFYOLREDWWTANRATIWEAITCSA
seq6 FSKNIX OQIEELODEWLLEARYKD TDNYYELREAHWWTENRHTVWEALTCEA
seq7 KELWEALTCSR

seq1 GGGKYERNTCDG GONPTETONNCRCIG ATVPTYFDYVPOYLRWSDE
seq2 P GDASYFHATCDSGDGRGGAQAPHKCRODG ANVVPTYFDYVPOFLRWPEE
seq3 KLSNASYFRATC SDGOSGAQANNYCRONGDKPDDDKP NTDPPTYFDYVPOYLRWSEE
seq4 DKGNA YFRRTCNSADGKSOSQARNOCRC KDENGKN ADOVPTYFDYVPOYLRWSEE
seq5 DKGNA YFRATCNSADGKSOSQARNOCRC KDENGXN ADOVPTYFDYVPOYLRWSEE
seq6 P GNAQYFRNACS EGKTATKGKCRCISGDD PTYFDYVPOYLRWSEE
seq7 P KGANYFVYKLD RPKFSSDRCGHNYNGDP LINLDYVPOYLRWSDE
CSFASTA format
ABI .csfasta files represent the color calls generated by the SOLiD sequencing system.
DNAStar files
DNAStar .seq and .pro files are used in Lasergene, a sequence analysis tool produced by DNAStar.
DNA Strider
Sequence files generated by the Mac program DNA Strider, containing one nucleotide or protein sequence.
EMBL/UniProt
Nucleotide sequences from the EMBL Nucleotide Sequence Database, and protein sequences from UniProt (the Universal Protein Resource).
EndNote (8.0) XML format
EndNote is a popular reference and bibliography manager. EndNote lets you search for journal articles online, import citations, perform searches on your
82. Once you have an account and are connected, you can start adding contacts. You will not be able to add contacts while an account is disconnected. Also, you will not be able to see a contact's online status until that contact has approved your request to do so.
9.2.1 Add Contact
Select your account in the Services Panel and choose Add Contact from the Collaboration submenu, or right-click (Ctrl-click on Mac OS X) on your account in the Services Panel and choose the same option. You will see a simple dialog with one field, Jabber ID. A Jabber ID looks like an email address and has a similar function: it uniquely identifies another Geneious user's account. You can enter a contact's Jabber ID directly into this field if you know it. To see your own Jabber ID, hover your mouse over your account in the Services Panel and it will appear in a tooltip.
[Figure 9.3: Add Contact dialog box]
If the server supports it, you should also see a Search For Contact link. Click this to go to the next dialog. Here you will see a box for a search string and some checkboxes indicating what you are searching on. Enter all or part of the name or email of the contact you want and click the Search button. If any rows are returned in the results table, you will be able to select one or more entries and add t
83. Search results will be lost when you exit Geneious unless the downloaded documents have been copied or moved to one of your local folders.
In Geneious you can create new folders, rename existing folders, and delete and export folders. All these choices are available by right-clicking on the folder, by clicking on the action menu (Mac OS X), or by holding down the Ctrl button and clicking (Mac OS X). In Mac OS X you can also use the plus and minus buttons located at the bottom of the service panel to create and delete folders.
2.5.1 Transferring data
It is quick and easy to transfer data to your local folders from either a Geneious database search or from your computer's hard drive. Please check that you have already set up your destination folders before continuing.
[Screenshot residue: Sequence Search results, showing a Hit Table with E Value, Name, Description, Organism, Sequence Length, Accession and Pairwise Identity columns.]
84. (Table rows: Geneious characteristic; Primer3 name; Primer3 tag)
T TM
Hairpin; Max Self Complementarity (Any); PRIMER_{LEFT,RIGHT,INTERNAL_OLIGO}_SELF_ANY
Primer Dimer; Max 3' Self Complementarity; PRIMER_{LEFT,RIGHT,INTERNAL_OLIGO}_SELF_END
Monovalent Salt Concentration; Concentration of monovalent cations; PRIMER_SALT_CONC
Divalent Salt Concentration; Concentration of divalent cations; PRIMER_DIVALENT_CONC
DNTP Concentration; Concentration of dNTPs; PRIMER_DNTP_CONC
Sequence; Seq; PRIMER_{LEFT,RIGHT}_SEQUENCE
Product Size; Product Size Ranges; PRIMER_PRODUCT_SIZE
Pair Hairpin; PAIR ANY COMPL; PRIMER_PAIR_COMPL_ANY
Pair Primer Dimer; PAIR 3' COMPL; PRIMER_PAIR_COMPL_END
Pair Tm Diff; Max Tm Difference; PRIMER_PRODUCT_TM_OLIGO_TM_DIFF
Table 4.1: Geneious primer characteristics and their Primer3 counterparts
• Which of Forward Primer, Reverse Primer, Primer Pair and/or DNA Probe could not be found in the sequence.
• For each of these, specific reasons for rejection are listed (e.g. "Tm too high" or "Unacceptable product size"), along with a percentage which expresses how many of the candidate primers or probes were rejected for this reason.
After examining the details, you can choose to take no action, or to continue and annotate the primers and/or DNA probes on the sequences for which they were successfully designed.
4.6.2 Primer Database
The Primer Database consists of all the oligonucleotide documents that exist in your Local or Sh
85. boxes allow you to specify which column contains which piece of data; often one or more of these won't be applicable and can be left as None. Note that at minimum you must specify a Sequence field.
Lastly, you can import any additional data in the form of meta data. Clicking the dropdown box next to Meta Data at the bottom of the dialog allows you to import values to meta data, and clicking the + or - buttons allows you to insert or remove additional meta data types. Next, click the Fields button to bring up a dialog. An additional set of dropdown boxes allows you to specify which columns of data contain the fields that comprise this meta data type. This includes custom meta data types that you have created and saved in the past.
When you're ready, hit OK to begin importing. When Geneious is done, you may be presented with the option of grouping the sequences you imported into a sequence list. This option is recommended if you're importing very large sets of sequences.
4.6.8 More Information
The Primer feature in Geneious is based on the program Primer3 (http://bioinfo.ut.ee/primer3/). Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2004 Whitehead Institute for Biomedical Research. All rights reserved.
If you use the primer design feature of Geneious for publication, we request that you cite Primer3 as: Steve Rozen and Helen J. Skaletsky (2000) Primer3 on the WWW for general users and
86. To copy a value, right-click (Ctrl-click on Mac OS X) on it and choose the "Copy name" option, where name is the column name.
Sorting: All columns can be sorted alphabetically, numerically or chronologically, depending on the data type. To sort by a given column, click on its header. If you have different types of documents in the same folder, click on the Icon column to sort them according to their type.
Managing Columns: You can reorder the columns to suit. Click on the column header and drag it to the desired horizontal position. You can also choose which columns you want to be visible by right-clicking (Ctrl-click on Mac OS X) on any column header, or by clicking the small header button in the top right corner of the table. This gives a popup menu with a list of all the available columns. Clicking on a column will show/hide it. Your preference is remembered, so if you hide a column it will remain hidden in all areas of the program until you show it again.
As well as items to show/hide any of the available columns, there are a few more options in this popup menu to help you manage columns in Geneious:
• Lock Columns locks the state of the columns in the current table so that Geneious will never modify the way the columns are set up. You can still change the columns yourself, however.
• Save Current State allows you to save the current state of the columns so you can easily apply it to other tables. You can give the state a name
87. a
• The approximate p-value method calculates the p-value by first averaging the qualities of each base equal to the proposed SNP and averaging the qualities of each base not equal to the proposed SNP.
• Example: Assume you have a column where the reference sequence is an A and there are 3 reads covering that position. 1 read contains an A in the column and the other 2 reads contain a G. All 3 reads have quality 20 (99% confidence) at this position. We want to calculate the p-value for calling a G SNP in this column. Since the quality values are all equal, the p-value is the probability of seeing at least 2 Gs if there isn't really a variant here, which is equal to $\binom{3}{2} \times 0.01^{2} \times 0.99 + \binom{3}{3} \times 0.01^{3}$.
False SNPs due to strand bias (when sequencing errors tend to occur only on reads in a single direction) can be eliminated by specifying a value for the Minimum Strand Bias P-value setting. A Strand Bias P-Value property is added to each SNP to indicate the probability of seeing a strand bias at least this extreme, assuming that there is no strand bias. SNPs with a smaller strand bias p-value will be excluded from the results when using this setting.
For full details of how the various settings work in the Variation/SNP finder, hover the mouse over them in Geneious to read the tooltips, or click one of the buttons.
The output of the Variation/SNP finder includes the following fields:
• Coverage
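The worked example above (3 reads, 2 of them showing G, all at quality 20) can be checked numerically. The sketch below covers only this equal-quality case and is not the Geneious implementation: the helper names are hypothetical, and the quality-averaging step of the approximate method is not modelled. It converts the Phred quality to an error probability and sums the binomial tail for "at least 2 of 3 reads are errors".

```python
from math import comb


def phred_to_error_probability(quality):
    """Phred quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


error = phred_to_error_probability(20)  # quality 20 means 99% confidence
p_value = binomial_tail(3, 2, error)    # at least 2 of the 3 reads show G by error
print(p_value)
```

With an error probability of 0.01, the two terms are 3 × 0.0001 × 0.99 = 0.000297 and 0.000001, giving a p-value of 0.000298.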
88. ace as the user's data, you can safely uninstall Geneious and your data will be untouched. When upgrading, it is cleaner to uninstall the previous version before installing the new version. While upgrading over the top usually works, permissions issues have sometimes prevented it, so uninstalling is needed to work around these.
15.2 Network issues
15.2.1 Connection error when trying to search using NCBI or EMBL
If the message reads "Check your connection settings", there is a problem with your Internet connection. Make sure you are still connected to the Internet. Both dial-up and broadband can disconnect. If you are connected, then the error message indicates that you are behind a proxy server and Geneious has been unable to detect your proxy settings automatically. You can fix this problem as follows:
1. Check the browser you are using. These instructions are for Explorer, Safari and Firefox.
2. Open your default browser.
3. Use the steps in Figure 15.4 for each browser to find the connection settings.
4. Now go into Geneious and select Preferences. There are two ways to do this:
• Shortcut keys: Ctrl+Shift+P (Windows/Linux), ⌘+Shift+P (Mac OS X)
• Tools menu → Preferences
5. This opens the Preferences. Click on the General tab. There are five options in the drop-down under Connection settings (Figure 15.5):
• Use direct connection: Use this setting when no proxy settings are required
89. ach match. You can read more about the E-value in subsection 2.4.4. For the search to be successful you need to specify a minimum of 11 nucleotides or 3 amino acids. Note that search times depend on the number and size of your sequence documents, so the search may take a long time to complete.
2.5.5 Checking and changing the location of your Local folders
To check where your Local folders are being stored on your hard drive, open the Tools menu in the Menu Bar and click Tools → Preferences → General. Your documents are stored at the location specified by the Data Storage Location field (see Figure 2.12). You can change this location by clicking the Browse button and selecting a new location. Geneious will remember this new location when you exit.
Warning: Do not place your local database on a network share or use a synchronization tool such as Dropbox. Geneious accesses the local database frequently, so performance will be very poor and your data will get corrupted.
2.5.6 Find Duplicates
Find Duplicates is located under the Edit menu and is used to identify sequences and other documents that are duplicated. It can check for duplicates within a selected set of documents, all documents in a folder, or in the sequences of a single alignment or sequence list. Duplicates can be identified by database ID (e.g. accession) or by the residues/bases. Once run, the operation will select all but one c
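The behaviour described for Find Duplicates (compare by database ID or by residues, then select all but one copy) can be sketched as follows. This is a hypothetical illustration, not Geneious code: documents are modelled as plain dictionaries, and because the excerpt is cut off before saying which copy is left unselected, the sketch assumes the first occurrence is the one kept.

```python
def select_duplicates(documents, key="residues"):
    """Return the names of all documents that duplicate an earlier one.

    key is "accession" to compare by database ID or "residues" to compare
    by sequence. The first document seen with each value is left unselected
    (an assumption; the excerpt does not say which copy is kept).
    """
    seen = set()
    selected = []
    for doc in documents:
        value = doc[key]
        if value in seen:
            selected.append(doc["name"])
        else:
            seen.add(value)
    return selected


documents = [
    {"name": "a", "accession": "X1", "residues": "ACGT"},
    {"name": "b", "accession": "X2", "residues": "ACGT"},
    {"name": "c", "accession": "X1", "residues": "GGCC"},
]
print(select_duplicates(documents, key="residues"))
print(select_duplicates(documents, key="accession"))
```

The two comparison modes can disagree, as in this example, which is why the choice between ID and residues matters before deleting the selected documents.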
90. [Figure 9.5: Rename Contact dialog box]
9.3 Sharing Documents
Select one of your local folders, then select Share Folder from the File menu. Alternatively, right-click (Ctrl-click on Mac OS X) on a local folder and select the same option.
• If you share a folder, all documents in that folder are shared.
• If you share a folder, all sub-folders of that folder are shared.
• If you share a folder, it is available to all your contacts.
In the future, Geneious may support per-account options for sharing your documents, or even organize contacts into groups so that you can share your documents with specific groups only.
9.4 Browsing, Searching and Viewing Shared Documents
Folders that your contacts have shared will appear beneath that contact, just as they do in your contact's own Services panel. You can browse these folders as you do your local folders. You can also search a shared folder just as you can a local one. Additionally, you can search all of a contact's shared documents by clicking on the contact itself and then conducting the search. You can also search all the shared documents of all of an account's contacts by clicking on the account and conducting the search. Agents can be set up on shared folders, contacts and accounts. You cannot search, browse, or run or set up agents on a contact that is currently offline.
When you first view your contact's documents in the Document Table, the documents yo
91. ains Find in Document, Find Next and Find Previous options.
Find can be used to find text or numbers in a selected document. This is useful when looking for annotated regions or a stretch of bases in a sequence. This opens a Find dialog. The shortcut is Ctrl+F.
Next finds the next match for the text specified in the Find dialog. The shortcut keys are F3 or Ctrl+G. Geneious then allows you to choose another document and continue searching for the same search word.
Prev finds the previous match. The shortcut keys are Ctrl+Shift+G or Shift+F3.
There are also the useful Find Duplicates and Batch Rename features in this menu.
View Menu
This contains several options and commands for changing the way you view data in Geneious.
Back, Forwards and History allow you to return to documents you had selected previously.
Search is discussed in section 2.3.
Agents are discussed in section 2.6.
Next unread document selects the next document in the current folder which is unread.
Table Columns contains the same functionality as the popup menu for the document table header. See section 2.1.2 for more details.
Open document in new window opens a new window with a view of the currently selected document(s).
Expand document view expands the document viewer panel in the main window out to fill the entire main window.
Sel
92. ally intensive. Geneious is able to run NCBI BLAST on many different databases. Some of these databases are non-redundant in order to reduce duplicate hits. The databases that can be searched are shown in the following tables.
You can quickly and easily BLAST against any of these databases, using any of the available BLAST programs, via the Sequence Search operation. This operation can be accessed from the Tools menu, or by right-clicking (Ctrl-click on Mac OS X) on a sequence document and choosing Sequence Search. This will bring up the sequence search options.
Geneious gives you the option of searching against a database using either your currently selected sequence documents or a sequence you enter manually. If you choose to enter your sequence manually, Geneious will display a large text box in which you can enter your query sequence as either unformatted text or FASTA format.
Select your database using the first drop-down box. Databases are grouped together under their respective services. The available programs in the second drop-down box will depend on the database you have chosen.
Geneious also allows you to specify most of the advanced options that are available in BLAST. To access the advanced options, click the More Options button, which is in the bottom left
Table 2.2: Nucleotide sequence searches in the BLAST databases
Database; Nucleotide searches
nr; All non-redundant GenBank
93. an alignment using MUSCLE, select the sequences or alignment you wish to align, select the Align/Assemble button from the Toolbar and choose Multiple Alignment. At the top of the alignment options window there are buttons allowing you to select the type of alignment you wish to do. Choose MUSCLE here and the options available for a MUSCLE alignment will be displayed.
For more information on MUSCLE and its options, please refer to the original documentation for the program: http://www.drive5.com/muscle/muscle.html
4.4.5 Combining alignments and adding sequences to alignments
Consensus Alignment allows you to align two or more alignments together to create a single alignment, and to align a new sequence into an existing alignment. Select the sequences or alignment you wish to align, select the Align/Assemble button from the Toolbar and choose Multiple Alignment. Consensus alignment allows you to choose which alignment algorithm to use for aligning the consensus sequences. All of the pairwise and multiple alignment algorithms are available. The consensus sequence used for each alignment is a 100% consensus with gaps ignored.
4.5 Building Phylogenetic trees
Geneious provides some basic phylogenetic tree reconstruction algorithms for a preliminary investigation of relationships between newly acquired sequences. For more sophisticated methods of phylogenetic reconstruc
94. [Screenshot residue: the Sequence Search hit table (Pan troglodytes and Homo sapiens MHC class I hits) above the sequence viewer. The viewer notes that selected sequences are only summaries and offers a Download Full Sequence(s) button.]
Figure 2.9: Sequence Search Complete
Moving documents from Geneious searches to your Local folders
There are a number of ways to do this.
Drag and drop: This is quickest and easiest. Select the documents that you want to move. Then, while holding the mouse butt
95. and it will then appear in the Load Column State menu.
• Load Column State contains all of the column states you have saved. Selecting a column state from here will immediately apply that state to the current table and lock the columns to maintain the new state. Use Delete Column State to remove unwanted column states from this menu.
Note: New columns can be added to the document table by adding Meta Data to documents (see 2.8 Meta Data).
2.1.3 The Document Viewer Panel
The Document Viewer Panel shows the contents of any document clicked on in the Document Table. To view large documents, it is sometimes better to double-click on them. This opens a view in a new window. In the document viewer panel there are two tabs that are common to most types of documents: Text view and Info. Text view shows the document's information in text format. The exception to this rule occurs with PDF documents, where the user needs to either click the View Document button or double-click to view the document.
Some document types, such as sequences, trees and structures, have an options panel occupying the right of the document viewer. The options in the options panel have an arrow which can be used to expand or hide a group of related options. See the next section on document viewers for more information about operating the various viewers in Geneious.
Most viewers have their own small toolbar at the top of the docum
96. and program preferences. A file in Geneious format will usually have a .geneious extension or a .xml extension. This format is useful for sharing documents with other Geneious users and backing up your Geneious data. Geneious Education format: This is an archive containing a whole bundle of files which together comprise a Geneious education document. This format can be used to create assignments for your students, bioinformatics tutorials and much more. See chapter 8 for information on how to create such files. GFF format: The GFF format contains sequence annotation information and optional sequences. You can use a GFF file to annotate existing sequences in your local database, import entirely new sequences, or import the annotations onto blank sequences. MEGA format: The MEGA format is used by MEGA (Molecular Evolutionary Genetics Analysis). Molecular structure: Geneious imports a range of molecular structure formats. These formats support showing the locations of the atoms in a molecule in 3D: PDB format files from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank; .mol format files produced by MDL Information Systems Inc.; .xyz format files produced by XMol; .cml format files in Chemical Markup Language; .gpr format Ghemical files; .hin format files produced by HyperChem; .nwo format files produced by NWChem. Newick format: The
97. ared Databases. The oligonucleotide document type is a short nucleotide sequence representing either a primer or a probe. The text view lists the primer characteristics (Tm, GC, etc.). These properties can be shown in the document table; Tm is shown by default, but you can turn on others by right-clicking on the table header. Oligo sequences are created via one of the following methods: extract a primer or probe annotation from a sequence; select Sequence → New Sequence from the menu and choose Primer or Probe as the type of the new sequence; or select one or more existing primer sequences (for example, ones imported from a file), then click Primers → Convert to Oligo to transform them into oligo type sequences. If you select a target sequence and go to Test with Saved Primers or Design Primers → Design With Existing, Geneious will find all oligo sequences in your database and offer them as options in the list of oligo sequences, with no need to select them along with the target sequence before starting the operation. The meta data type Primer Info can be used to note the fridge location, etc., of a particular primer. 4.6.3 Test with Saved Primers. Primers and probes can also be quickly tested against large numbers of sequences to see which ones the primers will bind to. To test primers, select the target sequences you want to test for compatibility with primers and choo
98. assemblies. Underneath this general annotation type list is the annotation type listing for the tracks present for the current sequence. Tracks with only one annotation type will show a single listing, whilst tracks with multiple annotations will show a listing of contained annotations segregated by the annotation type. Additionally, the Options dropdown for the individual tracks allows for sorting and coloration of tracks by contained qualifiers.
99. atabase. As with the first option, you can choose which types of primers you'd like to test for by selecting the checkboxes on the left. Note that each primer you select will be considered in both the forward and reverse roles if you have checked both Search for Forward and Search for Reverse. One final checkbox, Pairs Only, forces primers and probes to be considered as pairs with the probe inside; otherwise they can be found anywhere in the sequence with no constraints. All of the same options available for designing primers also apply to testing, so if the primers are expected to bind to quite different regions of the test sequences, the primer binding region may have to be extended and the target region option can be omitted. By default, only primers that match the target sequence exactly will be found. If you wish to allow a limited number of mismatches between the primer and target sequence, you can specify this under Maximum Mismatches. You can limit the positions in which mismatches are allowed by clicking the Mismatch Options button. Click the OK button and testing will commence. Once complete, a dialog will present the results. This dialog tells you how many of the sequences were compatible with the specified primers and probes, and provides details and choices very similar to the one described in section 4.6.1. The compatible primers can be annotated onto the sequences in a similar manner to that used when designing primers. Additi
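The effect of the Maximum Mismatches setting can be pictured with a short Python sketch. This is only an illustration of ungapped matching with a mismatch limit, not the algorithm Geneious uses: the function name `find_binding_sites` and the example sequences are hypothetical, only the forward strand is scanned, and no positional mismatch restrictions are modelled.

```python
def find_binding_sites(primer, target, max_mismatches=0):
    """Return (start, mismatches) for each ungapped placement of primer on target.

    Hypothetical helper for illustration only: it scans the forward strand
    and counts simple base mismatches.
    """
    primer, target = primer.upper(), target.upper()
    hits = []
    for start in range(len(target) - len(primer) + 1):
        window = target[start:start + len(primer)]
        mismatches = sum(1 for a, b in zip(primer, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


target = "TTGACCGATTGACGGATCC"

# With the default of 0, only an exact match is reported.
print(find_binding_sites("GATTGACG", target))  # -> [(6, 0)]

# A primer with one altered base is found only when one mismatch is allowed.
print(find_binding_sites("GATTGTCG", target))                    # -> []
print(find_binding_sites("GATTGTCG", target, max_mismatches=1))
```

Raising the limit makes more placements acceptable, which is why a small value is usually a sensible starting point.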
100. bases. This will only change the databases that Geneious displays and will not have any effect on the actual databases on the BLAST server. Figure 2.8: Sequence Search in Progress. 2.5 Storing data: Your Local Documents. Geneious can be used to store your documents locally. Under the Local folder in the Services Panel you are able to create sub-folders to organize and store a variety of document types (Table 2.4). This is also where you can set up special folders to receive documents that are downloaded by a Geneious agent. To create a new folder in Geneious, select the Local folder or a sub-folder icon in the services panel and right-click (Ctrl-click on Mac OS X). This will pop up a menu. Clicking on New folder opens a dialog that will prompt you to name the folder. The named folder is then created as a sub-folder of the folder that you originally right-clicked on. Important
101. bers representing the proportion of symbols (nucleotide or amino acid) at each position in an alignment. This can then be pairwise aligned to another sequence or alignment profile. When pairwise aligning profiles, mismatch costs are weighted proportional to the fraction of mismatching bases, and gap introduction and gap extension costs are proportionally reduced at sites where the other profile contains some gaps. In some cases building a guide tree can take a long time, since it requires making a pairwise alignment between each pair of sequences. The build guide tree via alignment option may speed this part up by taking a different route: first make a progressive multiple alignment using a random ordering, then use that alignment to build the guide tree. Note that while this typically speeds up the process, that may not be the case when the sequences are very distant genetically. Figure 4.3: The multiple alignment window. You can
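A profile of this kind can be sketched in a few lines of Python. The sketch assumes that a profile is simply the per-column proportion of each symbol, with the gap character counted as a symbol; the function names `build_profile` and `mismatch_fraction` are hypothetical, and the mismatch measure shown is one plausible reading of "the fraction of mismatching bases", not a statement of the exact cost function Geneious applies.

```python
from collections import Counter


def build_profile(alignment):
    """Return one {symbol: proportion} dict per alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({sym: n / len(column) for sym, n in counts.items()})
    return profile


def mismatch_fraction(col_a, col_b):
    """Fraction of symbol pairings between two profile columns that disagree.

    Assumption for illustration: pairings are weighted by the product of
    the two proportions.
    """
    matching = sum(p * col_b.get(sym, 0.0) for sym, p in col_a.items())
    return 1.0 - matching


profile = build_profile(["ACGT", "ACGA", "AC-A"])
for position, column in enumerate(profile, start=1):
    print(position, column)
```

A column in which every sequence agrees gives a single symbol with proportion 1.0, so aligning it against an identical column yields a mismatch fraction of 0.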
102. brackets next to the folder shows how many files are contained in that folder as well as all of its sub-folders. In addition, if some of the documents in a folder are unread, the number of unread documents will also appear in the brackets. You can search the Local folder and sub-folders the same way you search the public databases, by clicking on the Search icon. If you have defined a new type of meta data in Geneious and that meta data has been added to a document, it will also be added to the Advanced Search criteria. Look at an example of a new meta data type called Protein size, which takes a text value for the protein in kDa (kiloDaltons); see Figure 2.11. Important: You must use quotation marks if and blank spaces are part of your search criteria. Omitting quotation marks leads to unreliable results. Wild card searches: When you are looking for all matches to a partial word, use the asterisk (*). For example, typing oxi* would return matches such as oxidase, oxidation, oxido-reductase and oxide. This is useful for performing generic searches. You can also place asterisks in the middle of the word or at the beginning. This feature is available only for local documents. Similarity (BLAST-like) searching: It is possible to search your local documents not only for text occurrences but also by similarity to sequence fragments. Click the small arrow at the bottom of the large
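The behaviour of an asterisk wild card can be illustrated outside Geneious with Python's standard `fnmatch` module. This is only an analogy: the helper name `wildcard_search` and the word list are hypothetical, the match is made case-insensitive as an assumption, and `fnmatch` also understands `?` and `[...]` patterns, which the text above does not mention.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern, words):
    """Return the words that match a '*' wild card pattern, ignoring case."""
    pattern = pattern.lower()
    return [word for word in words if fnmatchcase(word.lower(), pattern)]


words = ["oxidase", "oxidation", "oxido-reductase", "oxide", "dioxide", "hydrolase"]

print(wildcard_search("oxi*", words))   # asterisk at the end
print(wildcard_search("*oxi*", words))  # asterisks at the beginning and end
print(wildcard_search("ox*e", words))   # asterisk in the middle
```

The first call returns the four words from the example above; the second also picks up dioxide because the leading asterisk allows any prefix.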
103. c enzymes or you can let Geneious determine the cut sites for any candidate enzymes. The latter option finds the cut sites for the candidate enzymes and generates the fragments in a single step. Ligate Sequences lets you ligate two or more fragments, with or without overhangs. Insert into Vector allows you to choose a digested fragment, or a sequence with two restriction site annotations, to use as an insert and insert it into a vector (circular sequence). Geneious can work out which cut sites on the vector are compatible with the overhangs on the insert, with some additional information from you. Gibson Assembly provides a one-step interface to perform a Gibson Assembly or similar operation (an isothermal, ligase-independent, restriction-free, overlap extension PCR cloning). You can choose to insert one or multiple inserts into one or multiple vectors and specify the insertion order. The operation automatically creates the necessary primers and the products you will get, and generates a report document. TOPO cloning automatically detects TOPO vectors amongst the selected sequences and inserts the fragments into these vectors using a Blunt, TA or Directional cloning approach. The following sections explain the more complicated operations in a little more detail. 10.1 Find Restriction Sites. The option Find Restriction Sites from the Tools → Cloning menu or the context menu allows you to find and a
104. can handle data from any type of sequencing machine, with reads of any length, including paired reads and mixtures of reads from different sequencing machines (hybrid assemblies). The de novo assembly algorithm used is a greedy algorithm, which is similar to that used in multiple sequence alignment. 1. For each sequence, a BLAST-like algorithm is used to find the closest matching sequence among all other sequences. 2. The highest scoring sequence and its closest matching sequence are merged together into a contig (reverse complementing if necessary). This process is repeated, appending sequences to contigs and joining contigs where necessary. 3. For paired read de novo assembly, two sequences with similar expected mate distances are given a higher matching score if their mates also score well against each other. Similarly, a sequence and its mate will be given a higher score if they both align at approximately their expected distance apart to an already formed contig. The effect of this heuristic is that paired read de novo assembly starts out by finding two sets of paired reads and forming two contigs. Each of these two contigs will contain one sequence from each pair, and the two contigs are expected to be separated by the expected mate distance. Assembly proceeds from there, either adding new paired reads to the contigs or forming new pairs of contigs which eventually merge together. Due to the nature of this algorithm, paired read de novo assembly in Geneious o
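The greedy idea in steps 1 and 2 can be sketched in Python under heavy simplifying assumptions. In this toy version the "score" is just the length of an exact suffix-to-prefix overlap, there is no reverse complementing, no tolerance for sequencing errors and no paired read handling, so it should be read as an illustration of greedy merging in general rather than of the Geneious assembler. The function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the best-overlapping pair until no overlaps remain."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))  # -> ['ATGGCGTACCTTA']
```

Because each merge is chosen locally and never undone, a greedy assembler is fast but can be misled by repeats, which is one reason real assemblers add scoring heuristics such as the paired read rule in step 3.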
105. ce and all its annotations. Buttons for controlling zoom are positioned at the top of the options panel on the right of the sequence view. You can also hold Alt (or Ctrl) and turn the mouse wheel up/down to zoom in/out, Alt-click to zoom in, or Alt-Shift-click to zoom out. Selecting and Editing: Selection and editing in the sequence viewer is very similar to standard text editing and word processing programs. Click and drag to select a region. You can drag up and down to select and edit across multiple sequences in an alignment. Clicking the Allow Editing button enters edit mode and allows you to modify (Figure 2.1: Geneious main window.) The Sources Panel shows a tree that concisely displays sources of data and your stored documents. The plus (+) symbol indicates that a folder contains sub-folders. A minus (-) indicates that the folder has been expanded, showing its sub-folders. Click these symbols to expand or contract folders. The Geneious Sources Panel allows you to access: Your Local Documents; an EMBL database (UniProt); your contacts' Geneious databases; NCBI databases (Gene, Genome, Nucleotide, PopSet, Protein, PubMed, SNP, Structure and Taxonomy). You can view options for any selected service with the right mouse button, or by clicking the Options button at the bottom of the Sources Panel in Mac OS X. 2.1.2 The Document Table. The Document Table displays your search results or your store
106. ce regardless of the consensus threshold. When ignore gaps is checked, the consensus is calculated as if each alignment column consisted only of the non-gap characters; otherwise the gap character is treated like a normal residue, but mixing a gap with any other residue in the consensus always produces the total ambiguity symbol (N and X for nucleotides and amino acids respectively). When the aligned sequences contain quality information in the form of chromatograms, you can select Highest Quality to calculate a majority consensus that takes the relative residue quality into account. This sums the total quality for each potential base call, and if the total for a base exceeds 60% of the total quality for all bases, then that base is called. You can also choose to map the quality of the sequences onto the consensus. Choose Highest to map the quality of the highest quality base at each column onto the consensus. Select Total to map the sum of the contributing bases minus the sum of the non-contributing bases. For example, if there are two G's and three A's in a column, with the G's having qualities of 16 and 24 and the A's having qualities of 40, 42 and 50 respectively, then the quality of the consensus will be 40 + 42 + 50 - 16 - 24 = 92. For alignments or contigs with a reference sequence, the If no coverage call setting can be used to control what character the consensus sequence should us
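The Highest Quality rule and the Total quality mapping can be checked with a small Python sketch of the arithmetic described above. The function names `highest_quality_call` and `total_quality` are hypothetical, and returning `None` when no base exceeds the threshold is an assumption made for the sketch; the text above does not say what is called in that case.

```python
from collections import defaultdict


def highest_quality_call(column, threshold=0.6):
    """Call the base whose summed quality exceeds threshold of the column total.

    column is a list of (base, quality) pairs. Returns None if no base
    qualifies (an assumption for this sketch).
    """
    if not column:
        return None
    totals = defaultdict(int)
    for base, quality in column:
        totals[base] += quality
    grand_total = sum(totals.values())
    base, best = max(totals.items(), key=lambda item: item[1])
    return base if grand_total > 0 and best > threshold * grand_total else None


def total_quality(column, consensus_base):
    """Sum of contributing qualities minus sum of non-contributing qualities."""
    contributing = sum(q for b, q in column if b == consensus_base)
    non_contributing = sum(q for b, q in column if b != consensus_base)
    return contributing - non_contributing


# The worked example from the text: two G's (16, 24) and three A's (40, 42, 50).
column = [("G", 16), ("G", 24), ("A", 40), ("A", 42), ("A", 50)]
call = highest_quality_call(column)
print(call)                         # -> A  (132 of 172 is more than 60%)
print(total_quality(column, call))  # -> 92
```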
107. ce weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22 (1994), no. 22, 4673-4680.
108. ce with TOPO site is selected, it will print a message in this box, also showing how the corresponding sequence is processed if the user clicks OK. The resulting sequences will be optionally saved in a sub-folder. Chapter 11 Shared Databases. By using shared Databases, Geneious can store your documents in your favorite relational (SQL) database rather than on the file system. This means that multiple users can concurrently use the same synchronized storage location without any problems. A shared Database can be used for everything a local database is used for. This includes collaboration. Take note that unread status, agents and shared folders belong to individual users rather than the database. For example, Bob may see a document as unread, but Joe will see that same document as read if he has read it. 11.1 Supported Database Systems. To use a database as a shared Database, Geneious requires that it support transactions with an isolation level set to SERIALIZABLE. Supported database systems include Microsoft SQL Server, PostgreSQL, Oracle and MySQL. It is possible to use other database systems if you provide the database driver (see section 11.2.1). Shared Databases have been tested using: Microsoft SQL Server 2005 Express; PostgreSQL 7.4; Oracle 10g Express Edition; MySQL 5. 11.2 Setting up. After a database is set up correctly, multiple users can connect to it and use it as their stora
109. creates a sequence list containing copies of all of the selected sequences. Lists can make it easier to manage large numbers of sequences by keeping related ones grouped in a single document. Extract Sequence from List copies each sequence out of a sequence list into a separate sequence document. Generate Mutated Sequences mutates a sequence using the EMBOSS tool msbar. Generate Shuffled Sequences randomly shuffles a sequence using the EMBOSS tool shuffleseq. 2.1.9 Annotate & Predict Menu. This menu contains many tools for finding, predicting and annotating regions of interest in sequences and alignments. Trim Ends: see section 4.7.3. Annotate from Database annotates sequences with similar annotations from your database. Find ORFs finds all open reading frames in a sequence and annotates them. Search for Motifs searches for motifs in PROSITE format; uses fuzznuc and fuzzpro from EMBOSS. Find Variations/SNPs finds variable positions in assemblies and alignments. Find Low/High Coverage finds regions with low or high read coverage in assemblies. Download Annotation Tracks annotates chromosomes with tracks from the Broad Institute. Search for Transcription Factors searches for transcription factors from the TRANSFAC database in a nucleotide sequence; uses tfscan from EMBOSS. Predict Antigenic Regions predicts potentially antigenic regions of a protein sequenc
110. cument Viewers. 3.1 General viewer controls. There are several general options which are available on all viewers. These can be accessed through the View menu or on the right-hand side of the toolbar above the viewer. Split View: Provides several options for splitting the view so that multiple views are shown simultaneously for one document. When the view is split, selection of annotations and regions of the sequence is synchronized across the viewers. To close split views, click the button which is also on the right of the toolbar. Expand View: Expands the document view panel to fill the main window by hiding the sources panel on the right and the document table above. Clicking this again will return the layout to its original state. New Window: Opens another view of the current document in a separate window. This allows you to have several documents open at once and gives more space for viewing. This can also be achieved by double-clicking in the document table. 3.2 The Sequence and Alignment Viewer. The Sequence view tab in the Document Viewer panel is available for Nucleotide sequences, Protein sequences, Alignments and 3D structure documents. If an alignment is selected this will be called Alignment View, or Contig View if a contig is selected. The options available vary with the kind of sequence data being viewed.
111. cuments to the Deleted Items folder. To recover documents or folders from Deleted Items, you can either move them manually to another location or use Restore from Deleted Items (Put Back from Deleted Items on Mac OS) in the File menu to automatically move them to the folder they were deleted from. The Deleted Items folder should be cleared periodically to keep hard drive space free. This can be done by selecting Erase All Deleted Items from the File menu. Geneious will warn you if Deleted Items contains a large amount of data. To erase a document immediately without moving it to Deleted Items, use Erase Document Immediately in the File menu or press Shift+Delete. Many of these actions can also be accessed by right-clicking on a folder or document. 2.5.3 Document History. When a document is created or modified, information regarding this change is also saved. This data can be viewed in the History Viewer described in section 3.8. Saving document history can be disabled for performance or privacy reasons by going to the Appearance and Behaviour tab in Preferences (see section 2.9). 2.5.4 Searching your Local folders. The Services Panel allows you to browse your Local folder hierarchy. Next to each folder name in the hierarchy is the number of documents it contains, in brackets. When the Local folder or a sub-folder is collapsed (minimized), the
112. d AttB sites to a PCR product. It will work on the following types of document: a PCR product (AttB sites will be appended to the PCR product); a document with primer binding sites annotated (if there is more than one pair, Geneious will ask you which pair to use; the PCR product will be extracted and AttB sites appended). 10.4.2 Gateway. This operation will perform a BP reaction or an LR reaction on the selected documents or, if there is a mixture of AttB/AttP and AttL/AttR sites on the input documents, a BP reaction on all documents with AttB/AttP sites followed by an LR reaction on the results of the BP reaction and any input documents with AttL/AttR sites. For example, to insert a PCR product with attB sites directly into a destination vector, select the PCR product, a donor vector and a destination vector. Geneious will first produce an entry clone from the PCR product and donor vector, then react this entry clone with the destination vector to produce an expression clone. 10.5 Gibson Assembly. The operation will generate sequences with compatible overlaps that can ligate to each other after a partial chew-back with a T5 exonuclease. The overlaps are created by extension overlap PCR; the corresponding primers will therefore be generated automatically and displayed in a report document and as annotations on the resulting sequences. To enable Gibson Assembly, you have to select two or more linear sequence d
113. d documents. While search results usually contain documents of a single type, a local folder may contain any mixture of documents, whether they are sequences, publications or other types. If you cannot see all of the columns in the document table, you may want to close the help panel to make more room. This information is presented in table form (Figure 2.2).
114. data. Geneious is able to import raw data from different applications and export the results in a range of formats. If you are new to bioinformatics, please take the time to familiarize yourself with this chapter, as there are a number of formats to be aware of. 2.2.1 Importing data from the hard drive to your Local folders. To import files from local disks or network drives, click File → Import → From file. This will open up a file dialog. Select one or more files and click Import. If Geneious' automatic file format detection fails, select the file type you wish to import (Figure 2.3). The different file types are described in detail in the next section. Figure 2.3: File import options. 2.2.2 Data input formats. Geneious version 7.0 can import the following file formats (Format; Extensions; Data types; Common sources): BED; .bed; Annotations; UCSC. Common Assembly Format; .caf; Contigs; Sequencher. Clustal; .aln; Alignments; ClustalX. CSFASTA; .csfasta; Color space FASTA; ABI SOLiD. DNAStar seq p
115. e You can also see other users' data, so this is a good way to share your documents. This is exactly like the normal shared Databases available with Geneious, but this database is preconfigured and available as soon as you log into the server. Don't try to access it any other way using the normal shared Database plugin. Figure 13.2: Log in to Geneious Server. 13.3 Running jobs and retrieving results. Once you've logged into Geneious Server, many normal operations will now include an additional pair of buttons indicating whether the job should run on your computer or on Geneious Server (Figure 13.3). Whenever you see this choice, you can choose to run the job on Geneious Server. If you're not logged in when you choose this, Geneious will prompt you to log in. The rest of the options are the same as for any local job, and the job will progress in the same way as if run locally, only using the remote resources provided by the server. If the job is likely to complete quickly you should just run it locally, but if it requires a lot of memory more than your loca
116. e using the method of Kolaskar and Tongaonkar; uses antigenic from EMBOSS. Predict Secondary Structure uses the original Garnier Osguthorpe Robson algorithm (GOR I) for predicting protein secondary structure; uses garnier from EMBOSS. Predict Signal Cleavage Sites predicts the site of cleavage between a signal sequence and the mature exported protein; uses sigcleave from EMBOSS. Help Menu. This consists of the standard Help options offered by Geneious. Help shows and hides the Help panel. Tutorial shows and hides the Tutorial panel. Online Resources gives access to a variety of resources on our website. Check for Updates checks for new versions of Geneious. Contact Support allows you to contact our Support team through Geneious. Activate License lets you activate a license or connect to a license server. Install FLEXnet installs the FLEXnet licensing service, which is necessary to use FLEXnet licenses. Borrow Floating License lets you borrow a license from a FLEXnet server if the maintainer of the server has provided you with a Borrow File. Release Licenses releases any floating license you are currently holding and returns any local FLEXnet licenses to our server so they can be activated on a different machine. Buy Online sends you to our online store. About Geneious gives details about the version of Geneious you are running and licensing information. 2.2 Importing and exporting
117. e column. Run Now: Cause the agent to search immediately. Cancel: If the agent is currently searching, this can be clicked to stop the search. Edit: Click this to change an agent's database, search criteria, destination or search interval. Delete: Delete the agent permanently. Any documents retrieved by the agent will remain in your local documents. 2.7 Filtering and Similarity sorting. The Filter allows you to instantly identify documents in the document table matching chosen keywords. It is located in the top right-hand corner of the Main Toolbar. Type in the text you are searching for, and Geneious will display all the documents that match this text and hide all other documents in the Document Table. To view all the documents in a folder, clear the Filter box of text or click the button. The Sort button in the toolbar provides two actions in a popup menu. Sort by similarity is available when a single sequence document is selected in the Document Table. It will rank all other sequences by their similarity to the selected sequence. The most similar sequence is placed at the top and the least similar sequence at the bottom. This also produces an E-value column describing how similar the sequences are to the selected one. The Remove Sort by Similarity action will remove the E-value column and return the table to its previous sorting. 2.7.1 Filtering
118. e document, i.e. they have the same domains in the same places. This operation can take a long time. Get Domains in Sequence creates a domain document for every domain in a Pfam sequence document. If your domain document is a member of a Pfam clan, you can use Get Clan to get a document representing that clan. Get Domains in Clan will do the opposite, i.e. get documents representing each domain in a clan. If your domain document contains the seed alignment for the domain, you can use Get Full Alignment to get a domain document with the full alignment. Conversely, you can use Get seed alignment to get a domain document with the seed alignment only from a domain document with the full alignment. Get Full Sequences will return the full UniProt sequence documents from which the sequences in the alignment in a domain were extracted. Get Full Sequence will return the full UniProt sequence document from which a sequence taken from an alignment in a domain was extracted. Chapter 8 Geneious Education. This feature allows a teacher to create interactive tutorials and exercises for their students. A tutorial consists of a number of HTML pages and Geneious documents. The student edits the pages and documents to answer the tutorial questions and then exports the tutorial to submit for marking. 8.1 Creating a tutorial. The backbone of a Geneious tutorial is its HTML documents. Simply create your documents
119. Figure 3.5: Colour an alignment by Amino Acid Translation. Figure 3.6: The identity graph for an alignment of nucleotide sequences.
120. e the clade in the view. Once you have selected a clade in the view, you may edit the tree (see below). 3.7.7 The Toolbar. The buttons on the toolbar along the top of the viewer allow you to edit the tree. If you are viewing a tree made from an alignment, the View Sequences button allows you to view the selected nodes in the sequence viewer. The Root button allows you to re-root the tree on the selected node. The Swap Siblings button allows you to swap the position of the sibling clades of the selected node. 3.8 Info Viewer. The info viewer contains information about the document, including notes, editing history and linked documents. This information is organized into three tabs: properties, history and lineage. The notes tab allows you to add notes to your document as meta data (see section 2.8). The history viewer displays the complete history for a selected document. The exact information displayed is flexible, but it is made up of entries, each of which will always include the time and the user responsible for the edit. An entry may also reference other documents via hyperlinks and has the ability to display a recreation of the options used. Saving of history can be disabled for performance or privacy reasons by going to the Appearance and Behaviour tab in Preferences (see section 2.9). The lineage view tab contains a list of the linked (actively or otherwise) documents in Geneious see
121. Figure 15.2: Setting the local database location. Figure 15.3: Using the backup tool. your files from the old machine to the new one, but this will break the Geneious local database because files and paths are longer than the maximum 256 bytes that Explorer handles, so files will get lost. The backups that Geneious produces can be safely moved, so even if IT does this the data can be restored from the backup. It is also necessary to release a license before moving machines. This can be done from the Help menu. Note that only a limited number of releases are available within a given period of time, and trying to release too often may be misconstrued as a user trying to share a personal license with others. Only release a license when absolutely necessary. 15.1.7 Reinstalling Geneious won't erase user's data. Because the Geneious installation isn't in the same pl
122. e to the location of the last local database you accessed and Geneious will switch and import the data that is there. If you have found your data, it is a good idea to use File → Back Up Data to save the documents in a format that can then be loaded into Geneious again. You may even want to tidy up a bit and delete old data folders if there are lots of them. Better still, make regular backups. Figure 15.1: Sorting data folders by date. 15.1.5 Backing up the local database. You should be aware that you need backups. Due to the way the local database works it is im
123. Figure 2.2: The document table when browsing the local folders. Selecting a document in the Document Table will display its details in the Document View Panel. Selecting multiple documents will show a view of all the selected documents if they are of similar types, e.g. selecting two sequences will show both of them side by side in the sequence view. The easiest way to select multiple documents is by clicking on the checkboxes down the left-hand side of the table. Standard keyboard controls can also be used (Shift- and Ctrl-click). Double-clicking a document in the Document Table displays the same view in a separate window. To view the actions available for any particular document or group of documents, right-click (Ctrl-click on Mac OS X) on a selection of them. These options vary depending on the type of document. The Document Table has some useful features. Editing: Values can be typed into the columns of the table. This is a useful way of editing the information in a document. To edit a particular value, first click on the document and then click on the column which you want to edit. Enter the appropriate new information and press Enter. Certain columns cannot be edited, however, e.g. the NCBI accession number. Copying: Column values can be copied. This is a quick method of extracting searchable information such as an accession number
124. e when the reference sequence has no coverage. Options available are X N or Ref Seq. A represents an unknown character, potentially a gap. If Ref Seq is selected then the consensus is assigned whatever character the reference sequence has at that position. Note that if any sequence in the alignment/contig has an internal gap at that position, that is still considered valid coverage and this setting will not apply. Choose Call N if quality below to change consensus bases to Ns if the quality is below the threshold that you set. This is particularly useful for exporting sequences to file formats which do not preserve quality, for example FASTA. Highlighting. When Highlight disagreements is checked, the residues in the alignment that are identical to the consensus state for that column are grayed out. This allows you to quickly locate variable sites in the alignment. Similarly, Highlight agreements greys out residues that are not identical to the consensus, allowing you to quickly locate conserved sites in the alignment. Highlight ambiguities greys out non-ambiguous residues. Highlight gaps greys out non-gap positions. Highlight transitions/transversions greys out residues that are not transitions/transversions compared to the consensus sequence. When highlighting transitions/transversions it is recommended that you turn on the ignore gaps consensus option, or some residues may be wrongly highlighted due to the consensus
125. ecting this again to return to normal. Split Viewer Left/Right creates a second copy of the document viewer with the two views laid out side by side. Split Viewer Top/Bottom creates a second copy of the document viewer with one on top of the other. Document Windows: Lists the currently open document windows. Selecting one from this menu will bring that document window to the front. Tools Menu. Align/Assemble: see section 4.4 and section 4.7 respectively. Tree: see section 4.5. Primers: see section 4.6. Cloning: see section 10. Sequence Search: Perform a sequence search, such as NCBI Blast, using the currently selected sequence as the query. See section 2.4.4. Add/Remove Databases: see section 5.1.3. Pfam: see section 7. Linnaeus Blast: Perform a blast search and display the results using the Linnaeus viewer. Evolutionary trees are built for hits within the same species. These are then displayed inside boxes nested according to the NCBI taxonomy. Extract Annotations: Search the selected sequences or alignments for annotations which match certain criteria, then extract all of the matching annotations to separate sequence documents. Includes the option to concatenate all matches in each sequence into one sequence document. Useful for extracting a certain gene from a group of genomes. Strip Alignment Columns
126. ed on the EMBOSS tool dotmatcher. Figure 3.8: A view of a dotplot of two sequences in Geneious. 3.5 RNA/DNA secondary structure fold viewer. This viewer will appear when the selected nucleotide sequence is less than 3000 bp long. If the sequence is DNA the tab will be labelled DNA Fold, and if it is RNA it will be labelled RNA Fold (Figure 3.9). The fold prediction is performed by the Vienna package RNAfold tool. Information on the options for this tool can be found at the following web page: http://www.tbi.univie.ac.at/~ivo/RNA/RNAfold.html. The View Options allow you to turn off/on and color the bases, flip the coordinates, highlight the start (blue) and end (red) of the sequence, and rotate the model. As with other viewers, you can zoom in on the model and drag the view around, or use the scrollwheel with the same keyboard modifiers as the sequence viewer. Selection is synchronized between the sequence view and the fold view. In addition, when in split view mode, the fold viewer will scroll to the selected area when zoomed in. By default, color by probability is used, where red bases are the ones with the strongest probability of being paired with each other in paired regions, or of being unpaired in unpaired regions. Green is the middle ground and blue is the lowest probability. Color by probability is only available when
127. Figure 4.1: Options for nucleotide pairwise alignment. Both protein and nucleotide pairwise alignments have choices for gap open/gap extension penalties (costs). Unlike many alignment programs, these values are not restricted to integers in Geneious. The score of a pairwise alignment is $\mathit{matchCount} \times \mathit{matchCost} + \mathit{mismatchCount} \times \mathit{mismatchCost}$. For each gap of length $n$, a score of $\mathit{gapOpenPenalty} + (n - 1) \times \mathit{gapExtensionPenalty}$ is subtracted from this, where:
• gapOpenPenalty: The gap open penalty setting in Geneious
• gapExtensionPenalty: The gap extension penalty setting in Geneious
• matchCost: The first number in the Geneious cost matrix
• mismatchCost: The second number in the Geneious cost matrix
• matchCount: The number of matching residues in the alignment
• mismatchCount: The number of mismatched residues in the alignment
When doing a Global alignment with free end gaps, gaps at either end of the alignment are not penalized when determining the optimal alignment. This is especially useful if you are aligning sequence fragments that overlap slightly in their starting and ending positions, e.g. when using two slightly different primer pairs to extract related sequence fragments from different samples. You can also do a Local Alignment if you want to allow free end overlaps rather
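The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than Geneious code: the function name, argument names and the numeric values in the example are assumptions chosen for demonstration, and only the arithmetic follows the text.

```python
def alignment_score(match_count, mismatch_count, gap_lengths,
                    match_cost, mismatch_cost,
                    gap_open_penalty, gap_extension_penalty):
    """Score a pairwise alignment using the rule quoted in the text.

    gap_lengths is a list with the length n of each gap in the alignment.
    All names here are hypothetical; costs may be non-integer.
    """
    score = match_count * match_cost + mismatch_count * mismatch_cost
    for n in gap_lengths:
        # Each gap of length n costs one open penalty plus (n - 1) extensions.
        score -= gap_open_penalty + (n - 1) * gap_extension_penalty
    return score


# Illustrative values only (not Geneious defaults).
example = alignment_score(
    match_count=40, mismatch_count=5, gap_lengths=[1, 3],
    match_cost=5.0, mismatch_cost=-4.0,
    gap_open_penalty=12.0, gap_extension_penalty=3.0,
)
print(example)
```

With these example numbers the two gaps cost 12 and 18 respectively, which shows why one long gap is cheaper than several short ones when the extension penalty is smaller than the open penalty.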
128. ein sequences that are structurally very similar can be evolutionarily distant. This is referred to as distant homology. While handling protein sequences, it is important to be able to tell what a multiple sequence alignment means, both structurally and evolutionarily. It is not always possible to clearly identify structurally or evolutionarily homologous positions and create a single correct multiple sequence alignment [3]. Multiple sequence alignments can be done by hand, but this requires expert knowledge of molecular sequence evolution and experience in the field. Hence the need for automatic multiple sequence alignments based on objective criteria. One way to score such an alignment would be to use a probabilistic model of sequence evolution and select the alignment that is most probable given the model of evolution. While this is an attractive option, there are no efficient algorithms for doing this currently available. However, a number of useful heuristic algorithms for multiple sequence alignment do exist. Progressive pairwise alignment methods. The most popular and time-efficient method of multiple sequence alignment is progressive pairwise alignment. The idea is very simple. At each step, a pairwise alignment is performed. In the first step, two sequences are selected and aligned. The pairwise alignment is added to the mix and the two sequences are removed. In subsequent steps, one of three things can ha
129. en a sequence and any sequence in the contig required for the sequence to be included in the contig. Overlap Identity: The minimum identity (in percent) of the overlap region between a sequence and any sequence in the contig required for the sequence to be included in the contig. Choose the options you require and click OK to begin assembling the contig. Once complete, one or more contigs may be generated. If you get more contigs than you expect for the selected sequences, try adjusting the options for assembly. It is also possible that no contigs will be generated, if no two of the selected sequences meet the overlap requirements. Note: The orientation of fragments will be determined automatically and they will be reverse complemented where necessary. If you already have a contig and you want to add a sequence to it or join it to another contig, then just select the contig and the contig/sequence and click assembly as normal. Click More Options in the assembly options to display the Alignment parameters. Here you can change the parameters used by Geneious when aligning fragments together. For sequences which are lower quality or contain many errors, the gap penalty should be decreased and the mismatch score should be increased. The algorithm. The sequence assembler in Geneious is flexible enough to handle read errors consisting of either incorrect bases or short indels. It
130. en mistake the description for the name, or wonder why their name is truncated when they imported it into Geneious when they have used spaces within what they consider the name. The name must not have spaces, and if it does they should be replaced with something like an underscore (_) to keep the name as a single item. The underscores can always be removed using Batch Rename once the files have been imported. 15.4.2 Batch rename. Often when data has been imported into Geneious, the naming isn't what you expected. You should try the Edit → Batch Rename tool, which can replace any field with combinations of other fields and new text, and can also apply regular expressions to achieve very complex renaming operations. 15.4.3 Protein or Nucleotide. Most often this happens with FASTA format, since it doesn't declare what the data type is. When using drag and drop, Geneious tries to figure out what type of sequence it is looking at and use the correct import. To be certain that you've imported your data as the correct type, though, use File → Import → From File and choose the format and type from the list. This will avoid embarrassing issues in the case of ambiguous data. 15.4.4 Word documents. Sequence data should never be stored in a word processing document. Word processors will do very odd things to file formats, so if users want to use a document format to edit the data they should use a very simple text editor that can save
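The advice above about spaces in sequence names can be applied before import with a short script. The following Python sketch is an assumption-laden illustration, not a Geneious feature: it treats the whole FASTA header line as the intended name and replaces every run of whitespace with an underscore, which is only appropriate when the header contains no separate description.

```python
import re


def underscore_fasta_names(fasta_text):
    """Replace whitespace in FASTA header lines with underscores.

    Assumes the entire header (everything after '>') is meant to be the name.
    Sequence lines are left untouched.
    """
    fixed_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            line = ">" + re.sub(r"\s+", "_", header)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


print(underscore_fasta_names(">my sample 01\nACGTACGT"))
```

The underscores can later be turned back into spaces with Batch Rename, as the text notes.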
131. ent viewer panel. This always has five buttons on the far right:
• Share, which allows you to share the current visualization on Twitter, Facebook or email.
• Split View, which opens a second viewer panel of the same document. Selection is synchronized between these two views.
• Expand Document View, which expands the viewer panel out to fill the entire main window. Clicking again will return the viewer to normal size.
• Open Document in New Window, which opens a new view of the selected document in a new, separate window.
• Help, which opens the Help Panel and displays some short help for the current viewer.
2.1.4 The Help Panel. The Help Panel has a Help tab and a Tutorial tab. The Help tab provides you with information about the service you are currently using or the viewer you are currently viewing. The help displayed in the Help tab changes as you click on different services and choose different viewers. The Tutorial is aimed at first-time users of Geneious and has been included to provide a feel for how Geneious works. It is highly recommended that you work through the tutorial if you haven't used Geneious before. 2.1.5 The Toolbar. The toolbar contains several icons that provide shortcuts to common functions in Geneious. You can alter the contents of the toolbar to suit your own needs. The icons can be displayed small or large, and with or without their labels. The Help icon is always available. The
132. ere is a button to turn on the memory usage bar (Figure 15.6). This is well worth doing, as it will show how much RAM Geneious has available and how much it is using. Also, clicking this bar, which will appear under the Sources panel, will force a garbage collection, freeing up memory within the JVM. Figure 15.6: Turn on memory usage bar. While it may be tempting to allocate more memory to Geneious, bear in mind that the operating system and other programs cannot use this RAM once the JVM has claimed it, so if you allocate too much then everything will go much slower. To be able
133. es for the amino acids, and assuming all cysteines are paired in a disulfide bridge making cystine: C = 62.5 (only counting up to an even number), W = 5500, Y = 1490. The following statistics are available when viewing nucleotide sequences. Amino Acids and Codons: Calculates the distribution of amino acids found by translating according to the current translation options. For example, if By Selection or Annotation is selected, then all CDS annotations will be translated and statistics presented. For codon usage statistics, the frequency of all 64 codons with their associated amino acid will be displayed. If any CDS contains non-standard start codons, then some of the 64 codons may be split into 2 entries based on whether they translate to methionine or their standard translation. The following statistics are available when viewing multiple sequences. Identical sites: When viewing alignments or assemblies, this considers only those columns in the alignment that have at least 2 nucleotides/amino acids/gaps that are not free end gaps, and that do not consist entirely of gaps. A column not meeting this requirement is not even counted as non-identical for the percentage calculation. A column meeting this requirement is considered identical if it contains no internal gaps and all the nucleotides/amino acids are identical. Ambiguity characters are not interpreted, so a nucleotide column of A and R is not considered identical. Pairwise Identity
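The Identical sites rule described above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the Geneious implementation: gaps are assumed to be written as `-`, free end gaps are taken to be the leading and trailing gaps of each sequence, and all sequences are assumed to have the same aligned length. The function name is hypothetical.

```python
def identical_sites(alignment):
    """Return (identical_columns, counted_columns) for an alignment.

    alignment is a list of equal-length strings with '-' for gaps.
    Leading and trailing gaps of each sequence are treated as free end gaps
    and ignored; ambiguity codes are compared literally, not interpreted.
    """
    masked = []
    for seq in alignment:
        lead = len(seq) - len(seq.lstrip("-"))
        core = seq.strip("-")
        tail = len(seq) - lead - len(core)
        # Blank out free end gaps so they are skipped below.
        masked.append(" " * lead + core + " " * tail)

    identical = counted = 0
    for column in zip(*masked):
        chars = [c for c in column if c != " "]
        residues = {c for c in chars if c != "-"}
        # Skip columns with fewer than 2 usable characters or only gaps.
        if len(chars) < 2 or not residues:
            continue
        counted += 1
        if "-" not in chars and len(residues) == 1:
            identical += 1
    return identical, counted


print(identical_sites(["ACGT-A", "ACGTTA", "--GTTR"]))
```

In this example the fifth column contains an internal gap and the sixth pairs A with the ambiguity code R, so neither is identical, while the free end gaps of the third sequence are simply left out of the first two columns.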
134. es the formulas used to calculate the melting point of oligos. Under Formula you can choose between two different tables of thermodynamic parameters and methods for melting temperature calculation:
• Breslauer et al. 1986 (http://www.pnas.org/content/83/11/3746). This is used by old versions of Primer3 (until version 1.0.1) and uses the formula for melting temperature calculation suggested by Rychlik et al. 1990.
• SantaLucia 1998 (http://www.pnas.org/content/95/4/1460). This is the recommended value.
Three different Salt Correction Formula options are available:
• Schildkraut and Lifson 1965 (http://dx.doi.org/10.1002/bip.360030207). This is used by old versions of Primer3 (until version 1.0.1).
• SantaLucia 1998 (http://www.pnas.org/content/95/4/1460). This is the recommended value.
• Owczarzy et al. 2004 (http://dx.doi.org/10.1021/bi034621r).
Characteristics. The Characteristics section allows you to set absolute limits on properties of primers and probes, such as melting point and GC content. Optimum values can also be specified. For details on individual options, hover your mouse over them and a popup box will describe the function of the option. Characteristics can be set for either Primers or DNA Probes, depending on the task you have chosen. The Primer section is available if one of Forward Primer or Reverse Primer is being designed or tested, and DNA Probe is available if
135. es the gaps around so that the reads align better to each other rather than to the reference sequence. For further details, and for a comparison of the Geneious reference assembler to other software, see http desktop links geneious com assets documentation geneious GeneiousReadM paf 4.7.2 Assembly to a reference sequence. Assembling to a reference is used when you have a known sequence and you wish to compare a number of reads of the same sequence with it, to locate differences or SNPs. To perform assembly to a reference sequence, select the sequences and the reference sequence, click Align/Assemble and choose Map to Reference. Choose the name of the sequence you wish to use as the reference in the Align to reference option and click OK. One contig will be produced at most, and this will display the reference sequence at the top of the alignment view with all other sequences below it. See section 4.7.6 for details on identifying differences or SNPs. When aligning to a reference, the sequences are not aligned to each other in any way; each of them is instead aligned to the reference sequence independently, and the pairwise alignments are combined into a contig. The high, medium and low sensitivity options perform a fine-tuning step after the initial assembly, to make reads which overlap from the initial assembly stage align better to each other. If you just wish to use a reference sequence to help constr
136. esign new primers and turn off the target region and turn on the included region, which should be the CDS. Change the product size to be the length of the CDS (Geneious tells you this in the selected value shown at the bottom of the sequence viewer) in both boxes, since it must be exactly the same length as the CDS. Only generate 1 pair. Since you're forcing the design to be in an area that may be less optimal, you'll also likely have to drop the Tm minimum setting (Figure 15.12). When you hit OK, the primers should be designed exactly at the ends of the CDS (Figure 15.13).
137. f data sets generated by sampling from one original data set. The results of analyzing the sampled data sets are then combined to generate summary information about the original data set. In the context of tree building, resampling involves generating a series of sequence alignments by sampling columns from the original sequence alignment. Each of these alignments (known as pseudoreplicates) is then used to build an individual phylogenetic tree. A consensus tree can then be constructed by combining information from the set of generated trees, or the topologies that occur can be sorted by their frequency (see below).
• Bootstrapping is the statistical method of resampling with replacement. To apply bootstrapping in the context of tree building, each pseudoreplicate is constructed by randomly sampling columns of the original alignment with replacement until an alignment of the same size is obtained.
• Jackknifing is a statistical method of numerical resampling based on deleting a portion of the original observations for each pseudoreplicate. A 50% jackknife randomly deletes half of the columns from the alignment to create each pseudoreplicate.
4.5.6 Consensus trees. A consensus tree provides an estimate for the level of support for each clade in the final tree. It is built by combining clades which occurred in at least a certain percentage of the resampled trees. This percentage is called the consensus su
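The two resampling schemes described above, bootstrapping and jackknifing, can be sketched on a toy alignment. This Python example only illustrates the column sampling step; it does not build trees and is not how Geneious implements resampling. The function names, the toy alignment and the fixed random seed are assumptions made for the illustration.

```python
import random


def bootstrap_columns(alignment, rng):
    """Sample columns with replacement until the original width is reached."""
    width = len(alignment[0])
    picks = [rng.randrange(width) for _ in range(width)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def jackknife_columns(alignment, rng, keep_fraction=0.5):
    """Delete a random portion of the columns (50% by default)."""
    width = len(alignment[0])
    kept = sorted(rng.sample(range(width), int(width * keep_fraction)))
    return ["".join(seq[i] for i in kept) for seq in alignment]


# Toy alignment: three sequences, six columns (hypothetical data).
toy = ["ACGTAC", "ACGTTC", "ACCTAC"]
rng = random.Random(0)  # fixed seed so repeated runs give the same samples

boot = bootstrap_columns(toy, rng)
jack = jackknife_columns(toy, rng)
print(boot)
print(jack)
```

Each call produces one pseudoreplicate; a real analysis would generate many of them and build one tree per pseudoreplicate before computing a consensus.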
138. g on the appropriate button. The Match all/any of the following option at the top of the search terms determines how these criteria are combined. Match Any requires a match of one or more of your search criteria. This is a broad search and results in more matches. Match All requires a match of all of your search criteria. This is a narrow search and results in fewer matches. Figure 2.6: Advanced Search. 2.3.2 Autocompletion of search words. Geneious remembers previously searched keywords and offers an auto-complete option. This works in a sim
139. g the search on a field, depending on the field you are searching against. For example, if you are using numbers to search for Sequence length or No. of nodes, you can further restrict your search with the second drop-down box:
• is greater than (>)
• is less than (<)
• is greater than or equal to (≥)
• is less than or equal to (≤)
Likewise, if you are searching on the Creation Date search field, you have the following options:
• is before or on
• is after or on
• is between
When searching your local folders, you have the option of searching by Document type. The second drop-down list provides the options is and is not. The third drop-down lists the various types of documents that can be stored in Geneious, such as 3D Structure, Nucleotide sequence and PDF (see Figure 2.5). Figure 2.5: Document type search options. And/Or searches. The advanced options let you search using multiple criteria. By clicking the button on the right of the search term you can add another search criterion. You can remove search criteria by clickin
140. ge location, just as if they were using their own local database. Follow these steps to set up your database for use with Geneious:
• Install a supported database management system if you do not already have one.
• Create a new database with your desired name. Make sure that you have a user that has rights to create tables.
• Use the Connect to a database button to connect to your database. If the database has not been set up (usually the case if you are following these instructions), Geneious will detect this and set up the database. This will only succeed if you have permission to create tables on the database.
• Make sure any other users of the database have SELECT, INSERT, UPDATE and DELETE rights, otherwise they will not be able to use the Shared Database as intended.
There are two ways you can use your database with multiple users. The simple way is just to use the Shared Database as a shared local database. If this is all you want, then you are now done with setup. Alternatively, you may want to restrict access to particular folders with groups and roles. To do this, please refer to section 11.4.1. Your database should now be ready to use with Geneious. Now all users can connect to the database by clicking on Shared Databases in the service tree and then clicking Connect to a setup database. This will bring up a dialog for the user to enter the database details. 11.2.1 Supplying your own Database Driver. Shared Database
141. geneious Geneious 7 0 Biomatters Ltd September 3 2013 Contents 1 Getting Started 1 1 Downloading amp Installing Geneious 2 ii be hed sc bee eee ies 1 2 Using Geneious Or ie first ime ARA ERS S RES ARS 2 Retrieving and Storing data 21 ROU Window e o do ee ERE OE Ee ES SOS OE ESS BEES BEES oe linporine ond EXPO ARA oe ee A ERA RARA A eS e 0 A IA BS EEE RS 24 Pobhicdatabasts s ie ee hh ESAS Hees Ree Oe A AAA 2 5 Storing data Your Local Documents 66 6664 ets ds a venpreni ERAS EE Ee BES RRS eee bes aS 24 Filtering and Similarity Sorting oe eh ae HE EEE OH EES 2e WS a eae ene eee hoe Se AE Bee ee Oe ee EO eg 29 PERENS e oye we Be ees ae eh ee oe ee ee ee eee eee eS 200 Denning and Saving Tees kee sb ocea Ba ma pave Pahi Ree ERS OO SILA E ERA ABARCAR A OE TERESA A 3 Document Viewers 3 1 General viewer controls 0 400 648 54 See e a a eee ee 342 The Sequence land alignment Viewer os co es oo oe ed ae Pha eG oi 23 Amnotahon Viewer ooo ds a Baw Shad Boe ee Oa kes 4 CONTENTS II 78 35 RNA DNA secondary structure fold Viewer 6 5664 chee ca 80 26 OID etructure VIEWEF 24 sea ee ee Peg ee ee ee ee we 81 Ir MOR VIGWEE coe ogee be hada e eared ea tte eee ee Bee ele 83 36 MO Viewer p fe od woren ee ae ea ee ee oe Oe ee ee esa 86 39 Parentsand Descendants oo ciar eee Ke a e Re Bee Oe Es 86 3 10 The Chromatogram viewer 4 2 2d hb MR a EEE OH EES 91 3 11 The PDF document viewer ik oe bk oe oe eo oe AAA AA
142. genies by maximum likelihood, Syst Biol 52 (2003), no. 5, 696-704.
[8] M Vingron, HA Schmidt, K Strimmer, and A von Haeseler, Tree-puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing, Bioinformatics 18 (2002), no. 3, 502-504.
[9] M Hasegawa, H Kishino, and T Yano, Dating of the human-ape splitting by a molecular clock of mitochondrial DNA, J Mol Evol 22 (1985), no. 2, 160-174.
[10] S Henikoff and JG Henikoff, Amino acid substitution matrices from protein blocks, Proc Natl Acad Sci USA 89 (1992), no. 22, 10915-10919.
[11] T Jukes and C Cantor, Evolution of protein molecules, pp. 21-32, Academic Press, New York, 1969.
[12] S Kumar, K Tamura, and M Nei, Mega3: Integrated software for molecular evolutionary genetics analysis and sequence alignment, Brief Bioinform 5 (2004), no. 2, 150-163.
[13] DR Maddison, DL Swofford, and WP Maddison, Nexus: an extensible file format for systematic information, Syst Biol 46 (1997), no. 4, 590-621.
[14] JV Maizel and RP Lenk, Enhanced graphic matrix analysis of nucleic acid and protein sequences, Proc Natl Acad Sci USA 78 (1981), no. 12, 7665-9.
[15] C Michener and R Sokal, A quantitative approach to a problem in classification, Evolution 11 (1957), 130-162.
[16] SB Needleman and CD Wunsch, A general method applicable to the search for similarities in the am
143. h primer and extension generation, the user-specified Tm formula is used. In many cases the Phusion DNA polymerase is used, for which it is recommended to use the Tm formula of Breslauer et al. 1986 (http://www.pnas.org/content/83/11/3746). Primers are generated only for insert sequences, on the assumption that the vector should stay unmodified. For this reason, the extension of a primer that extends into the vector spans the full specified minimal overlap length, which makes it twice as long as the extensions on primers between two inserts, where each primer contributes half of the specified overlap length. For very short or long extensions, primer3 might fail to calculate a Tm. If the sequence is too short, Formula 10.1 is used; if it has a length greater than 36 bp, Geneious uses Formula 10.2.
$T_m = 2 \cdot AT + 4 \cdot GC$ (10.1)
$T_m = 64.9 + 41 \cdot \frac{GC - 16.4}{ATGC}$ (10.2)
Here AT and GC are the numbers of A/T and G/C bases in the sequence and ATGC is their total. A Report Document will be generated, listing the generated products and primers in a tabular view. Errors that occurred during the primer generation process will be reflected in that report document. Furthermore, any modifications (recession or maintaining of overhangs, adding modifications to primers) are shown at the beginning of the document. Geneious has a built-in parent/descendant tracking system. Whenever a change is made to a parent sequence, it will ask to propagate this change to its descendants. However, in the case that the user introduced some changes in the pr
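The two fallback formulas referred to above (10.1 and 10.2) are simple enough to check by hand or in code. The Python sketch below is illustrative only: the function name is hypothetical, and the rule of switching to Formula 10.2 for anything longer than 36 bp and using Formula 10.1 otherwise is an assumption drawn from the sentence above, not a description of when Geneious actually falls back from primer3.

```python
def fallback_tm(sequence):
    """Estimate Tm with Formula 10.1 (short) or Formula 10.2 (> 36 bp).

    Only A, C, G and T are counted; other characters are ignored.
    """
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if len(seq) > 36:
        # Formula 10.2: Tm = 64.9 + 41 * (GC - 16.4) / ATGC
        return 64.9 + 41.0 * (gc - 16.4) / (at + gc)
    # Formula 10.1: Tm = 2 * AT + 4 * GC
    return 2 * at + 4 * gc


short_tm = fallback_tm("ACGTACGTAC")   # 10 bp, 5 A/T and 5 G/C
long_tm = fallback_tm("ACGT" * 10)     # 40 bp, 20 A/T and 20 G/C
print(short_tm, round(long_tm, 2))
```

For the 10 bp example Formula 10.1 gives 2 × 5 + 4 × 5 = 30, and for the 40 bp example Formula 10.2 gives 64.9 + 41 × 3.6 / 40 ≈ 68.59.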
144. h aids navigation of large dotplots by showing the overall comparison and a box indicating where the dotplot window sits.

4.3.2 Interpreting a Dotplot

• Each axis of the plot represents a sequence.
• A long, largely continuous diagonal indicates that the sequences are related along their entire length.
• Sequences with some limited regions of similarity will display short stretches of diagonal lines.
• Diagonals on either side of the main diagonal indicate repeat regions caused by duplication.
• A random scattering of dots reflects a lack of significant similarity. These dots are caused by short sub-sequences that match by chance alone.

For more information on dotplots, refer to the paper by Maizel & Lenk [14].

4.4 Sequence Alignments

Over evolutionary time, related DNA or amino acid sequences diverge through the accumulation of mutation events such as nucleotide or amino acid substitutions, insertions and deletions. A sequence alignment is an attempt to determine regions of homology in a set of sequences. It consists of a table with one sequence per row, and with each column containing homologous residues from the different sequences, e.g. residues that are thought to have evolved from a common ancestral nucleotide/amino acid. If it is thought that the ancestral nucleotide/amino acid got lost on the evolutionary path to one descendant sequence, this sequence will show a special gap character
145. h of the fragments and the reference are of interest.

4.7.1 Assembling a Contig

To assemble a contig, first select all of the sequences and/or contigs you wish to assemble (along with the reference sequence, if you want to use one) in the document table, then click Align/Assemble in the toolbar and choose De Novo Assemble. The basic options for contig assembly will then be displayed.

[Figure 4.13: Basic de novo assembly options]

The options available here are as follows:

• Assemble by (aka Assemble by Name): If you have selected several groups of fragments which are to be assembled separately, you can specify a delimiter and an index at which the identifier can be found in all of the names. Sequences are grouped according to the identifier and each group is assembled separately. If a reference sequence is specified, it is used for all groups. E.g. for the names A03 1 ab1
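The grouping behaviour described for Assemble by Name can be sketched as follows. This is a hedged illustration rather than Geneious's code: the function name, the default hyphen delimiter, the zero-based index, and the sample file names are all hypothetical.

```python
from collections import defaultdict


def group_by_name_part(names, delimiter="-", index=0):
    """Group sequence names by one delimiter-separated part of the name.

    Each returned group would be assembled separately. The delimiter and
    zero-based index are illustrative assumptions; names that have no part
    at the requested index raise a ValueError.
    """
    groups = defaultdict(list)
    for name in names:
        parts = name.split(delimiter)
        if index >= len(parts):
            raise ValueError(f"{name!r} has no part at index {index}")
        groups[parts[index]].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical trace file names, not taken from the manual.
    reads = ["sampleA-fwd.ab1", "sampleA-rev.ab1", "sampleB-fwd.ab1"]
    for identifier, members in group_by_name_part(reads).items():
        print(identifier, members)
```

Raising an error for names that lack the identifier is a design choice of this sketch; how Geneious treats such names is not stated in this section.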
146. hat you can't be sure they're all searching the same database. Since the Custom BLAST service accesses a folder on the user's hard drive, it is possible to put this folder on a share and have each user point at it. Their CPU will do the work, but the data will be centralised. It is possible that this could cause performance issues over the network, though, and you'll need to deal with ownership and ensure that your users don't try adding databases themselves.

You don't need to format the databases from within Geneious; you can use formatdb as normal to create BLAST databases and put them into the data folder. Geneious users will then be able to see them. You could also consider doing this with symlinks for some databases, so that users can create their own Custom BLAST databases while benefitting from your shared ones.

Note that if the database is formatted manually using formatdb, there will be no annotations on the resulting alignments. If it is formatted from within Geneious, an extra file is created with the annotations so Geneious can put them back onto the alignments after a search.

15.5.3 BLASTing short sequences

Users should be aware that there are issues with BLAST when searching for short sequences. It is not guaranteed to find all occurrences of a short sequence in a database, so users should not be surprised when hits are missing. Statistically, even with the word size set to 7 (the minimum for DNA searches), BLAST will miss 40% of
147. he Newick format [13].

4.5.2 Neighbor-joining

In this method, neighbors are defined as a pair of leaves with one node connecting them. The principle of this method is to find pairs of leaves that minimize the total branch length at each stage of clustering, starting with a star-like tree. The branch lengths and an unrooted tree topology can quickly be obtained by using this method without assuming a molecular clock [20].

4.5.3 UPGMA

This clustering method is based on the assumption of a molecular clock [15]. It is appropriate only for a quick and dirty analysis when a rooted tree is needed and the rate of evolution does not vary much across the branches of the tree.

4.5.4 Distance models or molecular evolution models for DNA sequences

The evolutionary distance between two DNA sequences can be determined under the assumption of a particular model of nucleotide substitution. The parameters of the substitution model define a rate matrix that can be used to calculate the probability of evolving from one base to another in a given period of time. This section briefly discusses some of the substitution models available in Geneious. Most models are variations of two sets of parameters: the equilibrium frequencies and relative substitution rates.

Equilibrium frequencies refer to the background probability of each of the four bases A, C, G, T in the DNA sequences. This is represented as a vector of four probabilities $(\pi_A, \pi_C, \pi_G, \pi_T)$ that sum to
148. hem as contacts.

[Figure 9.4: Add New Contact dialog box in searching mode]

Your new contact will appear immediately in your contact list; however, you will not be able to tell whether your new contact is online until they accept you as a contact. Similarly, you will occasionally see a dialog box pop up asking you "Allow user name@talk.geneious.com as contact". This is another Geneious user attempting to add you as a contact in this manner.

Your contact will appear grey in your contact list when they are offline. If your contact is online, they will appear blue. A contact online in Geneious will have the orange Geneious G behind them. A contact online in some other program, like a chat client, will have a speech bubble behind them.

9.2.2 Rename Contact

This option allows you to change the name that you know another contact by. This is the name the contact will appear under in the contact list and in chats; it is only visible to you.

9.2.3 Remove Contact

If you no longer wish to share documents with a contact, you can remove that contact by right-clicking (Ctrl+click on Mac OS X) the contact in the Services panel and selecting Remove Contact. This deletes you from their contact list as well. If you find that a contact has disappeared from your list, this may be the reason.
149. hen you can use the actual primer testing tool.

[Figure 15.10: Using the assembler with primers]

[Figure 15.11: Reverse primer in assembly. The annotation tooltip shows: Name reverse primer; Type primer_bind_reverse; Created by primer3; Length 26; Interval 1,440 -> 1,415; GC 42.31; Tm 57.02; Hairpin 4.0; Primer Dimer 0.0; Product Size 1286.0; Pair Hairpin 6.0; Pair Primer Dimer 2.0; Pair Tm Diff 1.97; Sequence ACATCCACATTGCAGTTAGGTTTCCA]

15.6.2 Cloning Primers

When designing cloning primers, it is necessary for the primers to be exactly at the ends of the CDS. This is essential when doing Gateway cloning, for instance. To do this, select the CDS you want to clone by clicking the annotation. Next d
150. her testing or use is to select the annotation for that primer and click the Extract button in the sequence viewer. This will generate a separate, short sequence document which contains just the primer sequence and the annotation, so it retains all the information on the primer. In the case of the reverse primer, it will automatically be reverse complemented.

[Figure 4.7: Primer design output. The annotation tooltip shows: Type Primer Bind (primer_bind); Created by primer3; Length 20; Interval 4,089 -> 4,108; GC 55.0; Tm 60.1; Hairpin Tm 43.6; Self Dimer Tm None; Pair Dimer Tm None; Sequence CCCAAGCCACAGCATCCATA; Product Size 196]

When no primers can be found

If no primers or DNA probes that match the specified criteria can be found in one or more of the sequences, a dialog is shown describing how many had no matches and for what reasons. To see why no primers or DNA probes were found for particular sequences, click the Details button at the bottom of the dialog. The dialog will then open out to display a list of all the sequences for which no primers or DNA probes were found. For each of the sequences the following information is listed:

Geneious Primer Characteristics Primer3 Web Interface Primer3 Command Line GC Primer GC PRIMER_ LEFT RIGHT _GC_PERCENT Tm Primer Tm PRIMER_ LEFT RIGH
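The reverse complementing and GC values reported for primers can be checked with a short script. This is a minimal sketch with hypothetical function names, not the primer3 or Geneious implementation; the two sequences are the primers shown in the manual's Figure 4.7 and Figure 15.11.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    s = seq.upper()
    return 100 * (s.count("G") + s.count("C")) / len(s)


if __name__ == "__main__":
    forward = "CCCAAGCCACAGCATCCATA"        # Figure 4.7
    reverse = "ACATCCACATTGCAGTTAGGTTTCCA"  # Figure 15.11
    print(gc_percent(forward))              # 11 of 20 bases are G or C
    print(round(gc_percent(reverse), 2))
    print(reverse_complement(reverse))      # the strand it binds to
```

The GC values computed this way agree with the GC fields listed for those two primers (55.0 and 42.31).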
151. hich either are primer sequences or contain primer annotations, then these will be made available for selection as primers in a drop-down box. Selected sequences are treated as primer or probe sequences if they are 150 bp in length or less.

For each of these tasks, Generic or Cloning primers can be designed.

[Figure 4.5: The primer design dialog]

Generic primers

This option will design standard PCR primers according to the region input options you select. These options allow you to specify what part of a sequence you wish to amplify. Most options are optional and can be enabled or disabled with the associated check boxes beside them. If you have selected a region in the sequence before opening the primer dialog, then this region will automatically be used for Included Region and Target Region. All of these are expressed in base pairs from the beginning of the sequence and are as follows:

• Included Region: Specifies the region of the sequence within which primers are allowed to fall. This must surround the targe
152. hours, so turn off the option to test as pairs if you don't need it.

Other programs have used BLAST to align primers against target sequences, but this doesn't work well. BLAST is a local alignment tool, so only the matching part of a primer will align, and the reported identity will not reflect the identity of the whole primer against the target sequence. Also, BLAST is not reliable for short sequences, so it will not be able to report all possible hits.

Geneious has a capable short read assembler which can handle mismatches and aligns the whole short read against the reference. To use it, select the sequences you want to test the primers against as the reference sequences (to test against multiple sequences, they need to be combined into a sequence list), select the primer sequences as the reads, and then run a medium sensitivity assembly. This will map all the primers that can match onto the references, while retaining the regions that don't match (Figure 15.10). Note that it will reverse complement reads that match the other strand, so you need to refer to the primer annotation to see the primer sequence (Figure 15.11).

If you want to check that primers don't match in multiple locations, be sure to switch to Custom Sensitivity and turn on the Map multiple best matches option in the More Options section.

This method can be used as a quick screen to identify primers that will match your sequences, and t
153. [Figure 10.2: Digest into fragments options dialog with extended options showing]

be considered; this can easily be used for the same effect by sorting by columns and then selecting a range of rows, in the rare cases when it is needed.

• Otherwise, if you select Digest using Enzyme Set, the digestion operation includes finding the restriction sites first, but without generating the annotations. Therefore, the options are the same as for Find Restriction Sites, which is discussed in section 10.1.

10.3 Insert into Vector

The option Insert into Vector, from the Tools → Restriction Analysis menu or the context menu, allows you to take an insert and insert it into a vector. The insert must be one of the following:

• A fragment which has already been digested. This fragment cannot have any restriction site annotations on it. The entire fragment will be inserted into the vector. Overhangs will be taken into account.
• A sequence with two restriction annotations. The fragment resulting from digesting this sequence and discarding the fragments from the ends will be inserted into the vector.

The vector must be a circular sequence. You do not need to annotate the rest
154. igure 3.17: Actively Linked Parents dialog

Upon conclusion of your editing, you will again be prompted to either deactivate links or save a copy.

3.9.2 The Lineage View

Every document that is linked, actively or otherwise, to another document has a tab called Lineage in the Info View tab. The lineage view allows you to view parent-descendant relationships, manage links and navigate between documents (Figure 3.18).

All active links appear as green text, whilst inactive links appear as black text. The document currently being viewed, which is the root of both the parents tree and the descendants tree, appears in blue. Each document's name is displayed along with an icon, similar to the document table, denoting what type of sequence it is.

Also displayed in the viewer are the operations that generated each set of children, along with the time at which the operation was run and the type of operation. If preferred, these operations can be hidden by unchecking the Show Operations checkbox, providing a layout which is
155. ilar way to Google or predictive text on your mobile phone. If you click within the search field, a drop-down box will appear showing previously used options.

2.4 Public databases

Geneious allows you to search several public databases in the same way that you can search your local documents. The search process is described in section 2.3.

Geneious is able to communicate with a number of public databases hosted by the National Center for Biotechnology Information (NCBI), as well as the UniProt and Pfam databases. You can access these databases through the web at http://www.ncbi.nlm.nih.gov, http://www.uniprot.org and http://www.sanger.ac.uk/Software/Pfam respectively. These are all well known and widely used storehouses of molecular biology data.

When viewing data from a public database such as NCBI, the data cannot be modified. This is indicated by the small padlock icon which appears in the status bar. When this icon is present, items cannot be added to or removed from the table, and they cannot be modified in any way. To modify an item, you must first move it to your local folders.

2.4.1 Pfam

See chapter 7.

2.4.2 UniProt

This database is a comprehensive catalogue of protein data. It includes protein sequences and functions from Swiss-Prot, TrEMBL, and PIR.

2.4.3 NCBI (Entrez) databases

NCBI was established in 1988 as a public resource for information on molecular biology. Geneious allo
156. imer binding region or the extension region of the original sequences, these changes won't be reflected in the report document.

10.6 TOPO Cloning

TOPO Cloning lets you ligate a single fragment into a vector within only 5 minutes using the natural activity of Topoisomerase I, which recognizes a specific motif, 5'-(C/T)CCTT-3', on the DNA. TOPO is a registered trademark of Invitrogen Corporation.

With the option TOPO Cloning you can insert linear fragments into either linear TOPO vectors (when a TOPO site is present at the extremities) or into circular TOPO vectors. You can select as many sequences at once as you like; they will be ligated into each other in a batch operation.

[Figure 10.5: TOPO Cloning options dialog]

• Three different options (TA, Blunt or Directional cloning) are shown at the top. If Directional is selected, the user can define an overlap sequence. If this field is blank, it has the same effect as Blunt cloning.
• The field below shows which of the selected sequences have been detected as vectors; all other sequences are inserts.
• If any complications occur, e.g. when more than one TOPO site is detected or when a linear sequen
157. imes from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences. PMID: 12136032

Figure 3.21: Viewing bibliographic information in Geneious

Chapter 4 Analysing Data

4.1 Literature

Geneious allows you to search for relevant literature in NCBI's PubMed database. The results of this search are summarized in c
158. in Geneious

3.6.1 Structure View Manipulation

• Click and drag the mouse to rotate the structure.
• Hold the Alt or Shift key, then click and drag to zoom in/out.
• Hold the Ctrl key, then right-click and drag to pan, or if you are using a Mac, click and hold, press Ctrl and Alt/Option, then drag to pan.

3.6.2 Selection Controls

To the right of the structure are controls that let you control the selected part of the structure.

• If the structure you are viewing contains more than one model, the model combo box will let you choose between them.
• The select button lets you select all, none, or the non-selected region of the structure, as well as by element, group type, or secondary structure.
• The highlight selected checkbox lets you select whether to highlight the selected atoms in the structure view.
• The structure tree shows the atoms in the structure in a tree format. Click on regions in the tree to select those regions. You can also Shift-click and Ctrl-click to select multiple regions at once.
• The command box lets you type in arbitrary Jmol scripting commands. To see some examples, select one of the pre-populated options in the box's drop-down. For a complete description of the commands you can use, see http://www.stolaf.edu/academics/chemapps/jmol/docs

3.6.3 Display Menu

At the top of the viewer is the display menu. Here you can modify the appearance of the structure.

• Reset lets
159. in the sequence viewer, or they can be grouped logically into tracks. A track is a collection of one or more annotation types. Tracks are stacked vertically underneath the sequence in question, with a separate line for each track and its annotations.

By clicking on the name of an annotation in the sequence view, annotations can be colored by the contents of a qualifier field. This enables the creation of annotation heatmaps by using a score value, or some other metric, stored in the qualifier of an annotation.

In the presence of annotations and tracks, the options panel includes the Annotation Types section (Figure 3.7). Uncheck the top check box to turn off all annotations. Directly beneath the top check box is a filter text field. Typing a term in this field will highlight any annotations that contain the entered text in their name or qualifiers.

Annotations that are either directly annotated on the sequence or are present in multiple tracks are shown below the filter text field and have an options popup. Clicking on the preview of the annotation arrow allows you to further customise the way each type is displayed, group annotation types under new tracks, and delete all annotations of a particular type. Additionally, on the right of the popup button are two small left/right buttons which will move the selection in the sequence view to the next or previous instance of each annotation type. This is useful for navigating large genomes or
160. indicate whether each position is covered by reads in both directions. Green is used for regions with reads in both directions and yellow is used for regions with reads in one direction only. The scale bar shows minimum and maximum coverage, as well as a tick somewhere in between for the mean coverage.

Sequence Logo: This is available for sequence alignments. It displays a sequence logo where the height of the logo at each site is equal to the total information at that site, and the height of each symbol in the logo is proportional to its contribution to the information content. When zoomed out far enough that the horizontal width of each site is less than one pixel, the height is the average of the information over multiple sites. When gaps occur at some sites, the height is scaled down further to be proportional to the number of non-gap residues.

Amino Acid Charge: This is available for protein sequences. It runs the EMBOSS charge tool to plot a graph of the charges of the amino acids within a window of specified length as the window is moved along the sequence.

Hydrophobicity: This is available for protein sequences. It displays the hydrophobicity of the residue at every position, or the average hydrophobicity when there are multiple sequences.

pI: pI stands for isoelectric point and refers to the pH at which a molecule carries no net electrical charge. The pI plot displays the pI of the protein at every position along
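The symbol heights in a sequence logo can be sketched with the standard information-content calculation. This is an assumption-laden illustration, not Geneious's code: it uses the textbook definition (log2 of the alphabet size minus the Shannon entropy of the column) with no small-sample correction, the function name is hypothetical, and the gap scaling follows the description above by multiplying by the fraction of non-gap residues.

```python
import math
from collections import Counter


def logo_column(column: str, alphabet_size: int = 4) -> dict:
    """Return the logo height in bits of each symbol in one alignment column.

    column is the residues at one site, with '-' for gaps. Assumptions:
    alphabet_size is 4 for nucleotides (use 20 for proteins), and no
    small-sample correction is applied.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    information = math.log2(alphabet_size) - entropy
    # Scale down in proportion to the number of non-gap residues.
    information *= n / len(column)
    # Each symbol's height is its frequency times the site's information.
    return {symbol: (k / n) * information for symbol, k in counts.items()}


if __name__ == "__main__":
    print(logo_column("AAAA"))  # fully conserved nucleotide site
    print(logo_column("AACC"))  # two equally frequent bases
    print(logo_column("AA--"))  # conserved, but half the rows are gaps
```

A fully conserved nucleotide column carries the maximum of 2 bits, a column split evenly between two bases carries 1 bit shared between the two symbols, and the gapped column is scaled to half height.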
161. ing Started

One of the best ways to get an introduction to Geneious, its features and how to use them is to watch our online video demonstration: http://desktop-links.geneious.com/demonstration

1.1 Downloading & Installing Geneious

Geneious is free to download from http://desktop-links.geneious.com/download. If you are using Geneious for the first time, you will be offered a free trial. If you have already purchased a license, you can enter it when Geneious starts up.

To download Geneious, click on the internet address above (or type it into your internet browser) to open the Geneious download page, enter your details, then choose your operating system and click Download. Then choose the version of Geneious you want to download and click Download again.

Geneious has some minimum system requirements. It is compatible with the three most common operating systems: Windows, Mac and Linux. Check that you have one of the following OS versions before you launch Geneious:

Operating System    System requirements
Windows             XP, Vista, 7, 8
Mac OS              10.6, 10.7, 10.8
Linux

Geneious also needs Java 1.6 or higher to run. If you do not have this on your system already, please download a version of Geneious that includes Java. This involves downloading a larger file.

Once Geneious has downloaded, double-click on the Geneious icon to start installing the program. While this is happening, you wil
162. ino acid sequence of two proteins, J. Mol. Biol. 48 (1970), no. 3, 443-53.
[17] C. Notredame, D.G. Higgins, and J. Heringa, T-Coffee: A novel method for fast and accurate multiple sequence alignment, J. Mol. Biol. 302 (2000), no. 1, 205-217.
[18] R.J. Roberts, T. Vincze, J. Posfai, and D. Macelis, REBASE: enzymes and genes for DNA restriction and modification, Nucl. Acids Res. 35 (2007), D269-D270.
[19] F. Ronquist and J.P. Huelsenbeck, MrBayes 3: Bayesian phylogenetic inference under mixed models, Bioinformatics 19 (2003), no. 12, 1572-4.
[20] N. Saitou and M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol. Biol. Evol. 4 (1987), no. 4, 406-25.
[21] T.F. Smith and M.S. Waterman, Identification of common molecular subsequences, J. Mol. Biol. 147 (1981), 195-197.
[22] K. Tamura and M. Nei, Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees, Mol. Biol. Evol. 10 (1993), no. 3, 512-526.
[23] J.D. Thompson, T.J. Gibson, F. Plewniak, F. Jeanmougin, and D.G. Higgins, The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools, Nucleic Acids Res. 25 (1997), no. 24, 4876-4882.
[24] J.D. Thompson, D.G. Higgins, and T.J. Gibson, Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequen
163. ion. The PopSet database contains both nucleotide and protein sequence data, and can be used to analyze the evolutionary relatedness of a population.

The Entrez Protein database: This database contains sequence data from the translated coding regions of DNA sequences in GenBank, EMBL and DDBJ, as well as protein sequences submitted to the Protein Information Resource (PIR), SWISS-PROT, Protein Research Foundation (PRF), and Protein Data Bank (PDB) (sequences from solved structures).

The Entrez Structure database: This is NCBI's structure database and is also called MMDB (Molecular Modeling Database). It contains three-dimensional biomolecular structures, determined experimentally or programmatically, obtained from the Protein Data Bank.

The PubMed database: This is a service of the U.S. National Library of Medicine that includes over 16 million citations from MEDLINE and other life science journals. This archive of biomedical articles dates back to the 1950s. PubMed includes links to full text articles and other related resources, with the exception of those journals that need licenses to access their most recent issues.

Entrez Taxonomy: This database contains the names of all organisms that are represented in the NCBI genetic database. Each organism must be represented by at least one nucleotide or protein sequence.

Entrez Gene: Entrez Gene is NCBI's database for gene-specific information. It does not include all known or predicted genes in
164. ion to Weight by quality. This is very useful for identifying low quality regions and resolving conflicts.

[Figure 4.15: The overview of a contig]

Finding disagreements or SNPs

To easily identify bases which do not match the consensus, turn on Highlight Disagreements in the consensus section of the sequence viewer options. When this is on, any base in the sequences which matches the consensus at that position is grayed out, and bases not matching are left colored. With this on, you can quickly jump to each disagreement by pressing Ctrl+D (Cmd+D on Mac OS X) or by clicking the Next Disagreement button in the sequence viewer option panel to the right. Each disagreement can then be examined or resolved. You can also use this feature if you have aligned to a reference sequence and you are interested in finding differences between each sequence and the reference, or SNPs.

Manually investigating every little disagreement can be time consuming on larger contigs. There is also a Find Variations/SNPs feature in the Annotate & Predict toolbar which will annotate regions of disagreement, and it can be configured to only find disagreements above a minimum threshold to screen out disagreements due to
165. is should be used when sequences have no quality information attached.

• Trim 5' End and Trim 3' End: These can be set to specify trimming of only the 3' or 5' end of the sequence. A minimum amount that must be trimmed from each end can also be specified.
• Maximum length after trim: If the untrimmed region is longer than the specified limit, then the remainder will be trimmed from the 3' end of the sequence until it is this length.

4.7.4 Using paired reads

To assemble paired read (or mate pair) data, you first need to tell Geneious that the reads are paired, prior to assembly. The assembler will then automatically use the paired data unless you turn off the advanced option to Use paired distances.

To set up paired reads, select the document(s) containing the paired reads and select Set Paired Reads from the Sequence menu. Depending on your data source, reads could be in parallel sets of sequences or interlaced, so you need to tell Geneious which format applies. Geneious will guess and select the appropriate option based on the data you have selected, so most of the time you can just use the default value for this. However, you must make sure you select the correct Relative Orientation for your data. Different sequencing technologies orientate their paired reads differently.

All paired read data will have a known expected distance between each pair. It is important you set this to the correct value to achieve good
166. it is to the query sequence. The bigger the bit score, the better the match. Finally, there is also an E-value, or Expect value, which represents the number of hits with at least this score that you would expect purely by chance, given the size of the database and query sequence. The lower the E-value, the more likely that the hit is real.

Geneious can perform seven different kinds of BLAST search:

• blastn: Compares a nucleotide query sequence against a nucleotide sequence database.
• Megablast: A variation on blastn that is faster but only finds matches with high similarity.
• Discontiguous Megablast: A variation on blastn that is slower but more sensitive. It will find more dissimilar matches, so it is ideal for cross-species comparison.
• blastp: Compares an amino acid query sequence against a protein sequence database.
• blastx: Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
• tblastn: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
• tblastx: Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page because it is too computation
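The relationship between bit score, search space and E-value described above can be illustrated with the commonly quoted approximation E = m · n · 2^(-S'), where S' is the bit score, m is the query length and n is the total database length. This is a simplified sketch with a hypothetical function name; real BLAST implementations use length-corrected ("effective") values of m and n, so reported E-values will differ somewhat.

```python
def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value: E = m * n * 2 ** (-bit_score).

    Assumption: raw query and database lengths stand in for the
    effective lengths that BLAST itself uses.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # The same bit score is less significant against a larger database.
    print(expected_hits(40, 100, 10**6))
    print(expected_hits(40, 100, 10**9))
```

The example shows why an E-value, unlike a bit score, depends on the database searched: a thousandfold larger database yields a thousandfold larger E-value for the same score.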
167. l alignments the Smith-Waterman algorithm [21] is the most commonly used. See the references provided for further information on these algorithms.
Pairwise alignment in Geneious
A dotplot is a comparison of two sequences. A pairwise alignment is another such comparison, with the aim of identifying which regions of two sequences are related by common ancestry and which regions have been subjected to insertions, deletions and substitutions. The options available for the alignment cost matrix depend on the kind of sequence:
• Protein sequences have a choice of PAM [2] and BLOSUM [10] matrices.
• Nucleotide sequences have choices for a pair of match/mismatch costs. Some scores distinguish between two types of mismatches: transition and transversion. Transitions (A <-> G, C <-> T) generally occur more frequently than transversions. Differences in the ratio of transitions to transversions result in various models of substitution.
When applicable, Geneious indicates the target sequence similarity for the alignment scores, i.e. the amount of similarity between the sequences for which those scores are optimal.
[Figure: Pairwise/Multiple Align options dialog showing the Cost Matrix, Gap open penalty, Gap extension penalty and Alignment type settings.]
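The distinction between the two kinds of nucleotide mismatch can be sketched as follows. A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); every other mismatch is a transversion. The function names and the default cost values are illustrative assumptions, not the presets Geneious ships with.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_pair(a: str, b: str) -> str:
    """Return 'match', 'transition' or 'transversion' for two unambiguous DNA bases."""
    a, b = a.upper(), b.upper()
    if a not in "ACGT" or b not in "ACGT" or len(a) != 1 or len(b) != 1:
        raise ValueError("expected single unambiguous bases A, C, G or T")
    if a == b:
        return "match"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def pair_score(a: str, b: str, match: float = 5.0,
               transition: float = -1.0, transversion: float = -4.0) -> float:
    """Score one aligned column; the default costs are hypothetical example values."""
    costs = {"match": match, "transition": transition, "transversion": transversion}
    return costs[classify_pair(a, b)]
```

Setting the transition and transversion costs to the same value reduces this to a plain match/mismatch scheme.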
168. l be prompted for a location to install Geneious. Please check that you are satisfied with the location before continuing. If you are using Mac OS X, you only have to double-click the downloaded disk image and then drag the Geneious application to your Applications folder. Don't run Geneious from the mounted disk image, as there are no write permissions on it. You must drag the icon into your Applications folder and run it from there.
1.1.1 Choosing where to store your data
When Geneious first starts up, you will be asked to choose a location where Geneious will store all of your data. The default is normally fine. Although it's possible to store your data on a network or USB drive so you can access it from other computers, this is not recommended because it can have adverse effects on performance. Please do not use a Dropbox folder to store your data; this may corrupt your data. To store your data somewhere different to the default, simply click the Select button in the welcome window and choose an empty folder on your drive where you would like to store your data. The data location can also be changed later by going to the General tab under Tools, Preferences in the menu and changing the Data Storage Location option. Geneious will offer to copy your existing data across to the new location if appropriate.
1.1.2 Upgrading to new versions
To upgrade existing Geneious installations, simply down
169. l bring up a window similar to that displayed in figure 2.16.
Figure 2.15: Edit Meta Data Types
Creating Meta Data Types
Geneious does not restrict you to the meta-data types that it comes with. You can create your own types to store any information you want. To create a new type, click on the Create button in the left-hand panel of the Edit Meta Data Types window. This creates a new type with one empty field and displays it in the panel to the right.
Note: The Name and Description fields distinguish your meta-data type from other user-defined types. They do not have any constraints.
Next you need to decide what values your Meta Data Type will store by specifying its fields.
Field name: This defines what the field will be called. It will be displayed alongside columns such as Description and Creation Date in the Documents Table. You can have more than one field in a single Meta Data Type; to add or remove a field from the type, click the + or - buttons to the right of the field.
Field type: This describes the kind of information that the column contains, such as Text, Integer and True/False. The full list of choices in Geneious is shown in figure 2.16.
170. l covered by one such polylinker annotation directly. Bases: Used to explicitly specify the range of bases to use. Entire sequence: Used to specify that you can cut anywhere within the sequence.
• Candidate Enzymes: These options let you choose which enzymes to look for on the vector sequence. Enzymes annotated on insert: This option lets you use only the enzymes used to cut the insert fragment. Enzyme set: This option lets you use the enzymes from a predefined enzyme set, e.g. the enzyme set you have created containing the enzymes you have in your lab.
• Cut vector with: Whenever you change the options for the polylinker or candidate enzymes, Geneious will recalculate the compatible enzymes on the vector. It will look for enzymes which meet one of the following conditions, in addition to cutting only within the polylinker and belonging to the candidate enzyme set:
1. A single enzyme which cuts the vector once, such that the insert can be inserted in the gap. Possible only when the insert has complementary cut sites.
2. A single enzyme which cuts the vector twice, such that the insert can be inserted into the gap vacated by the fragment between the two cut sites.
3. Two enzymes which each cut the vector once, such that the insert can be inserted into the gap vacated by the fragment between the two cut sites.
171. l machine has, for instance, or if it will take a long time to process, you should choose to run it on the server. You can check the status of your job in the operations table in Geneious. You can also shut Geneious down once your job has been submitted to the server; if the job has completed when you log back in, you'll be able to retrieve your results. If your jobs were still running when you shut down, Geneious will request progress from the server when you restart and either show you your completed jobs or show you the progress dialogue so you can see how far along the job has gone (Figure 13.4).
Figure 13.3: Log in to Geneious Server
172. lete it from inside Geneious to keep the size of your database down and improve the performance of Geneious. You should keep archive backups in addition to these, because this backup will miss your settings and data outside the selected folder.
• Archive all data and settings: This is equivalent to creating a zip archive of your entire Geneious data directory, which includes all your data, preferences, searches and agents. This type of backup cannot be directly imported into an existing database; when it is loaded, everything in Geneious will revert to how it was when you took the backup.
2.11.1 Restoring a backup
• Geneious format backup: The easiest way to restore this is to drag and drop the Geneious file into the folder in Geneious where you want it to go. Alternatively, you can use Restore Backup in the File menu and the backup will be added under the Local folder in your current database.
• Archive all data and settings: It is strongly recommended that you use Restore Backup in the File menu to load the zip file rather than unzipping it manually, because some operating systems may not be able to unzip the data correctly. The Restore Backup command will unzip your backed-up data directory to a folder of your choosing, which you can then load immediately. If you choose not to load it immediately, you can switch to the restored data directory by going to Preferences in the Tools menu and changing the Data Storage Location on the General tab.
Chapter 3 Do
173. lication because they won't become pixelated. Raster formats (PNG and JPG) are easier to share (great for emailing or posting on the web). If you plan to use the image in Microsoft Office, then EMF format is recommended. Unfortunately, Microsoft Office for Mac can't ungroup EMF files like the Windows version can. LibreOffice (for Mac, Windows or Linux) can, and allows you to edit the individual elements.
Resolution: Only applies to raster formats (PNG and JPG) and is used to increase the number of pixels in the saved image.
2.11 Back up
It is important to keep frequent backups of your data because computers can fail suddenly and unexpectedly. A computer can be replaced, but your data is much harder to replace. The best way to back up all of your data and settings in Geneious is to use the Back Up button in the toolbar or select Back Up Data in the File menu. Backing up your data directory manually is not recommended because the Geneious database structure is complex and many programs will fail to back it up properly. The back up command has two options:
• Export selected folder: This will export the selected folder, including all subfolders, to a Geneious format file. This allows you to back up an individual project within your database. The backup can also be imported into an existing database by drag and drop. If you have finished working on a project, it is a good idea to back it up in this way, then de
174. load and install the new version to the same location. This will retain all your data.
1.2 Using Geneious for the first time
Figure 1.1 shows the main Geneious window. This has six important areas or panels.
1.2.1 The Sources Panel
The Sources Panel contains the services Geneious offers for storing and retrieving data. These include your local documents (including sample documents), Shared Databases, UniProt, NCBI, Pfam and Collaboration. All these services will be described in detail later in the manual. For more information, see section 2.1.1.
175. ltiple sequences in an alignment. Clicking the Allow Editing button enters edit mode and allows you to modify
Figure 1.1: The main window in Geneious
1.2.2 The Document Table
The Document Table displays summaries of downloaded data such as DNA sequences, protein sequences, journal articles, sequence alignments and trees. By clicking on the search icon, you can search data for text or by sequence similarity (BLAST). You can enter a search string into the Filter box located at the right side of the toolbar; this will hide all documents that do not contain the search string. For more information, see section 2.1.2.
1.2.3 The Document Viewer Panel
The Document Viewer Panel is where sequences, alignments, trees, 3D structures, journal article abstracts and other types of documents can be shown graphically or as plain text. Many document viewers allow you to customize settings such as zoom level, color schemes, layout and annotations (nucleotide and amino acid sequences), three different layouts and branch and leaf labeling (tree documents), and many more. When viewing journal articles, this panel includes a direct link to Google Scholar. All these options are displayed on the right-hand side of the panel (Figure 1.2). For more information, see section 2.1.3.
Figure 1.2: Three document viewers: (a) nucleotide sequence, (b) journal article, (c) phylogenetic tree
176. mma/Tab Separated Values documents. You can either import them from the Import From File menu or simply paste the contents of the document into Geneious. When Geneious has successfully recognized the file as CSV or TSV, you will see the following dialogue (Figure 4.12).
Figure 4.12: Importing primers from a spreadsheet
You will be asked which type of sequence you are importing. When you choose to import primers or probes, you will receive some options that allow you to determine characteristics for them as an extra step. Immediately below this is a preview of the first few rows of data and a checkbox that allows you to tell Geneious that the top row is a heading row and should be ignored. Below the preview is a list of common and additional fields along with dropdown boxes. These
177. n's conditions exactly, and afterwards matches up the newly regenerated child documents with any former children and replaces their contents where possible. Occasionally one or more of the parent documents has been altered to a point where an operation can no longer be rerun, or a necessary parent document is inaccessible. In this case Geneious will inform you of the failure and attempt to be as specific as possible about its cause (Figure 3.13).
Figure 3.13: Failure to propagate an Extract PCR product operation due to a missing forward primer
Inactive links do not propagate changes from parent to child. Inactive links are created in two different ways. Firstly, when you choose not to propagate changes, that active link becomes temporarily inactive. Secondly, if an operation does not support creation of active links, or was told not to create them, all links between its parents and children will be permanently inactive. All operations in Geneious at least create inactive links. The following operations in Geneious can produce actively linked doc
178. n used to generate a 3D model, which is usually viewed with Rasmol or SPDB viewer. Geneious can read PDB format files and display an interactive 3D view of the protein structure, including support for displaying the protein's secondary structure when the appropriate information is available.
PDF format: PDF stands for Portable Document Format and is developed and distributed by Adobe Systems (http://www.adobe.com). It contains the entire description of a document, including text, fonts, graphics, colors, links and images. The advantage of PDF files is that they look the same regardless of the software used to create them. Some word processors are able to export a document into PDF format. Alternatively, Adobe Writer can be used. Currently you can use Geneious to read, store and open PDF files, and future versions will have more options for storing and manipulating PDF.
Phrap Ace files: Ace is the format used by the Phrap/Consed package created by the University of Washington Genome Center. This package is used mainly to assemble sequences.
PileUp format: The PileUp format is used by the pileup program, a part of the Genetics Computer Group (GCG) Wisconsin Package.
PIR/NBRF format: Format used by the Protein Information Resource, a database established by the National Biomedical Research Foundation.
Qual file: Quality file, which must be in the same folder as the sequence file FASTA format for the
179. nce. To manage agents, click on the agent icon in the toolbar. An agent has to be set up before it can be used.
2.6.1 Creating agents
To set up an agent, click the Agents icon and then the Create button. You now need to specify a set of search criteria (in exactly the same way as you do for a search), the database to search, the search frequency, and the folder you wish the agent to deliver its results to. The search frequency may be specified in minutes, hours, days or weeks. You can only use whole numbers. Selecting Only get documents created after today will cause the agent to check what documents are currently available when the agent is created. Then, when the agent searches, it will only get documents that are new since it was created (e.g. if you have already read all publications by a particular author and you want the agent to only get publications released in the future). Alternatively, you can click the Create Agent button, which is available in some advanced search panels. This will use the advanced search options you have entered to create the agent. The easiest way to organize your search results is to create a new folder and name it appropriately. You can do that by navigating to the parent folder in the Deliver to box and clicking New Folder, or by creating a new folder beforehand:
1. Right-click (Ctrl-click on Mac OS X) on the Sample Documents or Local folders. This brings up a popup menu with a New Folde
180. nd update descendants; Deactivate links to descendants so they are not updated; Save as copy without descendants.
Figure 3.15: Actively Linked Descendants dialog
To aid your decision, the dialog allows you to view the document's descendants in a smaller, cut-down version of the Lineage View. Pressing the View Descendants button will bring up this view (Figure 3.16). When you choose to begin editing a document with actively linked parents in the Sequence View, you will immediately be warned that in order to save your changes you will need to deactivate this link.
Figure 3.16: Descendants view
As with the Actively Linked Descendants view, you will be given the opportunity to view the document's lineage. Editing a document that is a descendant of other documents is usually unintentional; however, in some circumstances you may simply be interested in the output documents of an operation, not the parent/descendant relationship, and as such you may hide this dialog (Figure 3.17).
[Dialog text, Actively Linked Parents: This document has an actively linked parent. If you edit it and save your changes, you will need to either deactivate the link(s) or save your changes to a copy.]
181. neious only uses the Admin role for the Everybody group. By default there is only one group, the Everybody group. When a user logs in for the first time, Geneious will put them into the Everybody group with a role of Edit. This means every user of the Shared Database belongs to this group with a role of Edit unless you enter them into the g_user table beforehand. You will want to give yourself the role of Admin for the Everybody group if you want to perform administrative functions within Geneious. Unfortunately, at this time there is no interface for assigning groups and roles to users, so you will need some knowledge of SQL in order to take advantage of this feature. You can create groups by adding entries into the g_group table in the database. Assign users, groups and roles in the table g_user_group_role. If you are running in a multi-user environment and taking advantage of groups and roles, you will probably want to give your users only read access to the table g_user_group_role, so that they cannot edit this table with SQL directly as you would. You will also want to add all of your users into g_user manually, so that Geneious does not think they are first-time users and fail when trying to insert them into the Everybody group due to the read-only access.
Chapter 12 Licensing
The Help menu contains a number
182. Figure 13.1: Download Geneious Server Plugins
Click on each plugin to download it, and once you've downloaded all plugins, drag them from your downloads folder into Geneious. You'll probably have to restart Geneious after all plugins have been installed. Note that the plugins may take some time to install. Once it is clear the plugins have all installed, restart; when Geneious comes back up you should see the Geneious Server link in the Sources Panel. Click this and you'll see a button to log in. Use the log in button to display a dialogue requiring the hostname, username and password details, which your administrator should have provided you with (Figure 13.2). Once you've logged into the server, you will have access to the shared database space, which will appear under Shared Databases in the Sources Panel. We recommend you create a folder for your own documents. The benefit of this folder is that the server can see anything in there without having to get it from your Geneious client. This means large documents such as NGS sequencing data can be placed here and the server will be able to access them quickly. Also, if you log into the server from another machine, documents you put in the Shared Database will be available, unlike those of your local databas
183. Figure 15.12: Design cloning primers
Figure 15.13: Finished cloning primers
15.6.3 Reverse complement primers
Primers are always 5' to 3', so in Geneious, if you reverse complement a primer, the sequence viewer will show the other strand and the primer direction arrow will switch from left-to-right to right-to-left. In the text view you should see that the primer hasn't actually changed and is still the original sequence. If you really want to switch the primer to the other strand, it needs to be run through Convert to Oligo again, since the annotated primer now doesn't correspond to the sequence. It is worth deleting the current primer annotations and then running the Convert to Oligo tool, which will create a new primer annotation running from left to right which does correspond to the sequence as it now exists. This will create a sequence list which contains the primer sequences, although they won't currently be oligos, but you can then extract the sequences from the list and convert them to oligos using the Primers, Convert to Oligo operation. They will now be available as part of the primer database.
15.7 Assembler
The assembler in Geneious has been written to be fast and memory efficient to allow it to handle next-gen data. Here are some tips and tricks.
15.7.1 Trimming
Trimming in Geneious can be s
184. nly works well if you have high coverage of paired reads; a hybrid assembly of mostly unpaired data with a few paired reads will not make good use of the paired read data, but this is expected to improve in future versions.
4. Each contig generated by a gapped de novo assembly has some minor fine-tuning performed on it, both during assembly and upon completion. For each gapped position in a sequence, a base adjacent to the gap is shuffled along into the gap if it is the same base as the most common base in other sequences in the contig at that position. After doing this, if any column now consists entirely of gaps, that column is removed from the contig.
5. Other minor heuristics are applied throughout the assembly to improve the results.
6. Both the Geneious de novo and reference assemblers use a deterministic method, even when spreading the work across multiple CPUs, such that if you rerun the assembler using the same settings and same input data it will always produce the same results.
The reference assembly algorithm used is a seed-and-expand style mapper followed by an optional fine-tuning step to better align reads around indels to each other rather than the reference sequence. Various optimizations and heuristics are applied at each stage, but a general outline of the algorithm is:
1. First the reference sequence(s) is indexed to create a table making a record of all locations in the reference sequence that every po
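The last part of the de novo fine-tuning step described above, removing any column that consists entirely of gaps, can be illustrated with a short sketch. It assumes the contig is held as a list of equal-length strings with "-" as the gap character; the function name is hypothetical, and the sketch is not the assembler's own implementation.

```python
from typing import List


def remove_all_gap_columns(rows: List[str], gap: str = "-") -> List[str]:
    """Drop every alignment column in which all rows contain the gap character."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows of the contig must have the same length")
    # Keep a column if at least one row has a real base in it.
    keep = [i for i in range(width) if any(row[i] != gap for row in rows)]
    return ["".join(row[i] for i in keep) for row in rows]
```

Columns that still contain at least one base are left untouched, so the relative alignment of the remaining reads does not change.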
185. nnotate restriction sites on a nucleotide sequence. You can configure the following options (Figure 10.1):
• Candidate Enzymes lets you select a set of restriction enzymes from which you want to draw the ones to use in the analysis. This will always include the option to use all known commercially available restriction enzymes, but if your search index is intact then all restriction enzyme set documents from your local database will also be listed (see below for how to create such a document).
• Minimum effective recognition sequence length lets you filter the candidate enzymes to include only ones whose recognition sequence has a given minimum effective length. For example, EcoRI's recognition sequence is 6 nucleotides long (GAATTC). The effective length takes ambiguities into account, so that e.g. the sequence YS only has an effective length of 1. It is a better measure for the expected number of hits in a random sequence of fixed length: because YS matches CC, CG, TC and TG, on a random sequence with uniform nucleotide distribution it would match approximately once every 4 nucleotides, as would a recognition sequence of length 1; hence the effective length of YS is 1.
• Only include enzymes that match X to Y times lets you filter the results once the restriction sites have been identified. If checked, this option will discard all restriction sites for enzymes whose recognition sequence matches fewer than X or
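One way to compute an effective length that reproduces both examples above (6 for GAATTC, 1 for YS) is to let each position contribute 1 - log4(k), where k is the number of bases its IUPAC code can match. The sketch below uses that formula; it is an assumption that agrees with the examples in the text, not a confirmed description of how Geneious calculates the value, and the function name is hypothetical.

```python
import math

# Standard IUPAC nucleotide codes and the bases each one matches.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def effective_length(recognition_sequence: str) -> float:
    """Sum of 1 - log4(k) over positions, where k bases match at that position."""
    total = 0.0
    for code in recognition_sequence.upper():
        k = len(IUPAC_BASES[code])  # raises KeyError for non-IUPAC characters
        total += 1.0 - math.log(k, 4)
    return total
```

Under this formula a site with effective length L is expected to match about once every 4^L positions of a uniform random sequence, so N contributes nothing and a two-fold ambiguity such as Y or S counts as half a base.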
186. node labels: This refers to labels on the internal nodes of the tree. Show branch labels: This refers to the branches of the tree. Each of the three above options has fields that you can set to customise what the labels display:
• Display allows you to select what information the labels display. Branch Labels have fixed settings, but you can select what the Tip Labels display: either Taxon Names, Node Heights, Sequence Names or a number of other options, depending on the tree you are viewing. If you are viewing a consensus tree, you can also display consensus support as a percentage on node labels.
• You can use Font to change the size of the labels. The tree viewer will shrink the font size of some labels if they cannot all fit in the available space. Minimum Size specifies the minimum size that the tree viewer is allowed to shrink the label font to.
• Significant Digits sets how many digits to display if the value the node is displaying is numeric.
Show scale bar: This displays a scale bar at the bottom of the tree view to indicate the length of the branches of the tree. It has three options: Scale range, font size and line weight. Setting the scale range to 0.0 allows the scale bar to choose its own length; otherwise it will be the length that you specify.
3.7.6 Node Interaction
You may click on a node in the tree viewer to select the node and its clade. Double-click the node to collapse un collaps
187. ns to select from. Geneious allows you to search with a range of criteria; however, these depend on the database being searched. All the fields in the NCBI public databases can be searched in any combination. Each database has a specific list of fields, and it is important to familiarize yourself with these fields to make full use of the Advanced Search. The fields available for a search can be found in the left-most drop-down box after enabling the advanced search options. When searching in your local documents, ? can be used to represent any single character and * can be used to represent a series of 0 or more unknown characters. For example, searching for CO*I matches COI and COXI.
Note: When searching the Genome, Gene or PopSet databases, the documents returned are only summaries. To download the whole genome, select the summary(s) of the genome(s) you would like to download and then click the Download button inside the document view or just above it. There are also Download items in the File menu and in the popup menu when a document summary is right-clicked (Ctrl-click on Mac OS X). The size of these files is not displayed in the Documents Table. Be aware that whole genomes can be very large and can take a long time to download. You can cancel the download of document summaries by selecting Cancel Downloads from any of the locations mentioned above. Advanced Search also provides you with a number of options for restrictin
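The wildcard behaviour for local searches can be tried out with Python's standard fnmatch module, which follows the same glob convention assumed here: ? stands for exactly one character and * for zero or more characters. This is a stand-in for illustration only, not the Geneious search engine, and the case-sensitive matching is an assumption of the example.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern: str, text: str) -> bool:
    """Return True if text matches a pattern using ? (one char) and * (zero or more)."""
    return fnmatchcase(text, pattern)


if __name__ == "__main__":
    for name in ("COI", "COXI", "COII"):
        print(name, wildcard_match("CO*I", name), wildcard_match("CO?I", name))
```

Under this convention CO*I matches both COI and COXI, whereas CO?I matches COXI but not COI, because ? must consume exactly one character.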
188. nt
• Codon Change indicates the change in codon. Essentially this is the same as the Change field but extended to include the full codon(s). For example, TTC -> TTA.
• Amino Acid Change indicates the change (if any) in the amino acid(s) by translating the codon change. For example, F -> L.
• Protein Effect summarizes the change on the protein as either a substitution, frame shift, truncation (stop codon introduced) or extension (stop codon lost).
Finding regions of low/high coverage
In addition to the coverage graph, which gives you a quick overview of coverage, the Find Low/High Coverage feature is available under Annotate & Predict in the toolbar. This feature annotates all regions of low/high coverage, which you can then navigate through using the little left and right arrows next to the coverage annotations in the controls on the right. You can set the low/high coverage threshold by specifying either an absolute number of sequences or a number of standard deviations from the mean coverage.
Viewing Contigs of Paired Reads
In order to view a contig of paired reads, you first need to have set up the paired data before assembling (see 4.7.4). Once you have your paired read assembly, the contig viewer adds an option to Link paired reads in the advanced section of the controls on the right. This means that pairs of reads will be laid out in the same row with a horizontal line connecting them
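The relationship between the Codon Change, Amino Acid Change and Protein Effect fields described above can be sketched for the simple case of a single substituted codon. The sketch assumes the standard genetic code and does not handle frame shifts or multi-codon changes; the function names and the "none" label for a synonymous change are hypothetical.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """Translate one unambiguous DNA codon; '*' denotes a stop codon."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in BASES for base in codon):
        raise ValueError("expected a codon of three bases from A, C, G, T")
    index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
    return AMINO_ACIDS[index]


def amino_acid_change(old_codon: str, new_codon: str) -> Tuple[str, str]:
    """Return the (old, new) amino acids for a codon change."""
    return translate_codon(old_codon), translate_codon(new_codon)


def protein_effect(old_codon: str, new_codon: str) -> str:
    """Classify a single-codon substitution by its effect on the protein."""
    old_aa, new_aa = amino_acid_change(old_codon, new_codon)
    if old_aa == new_aa:
        return "none"
    if new_aa == "*":
        return "truncation"
    if old_aa == "*":
        return "extension"
    return "substitution"
```

For the example in the text, the codon change TTC -> TTA translates to F -> L and is classified as a substitution.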
189. nt or Ctrl-click on Mac OS X. Select the online contacts which you want to invite; you can select a range by Shift-clicking or add contacts to the selection by Ctrl-clicking. Click Invite to create this new chat session.
Accepting or Declining an Invitation to Chat
When one of your contacts invites you to chat, a dialog will appear asking you to accept or decline the chat invitation. Clicking Accept will open a chat window that will allow you to chat with the contact who invited you and with all other contacts that were invited. If you decline the invitation and enter a reason (optional), this reason will be displayed to everyone in the chat.
Sending and Viewing Messages in the Chat
The chat window displays your own and your contacts' previous messages. You can enter new messages in the field at the bottom. These messages will only be sent and become visible to your contacts once you click Send or press the Enter key. To leave the chat, simply close the Chat Window.
9.5.3 Setting up and running your own Jabber server
Setting up your own Jabber server is simple and means that your documents will never leave your local network. This means that you will not have any problems with firewalls, will achieve much greater download speeds, and will gain an extra security layer for the confidentiality of your documents, in case it is not sufficient for you that the communication with our Jabber
190. [Figure 2.7: Sequence Search Options]
Once the search has completed, the results can be moved to your local database at your convenience. If your query sequence was annotated, any annotations that cover the hit region will be transferred to the BLAST hit document.
You can also download the full database sequence that corresponds to a BLAST hit. To retrieve the full sequence, select a BLAST alignment and go to File → Download Documents, or click the Download Full Sequence(s) button located above the viewer tabs. The full sequence will be available in the Sequence View tab once the download has completed. In addition, the annotations from the full sequence will be transferred over to the BLAST alignment (see Figure 2.10).
If you have a mirror of the NCBI BLAST databases, you can set Geneious to use it by going to Tools → Add/Remove Databases → Set Up Search Services. This brings up a dialog that allows you to change the setup for various search services in Geneious. Choose NCBI using the service drop-down box at the top of the dialog, enter the URL for the mirror and click OK to apply the new settings. You can also edit the databases that show up in Geneious by clicking on Edit Data
191. nto protein. Clicking on this choice brings up a list of genetic codes that can be used. Choose the appropriate one and click OK. This is available only for nucleotide sequences.
Allow Editing, Add/Edit Annotation, Annotate & Predict and Save
3.2.14 Editing sequences and alignments
To edit sequence(s) or an alignment, click the Allow Editing toolbar button. After selecting a residue or a region, you can either type in the new contents or use any of the standard editing operations such as Copy (Ctrl/⌘+C), Cut (Ctrl/⌘+X), Paste (Ctrl/⌘+V), Paste Without Annotations (Shift+Ctrl+V), Paste Reverse Complement and Undo (Ctrl/⌘+Z). All operations are under the main Edit menu.
Selecting a region also enables the Add/Edit Annotation button, which opens an annotation entry dialog. Enter an annotation name and select an existing type or type a new one. Click on More Options to enter additional properties for that annotation. Double-click on an existing annotation to edit it, or right-click (Ctrl-click on Mac OS X) to display a pop-up menu to delete annotations. You can also copy an annotation from one sequence to another from the pop-up menu.
When editing an alignment, it is possible to select a region, which may span several sequences, and drag it to the left or right. Dragging will either move residues over existing gaps or open new gaps when necessary. Dragging a selection consi
192. ocuments or grouped documents. All necessary options are easily accessible from the interface (Figure 10.4).
[Figure 10.4: Gibson Assembly options dialog with extended options]
The dialog shows:
• A dropdown menu provides easy access to choose a vector. If none is selected, the product(s) will be linear, otherwise circular. The longest sequence will automatically be preselected as vector.
• Insert sequences can be ordered via drag-n-drop. Sequences that have been previously grouped into lists (Batch Sequences A, shown in brownish) will all be inserted at the specified position, generating one product per sequence at that position.
• If no grouped documents are provided in the drag-n-drop field, the user can choose to insert the sequences either all at once into the vector as a sequential assembly (Assemble fragments end on end) or alternatively as alternating inserts with one inser
193. oft or hard. In the case of soft trims, the sequence will remain but is ignored by many tools such as the assembler. This means soft trims can be adjusted as needed or deleted completely. Soft trims can be confusing to users of other software because they can see the sequence in the assembly, but the sequence isn't really contributing to the assembly and won't be part of the consensus sequence. Dragging the ends of the trim annotation will make the newly untrimmed sequence visible and part of the consensus (Figure 15.14).
[Figure 15.14: Click and drag the trims to adjust]
15.7.2 Multiple reference sequences
The assembler can handle multiple reference sequences, but they need to be combined into a sequence list document before assembly. Do this using Sequence → Group Sequences into a List, and then select this list as the reference sequence, making sure Geneious will use all sequences. Geneious will then try all reads against all references in a single operation.
15.7.3 Paired reads
Paired read support is available, but before it can be used the read files need to be combined using the pairing operation. For example, if you have imported two FASTQ files, one with forward read
194. olumns in the Document Table and include the PubMed ID (PMID), first and last authors, URL (if available) and the name of the Journal. When a document is selected, the abstract of the article is displayed in the Document Viewer along with a link to the full text of the document (if available) and a link to Google Scholar, both below the author's name(s).
Note: If the full text of the article is available for download in PDF format, it can also be stored in Geneious by saving it to your hard drive and then importing it. This will allow full text searches to be performed on the article.
As well as the abstract and links, Geneious also shows the summary of the journal article in BibTeX format in a separate tab of the Document Viewer. This can be imported directly into a LaTeX document when creating a bibliography. Alternatively, a set of articles in Geneious can be directly exported to an EndNote 8.0 compatible format. This is usually done when creating a bibliography for Microsoft Word documents.
4.2 Sequence data
Basic techniques such as dotplots and pairwise alignments can be used to study the relationships between two sequences. However, as the number of sequences increases, methods for determining the evolutionary relationships between them become more complicated. When analyzing more than two sequences, there are some common steps to determine the ancestral relationships between them. The following sections outline the basic tools for prelim
195. on down, drag them over to the desired folder and release. If you dragged documents from one local folder to another, this action will move the documents so that a copy of the document is not left in the original location. In external databases such as NCBI, the documents will be copied, leaving one in its original location.
Drag and copy. While dragging a document over to your folder, hold the Ctrl key (Alt/Option key on Mac OS X) down. This places a copy of the document in the target folder while leaving a copy in the original location. This is useful if you want copies in different folders. Folders themselves can also be dragged and dropped to move them, but they cannot be copied.
The Edit menu. Select the document and then open the Edit menu on the menu bar. Click on Cut (Ctrl+X/⌘+X) or Copy (Ctrl+C/⌘+C). Select the destination folder and Paste (Ctrl+V/⌘+V) the document into it.
[Figure: Sequence View toolbar and HLA-A difference annotations] Alt click on a sequence position or ann
196. onally, if the primer sequences were not already annotated with a primer annotation, they will be annotated during testing.
4.6.4 Primer Characteristics
Characteristics for Selection
Primer Characteristics can also be determined on a selection in a larger sequence. Select a region of 150bp or less in the Sequence View and choose Characteristics for Selection. The primer characteristics will then be added as an annotation over the exact region that was selected. This will also work on multiple selected regions in the Sequence View. Hold the Ctrl key while clicking and dragging to select multiple regions simultaneously.
Convert to Oligo
Geneious can convert any number of sequences that are 150 base pairs or fewer in length into primers. This operation will also determine the primer characteristics of the sequences, such as melting point. To do this, select your sequences and choose the same Primers action as you do with design or test, then choose Convert to Oligo from the popup menu that appears. If you select just two sequences, you have the additional option of determining their pair characteristics, which can be used to see if the two sequences can pair and how well they do so.
4.6.5 Primer Extensions
You can add a primer extension to an existing oligonucleotide sequence by selecting Primers → Add 5′ Extension. You can add your
197. opy of a duplicated document. This means they can be deleted or easily moved to another folder, leaving one copy behind. If you are searching for duplicates within sequences of a single alignment or sequence list, you also have the option to extract unique sequences from the list.
2.5.7 Batch Rename
Batch rename is located under the Edit menu and is used to edit the names of many documents in one step. It has options to replace the names with a combination of values from other columns, e.g. organism or accession. It can also add fixed text to the beginning or end of each name.
[Figure 2.12: Setting the location of your local documents]
2.6 Agents
Geneious offers a simple way for you to continuously receive the latest information on genomes, sequences and protein structures. This feature is called an agent. Each agent is a user-defined automated search. You can instruct an agent to search any Geneious-accessible database at regular intervals (e.g. weekly), including your contacts on Collaboration. This simple but powerful feature ensures that you never miss that critical article or DNA seque
198. otation or select a region to zoom in. Alt-shift click to zoom out.
[Figure 2.10: Document After Full Sequence Download. Panel (a): Complete Database Sequence. Panel (b): Annotations Transferred to Alignment.]
Table 2.4: Geneious document types
Document type
Nucleotide sequence
Oligo sequences
Enzyme Sets
Chromatogram
Contig
Protein sequence
Pfam domain sequence
Phylogenetic tree
3D structure
Sequence alignment
Journal articles
PDF
Other documents
2.5.2 Deleting Data and Deleted Items
When a folder or document is deleted, Geneious moves the data to the Deleted Items folder instead of erasing it immediately. This means the data can be recovered if it was deleted by mistake. Pressing the Delete key is the easiest way to move the selected folder or do
199. ov/pub/COG/COG
The files you need are:
• myva
• myva=gb
• whog
Save these files to a local folder. Now go to Tools → Add/Remove Databases → Add Sequence Database and select Custom BLAST using the Service drop-down box. Choose to Create from file on disk and click Browse. Navigate to the file myva and click OK (make sure that the protein database option is checked). Now copy the other two files that you downloaded into the data folder inside your Custom BLAST folder.
[Figure 6.1: The COGs BLAST Download Manager]
6.1.2 Downloading the COGs BLAST databases through Geneious
Geneious provides a download manager to help you download and set up the COGs BLAST database. To use it, go to Tools → Add/Remove Databases → Set Up Search Services and select COGS BLAST from the Service drop-down box. Make sure Let Geneious do the setup is checked, then click OK. After a few seconds, the compressed file containing all the files needed to run COGS BLAST will start downloading. You can click Pause to pause the download. Once all
200. own notes and insert references into documents. It also generates a bibliography in different styles. Geneious can interoperate with EndNote using EndNote's XML (Extensible Markup Language) file format to export and import its files.
FASTA format
The FASTA file format is commonly used by many programs and tools, including BLAST [1], T-Coffee [17] and ClustalX [23]. Each sequence in a FASTA file has a header line beginning with a ">", followed by a number of lines containing the raw protein or DNA sequence data. The sequence data may span multiple lines, and these sequences may contain gap characters. An empty line may or may not separate consecutive sequences. Here is an example of three sequences in FASTA format (DNA, Protein, Aligned DNA):
>Orangutan
ATGGCTTGTGGTCTGGTCGCCAGCAACCTGAAT
CTCAAACCTGGAGAGTGCCTTCGAGTG
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWOK
>Chicken
CTACCCCCCTAAAACACTTTGAAGCCTGATCCTCACTA CTGT CATCTTAA
FASTQ format
FASTQ format stores sequences and Phred qualities in a single file.
GenBank files
Records retrieved from the NCBI website (http://www.ncbi.nlm.nih.gov) can be saved in a number of formats. Records saved in GenBank or INSDSeq XML formats can be imported into Geneious.
Geneious format
The Geneious format can be used to store all your local documents, meta data types
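The FASTA rules described above (a ">" header line, sequence data that may span several lines, optional blank lines between records, gap characters allowed) can be sketched as a small parser. This is a minimal illustration, not the Geneious importer; the function name `parse_fasta` and the sample data are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Sequence lines belonging to one record are joined together; blank
    lines are skipped; gap characters such as '-' are kept as they are.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # an empty line may or may not separate records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


sample = ">seq1\nATGG\nCTCA\n\n>seq2 aligned\nCTAC--CTGT\n"
for name, seq in parse_fasta(sample):
    print(name, seq)
```

Lines that appear before the first header are ignored here; a stricter reader might prefer to raise an error instead.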
201. ows Geneious users to share the products of their research and work with each other. Based on an open Internet protocol called XMPP, or Jabber, it allows you to maintain a list of contacts so that you see who is online when you sign on yourself. You can then share documents with your online contacts, and browse and work with their documents in return. The list of contacts is stored on the server, so you can easily access an account, including its contacts, both at work and on your private computer.
Collaboration can work with any existing Jabber service, such as Google Talk, but we recommend using the Geneious default, talk.geneious.com. You can even access several Jabber accounts at the same time, which is particularly convenient if you wish to set up and run your own Jabber server (section 9.5.3).
This chapter shows you how to:
• Create a new collaboration account
• Search for and add contacts to your account
• Share local folders with your contacts
• Search your contacts as you would an online database
• Set up and run your own Jabber server
9.1 Managing Your Accounts
When you start Geneious, you will see the empty Collaboration service in the Services Panel and the Collaboration submenu under Tools. You can open the Add New Account dialog by either right-clicking (Ctrl-click on Mac OS X) on Collaboration in the Services Panel and clicking Add New Account in the popup men
202. pes allows selecting a subset of types for display in the table. The Select One button in the menu is a quick way to view just one type while also selecting the relevant columns for that type. Relevant columns are deemed to be ones where at least one annotation of that type has a value for the column.
• Columns allows control over which columns are visible in the table.
• Export table exports the visible rows and columns to a CSV (comma-separated values) file.
• Extract extracts the region of the selected annotation into a new document.
• Translate translates the nucleotides in the region of the selected annotation into amino acids, allowing selection of the appropriate translation table and frame.
• Filter: text in this field is used to filter the table. Filtering is only done against the currently visible columns for each annotation.
3.4 Dotplot viewer
This is a special viewer that appears when one or two sequences are chosen. A dotplot compares two sequences to find regions of similarity. Each axis (X and Y) on the plot represents one of the sequences being compared (Figure 3.8). For more information on dotplots, see section 4.3.
[Figure: dotplot viewer options, including Reverse complement, Pairwise alignment path, Sensitivity, Score Matrix, Window Size and Threshold]
203. pology threshold will be output as summary trees. The summary trees have branch lengths that are the average of the lengths of the same branch from trees with the same topology.
The topology threshold determines what percentage of the original tree topologies must be represented by the summarizing topologies. The most common topology will always be output as the first summary tree. If the frequency of this does not meet the threshold, then the next most frequent topology will be added, and so on, until the total frequency of the topologies reaches the threshold value.
A topology threshold of 0 will result in only the most common topology being output; a threshold of 100 will result in all topologies being output.
4.5.8 Tree building in Geneious
Geneious can build a phylogenetic tree for a set of sequences using pairwise genetic distances. To build a tree, select an alignment or a set of related sequences (all DNA or all protein) in the Document table and click the Tree icon, or choose this option from the Tools menu.
[Figure: tree builder options dialog, showing Genetic Distance Model (Tamura-Nei), Tree build Method (Neighbor-Joining), Outgroup and Consensus Tree Options (Resample tree, Resampling Method)]
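The topology threshold rule described above can be sketched as follows. This is a hedged illustration of the selection step only, not Geneious code: the function name `select_topologies` and the input shape (a mapping from a topology label to the number of resampled trees with that topology) are assumptions, and the averaging of branch lengths is left out.

```python
def select_topologies(counts, threshold_percent):
    """Pick the topologies to output as summary trees.

    counts: dict mapping a topology label to how many of the original
    trees have that topology (hypothetical input format).
    threshold_percent: the topology threshold, from 0 to 100.
    """
    total = sum(counts.values())
    # Most frequent topology first; ties keep their input order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0
    for topology, n in ranked:
        selected.append(topology)   # the most common one is always output
        covered += n
        if covered * 100 >= threshold_percent * total:
            break                   # cumulative frequency reached the threshold
    return selected


counts = {"((A,B),C)": 6, "((A,C),B)": 3, "((B,C),A)": 1}
for threshold in (0, 70, 100):
    print(threshold, select_topologies(counts, threshold))
```

With these made-up counts, a threshold of 0 yields only the most common topology and a threshold of 100 yields all three, matching the two boundary cases stated in the text.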
204. port from the Manage Profiles window allows you to save a file containing a particular profile. These can be emailed to other Geneious users and imported for use with their data. The easiest way to import a profile is by dragging the file directly into Geneious.
If a profile is marked as Shared (when it was created or by editing it), then the profile will be copied across to any Shared Database that you connect to. This means anyone else who connects to the same Shared Database will automatically have the profile under their Load Profile menu.
Note: Once a profile is shared it cannot be un-shared, but it can be deleted. Also, other users can edit or delete a shared profile at any time.
4.9 Results of analysis
All analysis results are deposited in the currently selected folder. If no local folder is selected, you will be prompted for a local folder. This applies to sequence alignments, phylogenetic trees, sequence translations, reverse complements and extraction of sequences. Once generated, analysis results can be dragged to another location if desired.
Chapter 5 Custom BLAST
Custom BLAST allows you to create your own custom database from either FASTA files or sequences in your local folders, and BLAST against it.
5.1 Setting Up
The Custom BLAST plugin requires access to NCBI BLAST (not BLAST+) binary files.
5.1.1 Setting up the Custom BLAST files yourself
If you want you can do
205. possible hits when dealing with sequences of 20bp. This is why Biomatters has not implemented Primer BLAST.
Users may want to use BLAST to test if primers match against their sequences, because Test with Saved Primers requires 5′ extensions to be annotated so the test will ignore them. This is also a bad idea, since any matches it does produce will be local alignments rather than full-length matches, potentially truncating both ends, not just where the 5′ extension is. It is possible to repurpose the assembler to do this though, so see the chapter on primers.
If the primer has a 5′ extension, this should be annotated onto the sequence correctly, and then Geneious will ignore that region when primer testing. If this isn't done, the primer will not match. This would explain why some users have insisted that BLAST is the right tool for this job.
15.6 Primers
Primer design in Geneious is based on Primer3, but the tool has been used in creative ways to perform many operations that it wasn't really designed for.
15.6.1 Primer testing performance is slow
When there are a lot of primers, the testing process can take a long time, especially when degenerate primers are being used or if you're testing primers as pairs. Testing as pairs can be especially slow with a lot of primers, because Geneious has to test every possible combination, and this can turn a task that should take seconds into one that will take
206. pped reference sequence that is covered by at least 1 read is also displayed.
Rough Tm: A rough calculation of the melting point for a nucleotide sequence using the following calculations.
If the sequence is less than 14bp in length: Rough Tm = 4 × GC count + 2 × AT count
If the sequence is greater than 13bp in length: Rough Tm = 64.9 + 41 × (GC count − 16.4) / length
3.2.13 The sequence viewer toolbar
The top of the sequence viewer panel shows a toolbar containing several actions. Some of them operate on a part of a sequence or alignment. There are several ways to make such a selection:
• Mouse dragging: Click and hold down the left mouse button at the start position, and drag to the end position. By using the Ctrl (Windows/Linux) or ⌘ (Mac) keys it is possible to select multiple regions of a sequence or alignment.
• Select from annotations: When annotations are available, click on any annotation to select the annotated residues. As with mouse dragging, multiple selections are supported.
• Click on sequence name: This will select the whole sequence.
• Select all: Use the keyboard shortcut Ctrl+A (⌘+A on Mac) to select everything in the panel.
The available actions are:
Extract: Extract the selected part of a sequence or alignment into a new document.
Reverse Complement: Reverse sequence direction and replace each base by its complement. This is available only for nucleotide sequences.
Translate: Translate DNA i
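The two Rough Tm calculations quoted above (one for sequences shorter than 14bp, one for longer sequences) translate directly into a short function. This is a sketch under stated assumptions, not the Geneious implementation: the name `rough_tm` is hypothetical, only unambiguous A, C, G and T bases are counted, and the result is taken to be in degrees Celsius.

```python
def rough_tm(sequence):
    """Rough melting temperature from base composition.

    Shorter than 14bp:  4 * GC count + 2 * AT count
    Otherwise:          64.9 + 41 * (GC count - 16.4) / length
    Ambiguity codes are not counted as GC or AT (an assumption), but
    they still contribute to the sequence length.
    """
    seq = sequence.upper()
    gc_count = seq.count("G") + seq.count("C")
    at_count = seq.count("A") + seq.count("T")
    length = len(seq)
    if length < 14:
        return 4 * gc_count + 2 * at_count
    return 64.9 + 41 * (gc_count - 16.4) / length


print(rough_tm("ATGCATGCAT"))            # 10bp, short-sequence rule
print(rough_tm("GCGCGCGCGCGCGCGCGCGC"))  # 20bp, long-sequence rule
```

For the 10bp example there are 4 G/C and 6 A/T bases, so the short-sequence rule gives 4 × 4 + 2 × 6 = 28.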
207. ppen:
• Another pair of sequences is aligned.
• A sequence is aligned with one of the intermediate alignments.
• A pair of intermediate alignments is aligned.
This process is repeated until a single alignment containing all of the sequences remains. Feng & Doolittle were the first to describe progressive pairwise alignment [5]. Their algorithm used a guide tree to choose which pair of sequences/alignments to align at each step. Many variations of the progressive pairwise alignment algorithm exist, including the one used in the popular alignment software ClustalX [23].
Multiple sequence alignment in Geneious
Multiple sequence alignment in Geneious is done using progressive pairwise alignment. The neighbor-joining method of tree building is used to create the guide tree.
As progressive pairwise alignment proceeds via a series of pairwise alignments, this function in Geneious has all the standard pairwise alignment options. In addition, Geneious also has the option of refining the multiple sequence alignment once it is done. Refining an alignment involves removing sequences from the alignment one at a time and then realigning the removed sequence to a profile of the remaining sequences. The number of times each sequence is realigned is determined by the refinement iterations option in the multiple alignment window. The resulting alignment is placed in the folder containing the sequences aligned.
A profile is a matrix of num
208. pport threshold. A 100% support threshold results in a Strict consensus tree, which is a tree where the included clades are those that are present in all the trees of the original set. A 50% threshold results in a Majority rule consensus tree that includes only those clades that are present in the majority of the trees in the original set. A threshold less than 50% gives rise to a Greedy consensus tree. In constructing a Greedy consensus tree, clades are first ordered according to the number of times they appear (i.e. the amount of support they have), then the consensus tree is constructed progressively to include all those clades whose support is above the threshold and that are compatible with the tree constructed so far.
The length of the consensus tree branches is computed from the average over all trees containing the clade. The lengths of tip branches are computed by averaging over all trees.
Note: The above definitions apply to rooted trees. The same principles can be applied to unrooted trees by replacing clades with splits. Each branch (edge) in an unrooted tree corresponds to a different split of the taxa that label the leaves of this tree.
4.5.7 Sort topologies
This will produce one or more trees summarizing the results of resampling. The frequency of each topology in the set of original trees is calculated, and the topologies are sorted by their frequency. A number of these topologies, based on the to
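The clade selection behind the Strict, Majority rule and Greedy consensus trees described above can be sketched in Python. This is an illustration only, not the Geneious algorithm: each tree is represented simply as a set of clades (each clade a frozenset of taxon names), branch lengths are ignored, and the names `compatible` and `consensus_clades` are hypothetical. Treating support equal to the threshold as sufficient is also an assumption; the compatibility check keeps the result a valid tree in either case.

```python
from collections import Counter


def compatible(a, b):
    """Two clades fit in one rooted tree if they are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def consensus_clades(trees, threshold_percent):
    """Select clades for a consensus tree at the given support threshold.

    trees: list of sets of frozensets (hypothetical representation).
    Clades are visited from most to least supported; a clade is kept if
    its support reaches the threshold and it is compatible with every
    clade already chosen.
    """
    support = Counter(clade for tree in trees for clade in tree)
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    chosen = []
    for clade, n in ranked:
        if n * 100 < threshold_percent * len(trees):
            break  # every remaining clade has even less support
        if all(compatible(clade, other) for other in chosen):
            chosen.append(clade)
    return chosen


AB, BC, DE, ABC = (frozenset(s) for s in ("AB", "BC", "DE", "ABC"))
trees = [{AB, ABC}, {AB, ABC}, {BC, ABC, DE}]
for threshold in (100, 50, 10):
    print(threshold, [sorted(c) for c in consensus_clades(trees, threshold)])
```

In this made-up example the 100% threshold keeps only the clade found in all three trees, while the low threshold additionally admits the weakly supported clade {D, E} but rejects {B, C}, which conflicts with the better supported {A, B}.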
209. ption key on Mac OS X and clicking. When the zoom key is pressed, a magnifying glass mouse cursor will be displayed.
• Hold the zoom key and left-click on the sequence to zoom in.
• Hold the zoom key and Shift key to zoom out.
• Hold the zoom key and turn the scroll wheel on your mouse (if you have one) to zoom in and out.
• Hold the zoom key and click on an annotation to zoom to that annotation.
You can also pan in the Sequence View by holding Ctrl+Alt (⌘+Alt on Mac OS X), clicking on the sequence and dragging.
3.2.2 Circular View
When a circular sequence is selected, the default view is to display the sequence as circular. The view can be rotated by using the scrollbar at the bottom or by turning the mouse wheel. Even though a sequence is circular, you can display it as a linear sequence using the Linear view on circular sequence checkbox under the Layout section of Advanced.
3.2.3 Genome View
The genome view (Figure 3.2) is the default view of the sequence viewer when a sequence list containing very large sequences is selected. It is also launched when multiple large sequences are selected in the document table.
[Figure: genome view in the sequence viewer]
210. quality scores to be used.
Raw sequence format
A file containing only a sequence.
Rich Sequence format (RSF)
Rich Sequence Format files contain one or more sequences that may or may not be related. In addition to the sequence data, each sequence can be annotated with descriptive sequence information.
Comma/Tab Separated Values
Sequences such as primer lists are often stored in spreadsheets. Geneious has an importer that can be given the field values for a spreadsheet file exported in CSV or TSV format; it will import them and convert them to documents, as well as preserving the additional field contents. It can handle nucleotide and amino acid sequences as well as primers and probes. For more information on importing primers from a spreadsheet, see the PCR Primers section.
SAM and BAM format
SAM and BAM format are produced and used by SAMtools. SAM/BAM files contain the results of an assembly in the form of reads and their mappings to reference sequences.
Sequence Chromatograms
Sequence chromatogram documents contain the results of a sequencing run (the trace) and a guess at the sequence data (base calling). Informally, the trace is a graph showing the concentration of each nucleotide against sequence positions. Base calling software detects peaks in the four traces and assigns the most probable base at more or less even intervals.
VCF format
The VCF format contains sequence anno
211. r option.
2. Create a new folder and name it according to the contents of the search. For example, type CytB if searching for cytochrome b complex.
3. Once created, select the new folder.
You can now select Create or Create and Run. The agent will then be added to the list in the agent dialog, and it will perform its first search if you clicked Create and Run. Otherwise it will wait until its next scheduled search.
2.6.2 Checking agents
Once you have created one or more agents, Geneious allows you to quickly view their status in the agents window, which is accessible from the toolbar. Your agents' details are presented in several columns: Enable, Action, Status and Deliver To.
[Figure 2.13: The Create Agent Dialog]
Enable: This column contains a check box showing whether the agent is enabled.
Action: This summarizes the user-defined search criteria. It contains:
1. Details of the database accessed. For example, Nucleotide and Genome under NCBI.
2. The search type the Agent performed, e.g. keyword.
3. The words the user entered in the search field for the Agent to match against
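An agent, as described above, is essentially a saved search with a schedule and a destination folder. The sketch below models that idea as a small data structure so the columns of the agents window (Enable, Action, Deliver To) have something concrete to map to. Every name here, including the `Agent` class, its fields and the `is_due` helper, is hypothetical and chosen for illustration; none of it reflects how Geneious stores agents internally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Agent:
    """Hypothetical model of a user-defined automated search."""
    database: str                 # e.g. "PubMed"
    search_type: str              # e.g. "keyword"
    search_terms: str             # words to match against
    deliver_to: str               # destination folder name
    interval: timedelta           # how often the search is repeated
    enabled: bool = True
    last_run: Optional[datetime] = None

    def action(self):
        """Summary of the search criteria, like the Action column."""
        return f"{self.database}: {self.search_type} '{self.search_terms}'"

    def is_due(self, now):
        """A disabled agent never runs; a new one is due immediately."""
        if not self.enabled:
            return False
        return self.last_run is None or now - self.last_run >= self.interval


agent = Agent("PubMed", "keyword", "cytochrome b", "CytB", timedelta(weeks=1))
print(agent.action(), agent.is_due(datetime(2024, 1, 8)))
```

Choosing Create and Run in the dialog would correspond to running the search straight away, whereas Create alone leaves the agent waiting for its next scheduled time.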
212. r Gateway site or a combination of these. For more information, see section 4.6.5.
A mispriming library is a set of sequences (usually repeats) which the primers should not bind to. Four inbuilt libraries are available for selection, or you can upload a custom library of sequences in FASTA format. For more information on the inbuilt libraries, see the Primer3 help page.
Output from Primer Design
Once the task and options have been set, click the OK button to design the primers. A progress bar may appear for a short time while the process completes. When complete, each of the sequences will have the designed primers and probes added to them as sequence annotations. If you are designing primers off an alignment, the primer will be annotated on the consensus sequence.
The annotations will be labelled with the base number the primer starts at, followed by either F (forward primer), R (reverse primer) or P (probe). Primers will be coloured green and probes red.
Detailed information such as melting point, tendency to form primer-dimers and GC content can be seen by hovering the mouse over the primer annotation. The information will be presented in a popup box. Alternatively, double-clicking on an annotation will display its details in the annotation editing dialog. Table 4.1 shows how the values in the Geneious primer annotation map to the original Primer3 values.
The best way to save a primer or DNA probe for furt
213. r gives quick access to commonly used features in Geneious, including Sequence Search (e.g. BLAST), Agents that search databases for new content even while you sleep, Align/Assemble, Tree building and Help. For more information on the toolbar, see section 2.1.5.
[Screenshot of the Geneious main window: toolbar, Sources panel and Document Table] Figure 1
214. read errors. This feature can also be configured to only find disagreements in coding regions if the reference sequence has CDS annotations present, and can analyze the effects of variations on the protein translation to allow you to quickly identify silent or non-silent mutations. It can also calculate p-values for variations and filter only for variations with a specified maximum p-value. The p-value represents the probability of a sequencing error resulting in observing bases with at least the given sum of qualities. The lower the p-value, the more likely the variation at the given position represents an allele. When calculating p-values: • The contig is assumed to have been fine tuned around indels. • Ambiguity characters are ignored; other characters in the column are still used. • Homopolymer region qualities are reduced to be symmetrical across the homopolymer. For example, if a series of 6 Gs have quality values 37, 31, 23, 15, 7, 2 then these are treated as though they are 2, 7, 15, 15, 7, 2. This is done because variations may be called at either end of the homopolymer and because reads may be from different strands. • Gaps are assumed to have a quality equal to the minimum quality on either side of them (after adjusting for homopolymers). • When finding variations relative to a reference sequence, the p-value calculated is for the variant, not the change. In other words, the p-values calculated are independent of the reference sequence dat
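The homopolymer and gap adjustments described above can be sketched in a few lines of Python. This is an illustration only: the manual gives a single worked example rather than a formula, so the rule used here (each position takes the lower of its own quality and the quality at the mirrored position) is an assumption that happens to reproduce that example. The function names are hypothetical and are not part of Geneious.

```python
def symmetrize_homopolymer_qualities(qualities):
    """Make the qualities of a homopolymer run symmetrical.

    Assumption: each position takes the minimum of its own quality and the
    quality at the mirrored position in the run.
    """
    mirrored = reversed(qualities)
    return [min(q, m) for q, m in zip(qualities, mirrored)]


def gap_quality(left_quality, right_quality):
    """Quality assigned to a gap: the minimum of the qualities on either side."""
    return min(left_quality, right_quality)


if __name__ == "__main__":
    # The series of 6 Gs from the text.
    print(symmetrize_homopolymer_qualities([37, 31, 23, 15, 7, 2]))
    print(gap_quality(30, 12))
```

The neighbouring qualities passed to `gap_quality` would be the values after the homopolymer adjustment, as the text specifies.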
215. reate Consensus Tree: Choose this to create a consensus tree from the samples. Sort Topologies: Produce trees which summarise the topologies resulting from resampling. See above for more details. Support threshold: This is used to decide which monophyletic clades to include in the consensus tree after comparing all the trees in the original set (see Consensus Tree section above). Topology Threshold: The percentage of topologies in the original trees which must be represented by the summarizing topologies. Save raw trees: If this is turned on then all of the trees created during resampling will be saved in the resulting tree document. The number of raw trees saved will therefore be equal to the number of samples. Creating a consensus tree of existing trees: If you select a tree set document and choose Tree, then the Consensus option will be available at the top of the tree builder options. This will create a consensus tree using the trees already in the document (no resampling will be performed) and it will either be added to the tree document or saved as a separate tree document. The only option available here is the consensus support threshold. 4.6 PCR Primers Geneious provides several operations for designing and working with PCR Primers and DNA (or hybridisation) probes. PCR Primers and DNA (or hybridisation) probes can be designed for or tested on existing nucleotide sequences or alignments. A PCR
216. results when assembling. If you don't know what the relative orientation or expected distance is between the reads, you should ask your sequencing data provider. When you click OK, if you chose to pair by parallel lists of sequences, Geneious will create a new document containing the paired reads. If you chose to pair an interlaced list of sequences or modify settings for some already paired data, Geneious will just modify the existing list of sequences to mark it as paired. If you choose to split reads based on the presence of a linker sequence (e.g. for 454 data), the original sequences will be unmodified and the split reads will be created in a new document. The default behaviour is to ignore sequences shorter than 4bp either side of the linker, but this can be customized from the Edit Linkers option in the paired reads options. Polonator sequencing machine reads can be split using the Split each read in half option. 4.7.5 Splitting multiplex/barcode data Multiplex or barcode data (e.g. 454 MID data) can be separated using Separate Reads by Barcode from the Sequence menu. The barcode options allow for mismatches (substitutions, deletions or insertions), and trimming of primer fragments, adapter and linker sequences is also supported. All sequences matching a barcode are copied to a correspondingly named sequence list document. Default settings are supplied for 454 MID data splitting so
217. riction sites used to cut the vector in advance; the Insert into Vector operation will do that for you. This operation cannot deal with some aspects of molecular cloning such as triple ligation and the blunting or filling in of overhangs. If you want to do a cloning operation outside the scope of this operation, you will need to annotate restriction sites on the sequences involved, digest the fragments, modify them in the sequence viewer if necessary and then ligate them back together as a set of discrete steps. 10.3.1 Insert Options You cannot alter the insert used in the operation from the options, but you can select what direction to insert in: forward or reverse. If the insert fragment has complementary overhangs or is blunt at both ends, you can also choose to insert in both directions. In this case two product documents will be created, one for the insert in each direction. The insert options also present a diagram showing the bases at each end of the insert fragment. 10.3.2 Vector Options • Polylinker region to cut within: These options let you choose what region within the vector sequence to look for enzymes to cut within. Geneious will examine the vector sequence for enzymes that have cut sites within this region and none outside it. You can specify the polylinker in the following ways: Annotation: If the vector has one or more polylinker annotations annotated on it, you can choose to use the interva
218. [Screenshot of the plugin list and plugin update options] Figure 2.18: The plugins preferences in Geneious. 2.9.3 Appearance and Behavior Here you can change the way Geneious looks and the way it interacts with you. Appearance options allow you to change the way the main toolbar and the document table look. Behaviour options allow you to change the way newly created documents are handled, such as whether they are selected straight away and where they should be saved to. 2.9.4 Keyboard This section contains a list of Geneious functions and allows you to define keyboard shortcuts to them. Shortcuts that are already defined are highlighted in blue. Setting shortcuts can help you quickly na
219. ro; Nucleotide & protein sequences; DNAStar
Columns below: Format; Extensions; Data types; Common sources
DNA Strider; .str; Sequences; DNA Strider (Mac program), ApE
Embl/UniProt; .embl, .swp; Sequences; Embl, UniProt
Endnote (8.0) XML; .xml; Journal article references; Endnote, journal article websites
FASTA; .fasta, .fas, etc.; Sequences, alignments; PAUP, ClustalX, BLAST, FASTA
FASTQ; .fastq, .fasq; Sequences with quality; Solexa/Illumina
GCG; .seq; Sequences; GCG
GenBank; .gb, .xml; Nucleotide & protein sequences; GenBank
Geneious; .xml, .geneious; Preferences, databases; Geneious
Geneious Education; .tutorial.zip; Tutorial, assignment, etc.; Geneious
GFF; .gff; Annotations; Sanger Artemis
MEGA; .meg; Alignments; MEGA
Molecular structure; .pdb, .mol, .xyz, .cml, .gpr, .hin, .nwo; 3D molecular structures; 3D structure databases and programs
Newick; .tre, .tree, etc.; Phylogenetic trees; PHYLIP, Tree-Puzzle, PAUP, ClustalX
Nexus; .nxs, .nex; Trees, alignments; PAUP, Mesquite, MrBayes & MacClade
PDB; .pdb; 3D protein structures; SP3, SP2, SPARKS, Protein Data Bank
PDF; .pdf; Documents, presentations; Adobe Writer, LaTeX, MiKTeX
Phrap ACE; .ace; Contig assemblies; Phrap/Consed
PileUp; .msf; Alignments; pileup (gcg)
PIR/NBRF; .pir; Sequences, alignments; NBRF PIR
Qual; .qual; Quality file; Associated with a FASTA file
Raw sequence text; .seq; Sequences; Any file that contains only a sequence
Rich Sequence Format; .rsf; Sequences, alignments; GCG's NetFetch
Comma/Tab Separated Values; .csv, .tsv; Spreadsheet files; Microsoft Excel
SAM B
220. rs option is added to the Sequence menu. This allows you to select an individual folder or set of documents and set the binning parameters to use on those documents instead of the global ones set in the Preferences. 2.10 Printing and Saving Images Geneious allows you to print (or save as an image) the current display for any document viewer. This includes the sequence viewer, tree view, dotplot and text view. 2.10.1 Printing Choose print from the file menu. The following options are available: Portrait or landscape: Controls the orientation of the page. Scale: Can be used to decrease or increase the size of everything in the view while still printing within the same region of the page. For many types of document views this will cause it to wrap to the following line earlier, usually requiring more pages. Size: Controls the size of the printed region on the paper. Effectively, increasing the size reduces the margins on the page. 2.10.2 Saving Images Choose save to image file from the file menu. The following options are available: Size: Controls the size of the image to be saved. Depending on the document view being saved, these may be fixed or configurable. For example, with the sequence viewer, if wrapping is on you are able to choose the width at which the sequence is wrapped, but if wrapping is off both the width and height will be fixed. Format: Controls image format. Vector formats (PDF, SVG and EMF) are ideal for pub
221. ry. Downgrading requires that the new version of Geneious is uninstalled first, to avoid there being vestiges of the old copy in place. Once this is done, the old version can be reinstalled and Geneious will start up and see the old data folder, but won't be able to access data created in the new version. If you have done work you need to get into the old version, you will need to export your data using an open format such as GenBank (rather than just saving the geneious format file) prior to downgrading, since Geneious files are not backwards compatible. Bibliography [1] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990), no. 3, 403-410. [2] M.O. Dayhoff (ed.), Atlas of protein sequence and structure, vol. 5, National Biomedical Research Foundation, Washington DC, 1978. [3] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis, Cambridge University Press, 1998. [4] J. Felsenstein, Confidence limits on phylogenies: An approach using the bootstrap, Evolution 39 (1985), no. 4, 783-791. [5] D.F. Feng and R.F. Doolittle, Progressive sequence alignment as a prerequisite to correct phylogenetic trees, J. Mol. Evol. 25 (1987), no. 4, 351-60. [6] O. Gotoh, An improved algorithm for matching biological sequences, J. Mol. Biol. 162 (1982), 705-708. [7] S. Guindon and O. Gascuel, A simple, fast and accurate algorithm to estimate large phylo
222. s and one with reverse reads, you should select both and then use Sequence → Set Paired Reads and choose the appropriate settings, such as expected distance between pairs. This will generate a new paired file which can be selected in the assembly operation, and the extra information will be used to help the assembler resolve complex placement issues. 15.8 Installation and Licensing 15.8.1 Upgrading broke Geneious If an upgrade has resulted in a broken install, uninstall Geneious, delete the Geneious installation folder (not the Geneious 7.0 Data folder though) and reinstall. This should fix the problem. There have also been some issues with the user_preferences.xml found in your Geneious 7.0 Data folder, which can be solved by renaming it so Geneious creates a new one. If this wasn't the problem, you can rename it back without having lost all your preferences unnecessarily. On a Mac, when you upgrade, your memory gets reset to the default; this is due to how upgrades are handled on the Mac. Sometimes you have very large files in your local database which the default memory won't handle. To fix this, find the Info.plist file, which is in the Geneious.app: right click to Show Package Contents, browse into Contents and you'll find the file. Edit this and look for the VMOptions key. Edit the -Xmx value, increasing the memory allocated to your previous value which worked, and Geneious should now s
223. s available under the menu item Tools → Restriction Analysis and in the context menu (right-click on the sequence, or Ctrl-click on Mac OS X). • Find Restriction Sites allows you to specify an arbitrary candidate set of restriction enzymes and the desired number of matches (so that you can e.g. identify enzymes that cut only once or twice), as well as a region enzymes may not cut within. After running the analysis, the position of the matching enzymes' recognition sequence and the sites where they cut will be visible on the sequence as annotations, and you will be able to see a table of all fragment start and end positions and their lengths, and of all restriction enzymes involved. These tables can be exported as csv files for subsequent processing with other software such as Microsoft Excel. Like many restriction enzymes, EcoRI is methylation dependent and cuts only if the second A in the recognition sequence is not methylated to N6-methyladenosine. The restriction enzyme information included in Geneious was obtained from Rebase [18], available for free at http://rebase.neb.com. • Digest into fragments allows you to generate the actual fragments that would be created in a digestion experiment using restriction enzymes. When running a digestion experiment, you can choose to either use the restriction sites already annotated to the sequences or a subset that corresponds to only some specifi
224. s derived from the Patent division of GenBank. PDB: Sequences derived from 3D structure (Brookhaven PDB). RefSeq: RefSeq protein sequences from NCBI's Reference Sequence Project. SwissProt: Curated protein sequences information from EMBL. of the Sequence Search options. The available options vary depending on the kind of BLAST search you have selected. For details on each of the options you can hover your mouse over the option to see a short description, or refer to the NCBI BLAST documentation at http://www.ncbi.nlm.nih.gov/blast/blastcgihelp.shtml. Once a search has started, a results folder will be created under the Searches folder in the Sources panel. Search progress is shown in the document table. The search can be cancelled by clicking on the red square labelled Stop. See Figure 2.8. When using the Standard search type, each search hit is displayed separately in the document table, sorted by E-value. As well as standard columns like Name, search hits can also be sorted by Coverage and a special Grade column which is calculated by Geneious. The Grade column is a percentage calculated by combining the query coverage, e-value and identity values for each hit with weights 0.5, 0.25 and 0.25 respectively. This allows you to sort hits such that the longest, highest identity hits are at the top.
225. s were designed with the supported databases in mind and packaged with database drivers for them. However, Geneious allows you to supply your own JDBC database driver if you want to. You may want to do this because you have an updated driver, or because you have a driver for an unsupported database. It is not guaranteed that Shared Databases will work with another database system if you provide its driver, but it is likely that it will. To supply your own driver, open up the dialog you would normally use to connect to a database. Then click the More Options button. 11.3 Removing a Shared Database To remove a Shared Database, simply right-click on its top folder and choose Remove database. 11.4 Administration The typical user will not have to do any administration; this section is for those in charge of the database. 11.4.1 Groups and Roles Shared Databases support user groups and roles for managing access to documents. This means that you can restrict access of folders to privileged people. How it works is that each folder in Geneious belongs to a group. Users can belong to any number of groups and have a specified role within that group. The three roles are: • View: allows the user to view the contents of folders. • Edit: allows the user to view and edit the contents of folders. • Admin: allows the user special administrative functions on folders. As of this time Ge
226. se the same Primers action from the menu and go to Test with Saved Primers in the popup menu that appears. Figure 4.8: The primer test dialog. There are two ways in which Geneious can test your selection of primers and probes. The first option in the dialog tests a specific forward and reverse primer pair and/or probe. Clicking the Choose buttons next to the forward, reverse and probe options will bring up your primer database, allowing you to select any primer in your database for testing. The second option allows you to specify multiple primers and probes to test all selected sequences against. Click the Choose button, then hold down CTRL-click (⌘-click on Mac) to select multiple primers and probes from many different locations in your database. Alternatively, you can select one or more folders to test with all the primers and probes inside them, or click the Use All button to use every primer in your d
227. section 3.9.2. 3.9 Parents and Descendants Many documents in Geneious are the output of an operation run on a set of input documents. The input documents of the operation are known as the parents of the output, and the output documents the descendants (or children) of the input. Those parent documents may themselves be the descendants of other documents, each with their own parents, and so on. In many situations it is useful to preserve this hierarchy so that future alterations (for example the re-calling of a base or the addition of a new annotation) can be transferred downstream to the molecules affected by this change in a parent. An active link between a child and its parents means that when you modify any of the parent documents you will be given the choice of propagating these changes to the child. When this modification affects a part of the parent involved in creating the child, the change will be immediately visible in the child. Modifications include things like editing the residues of a sequence, adding new annotations or changing the meta-data associated with the document. Propagating a change to a parent document causes Geneious to rerun every operation that links that parent actively to one or more child documents, with the altered parent document and any other parents as input. Geneious stores the options that the operation was originally run with so that it can reproduce the original operatio
228. select above. • Root Length: Sets the length of the visible root of the tree (Rooted and Circular views). • Curvature: Adds curvature to the tree branches (Rooted view only). • Align Taxon Labels: Aligns the tip labels to make viewing a large tree easier (Rooted view only). • Root Angle: Rotates the tree in the viewer (Circular and Unrooted views). • Angle Range: Compresses the branches into an arc (Circular view only). 3.7.5 Formatting There are a range of formatting options. Transform branches allows the branches to be equal like a cladogram, or proportional. Leaving it unselected leaves the tree in its original form. Ordering orders branches in increasing or decreasing order of length, but within each clade or cluster. Show root branch displays the position of the root of the tree (has no effect in the unrooted layout). Line weight can be increased or decreased to change the thickness of the lines representing the branches. Auto subtree contract automatically contracts subtrees when there is not enough space on screen to display them nicely. Show selected subtree only shows only the part of the tree that is selected, or the entire tree if there is no selection. If you are unfamiliar with tree structures, please refer to Figure 3.12 for the following options. Show tip labels: This refers to labels on the tips of the branches of the tree. Figure 3.12: Phylogenetic tree terms (Root, Branch, Node, Tip). Show
229. sition. Double-clicking the minimap will zoom further in on the clicked section. Finally, highlighting a section of the minimap using a click-drag-release action will display the highlighted region in the sequence viewer. 3.2.4 Colors The colors option controls the coloring of the sequence nucleotides or amino acids. Coloring schemes differ depending on the type of sequence. For example, the Polarity and Hydrophobicity coloring schemes are available only for Protein sequences. Similarity Color Scheme: The similarity scheme is used for quickly identifying regions of high similarity in an alignment. In order for a column to be rendered black (100% similar), all pairs of sites in the column must have a score (according to the specified score matrix) equal to or exceeding the specified threshold. So for example, if you have a column consisting of only K (Lysine) and R (Arginine) and are using the Blosum62 score matrix with a threshold of 1, then this column will be colored entirely black, because the Blosum62 score matrix has a value of 2 for K vs R. If you raised the threshold to 3 then this column would no longer be considered 100% similar. If the column consisted of 9 K's and 1 R then, continuing with the threshold value of 3, the 9 K's, which make up 90% of the column, would now be colored the dark grey (80-100%) range while the single R would remain uncolored. If instead the column consisted of 7 K's and 3 R's
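The 100% similarity test for a column can be illustrated with a short Python sketch. It covers only the "all pairs meet the threshold" rule stated in the text, not the partial (grey) shading ranges. The score table holds just the three Blosum62 entries needed for the K/R example, and the function names are hypothetical rather than anything exposed by Geneious.

```python
from itertools import combinations

# Only the Blosum62 entries needed for the K/R example in the text.
BLOSUM62_SUBSET = {("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2}


def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Look up a symmetric substitution score for residues a and b."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def column_is_fully_similar(column, threshold, matrix=BLOSUM62_SUBSET):
    """True if every pair of residues in the column scores >= threshold."""
    return all(
        pair_score(a, b, matrix) >= threshold
        for a, b in combinations(column, 2)
    )


if __name__ == "__main__":
    column = "KKRKR"
    print(column_is_fully_similar(column, threshold=1))
    print(column_is_fully_similar(column, threshold=3))
```

With a threshold of 1 the K/R column passes because the lowest pairwise score is 2; raising the threshold to 3 makes the K vs R pairs fail, which matches the behaviour described above.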
230. splays some statistics about the sequence(s) being viewed. They correspond to the sequence/alignment/assembly being viewed, or the highlighted part of the sequence/alignment/assembly. The length of the sequence or part of the sequence is displayed next to the Statistics option. Residue frequencies: This section lists the residues for both DNA and amino acid sequences, and also for alignments and assemblies. It gives the frequency of each nucleotide or amino acid over the entire length of the sequence, including gaps. If there are gaps, then a second percentage frequency is calculated ignoring gap characters. The G+C content for nucleotide sequences is shown as well for easy reference. The following statistics are available when viewing protein sequences. Molecular Weight: Calculates the molecular weight of the protein using the following values for the amino acids: A 71.0788, R 156.1875, N 114.1038, D 115.0886, C 103.1388, E 129.1155, Q 128.1307, G 57.0519, H 137.1411, I 113.1594, L 113.1594, K 128.1741, M 131.1926, F 147.1766, P 97.1167, S 87.0782, T 101.1051, W 186.2132, Y 163.1760, V 99.1326, U 150.0388, O 237.3018. Isoelectric Point: Calculates the isoelectric point of the protein as per this method but using the following values for the amino acids: D 3.9, E 4.1, C 8.5, Y 10.1, H 6.5, K 10.8, R 12.5. Extinction Coefficient: Calculates the extinction coefficient of the protein as per this paper using the following valu
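As a rough illustration of the molecular weight statistic, the residue values listed above can be summed over a protein sequence in Python. The text does not say how the terminal groups are handled, so adding one water molecule (about 18.0153) for the free N- and C-termini is an assumption of this sketch, not documented Geneious behaviour. The names are hypothetical.

```python
# Residue masses as listed in the text.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
    "U": 150.0388, "O": 237.3018,
}

# Assumption: one water is added for the terminal H and OH groups.
WATER_MASS = 18.0153


def molecular_weight(protein, add_water=True):
    """Sum residue masses; raises KeyError on an unknown residue code."""
    total = sum(RESIDUE_MASS[residue] for residue in protein.upper())
    return total + WATER_MASS if add_water and protein else total


if __name__ == "__main__":
    print(round(molecular_weight("MKV"), 4))
```

Passing `add_water=False` returns the plain sum of residue values, which is the only part the text specifies directly.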
231. ssible word (series of bases) of a specified length occurs. 2. Each read is processed one at a time. Each word within that read is located in the reference sequence, and that is used as a seed point where the matching range is later expanded outwards to the end of the read. 3. If a read does not find a perfectly matching seed, the assembler can optionally look for all seeds that differ by a single nucleotide. 4. Before the seed expansion step, all seeds for a single read that lie on the same diagonal are filtered down to a single seed. 5. During seed expansion, when mismatches occur, a look-ahead is used to decide whether to accept it as a mismatch or to introduce a gap in either the reference sequence or read. 6. The mapper handles circular reference sequences by indexing reference sequence words spanning the origin and allowing the expansion step to wrap past the ends. 7. All results are given a score based on the number of mismatches and gaps introduced. Normally the best scoring (or a random one of equally best scoring) matches are saved, although there is an option to map the read to all best scoring locations. 8. Paired reads are given an additional score penalty based on their distance from their expected distance, so that they prefer mapping close to their expected distance with as few mismatches as possible, but they can also map any distance apart if an ideal location is not found. 9. The final optional fine tuning step at the end shuffl
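The word indexing, seed lookup and same-diagonal filtering steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Geneious mapper: it handles exact word matches on a linear reference only, and it omits seed expansion, single-nucleotide seed variants, circular references, scoring and paired reads. All function names are hypothetical.

```python
from collections import defaultdict


def build_word_index(reference, word_length):
    """Map every word of the given length to its start positions in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - word_length + 1):
        index[reference[pos:pos + word_length]].append(pos)
    return index


def find_seeds(read, index, word_length):
    """Return one seed per diagonal as {diagonal: (read_offset, ref_pos)}.

    The diagonal is ref_pos - read_offset, so seeds on the same diagonal
    imply the same ungapped placement of the read on the reference.
    """
    seeds = {}
    for offset in range(len(read) - word_length + 1):
        word = read[offset:offset + word_length]
        for ref_pos in index.get(word, ()):
            # Keep only the first seed found on each diagonal.
            seeds.setdefault(ref_pos - offset, (offset, ref_pos))
    return seeds


if __name__ == "__main__":
    reference = "ACGTACGTTT"
    index = build_word_index(reference, word_length=3)
    print(find_seeds("CGTT", index, word_length=3))
```

Each surviving seed would then be the starting point for the expansion step described in the numbered list.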
232. [Screenshot of the Properties view showing the Name, Description, Lineage, Notes, Source, Depth, LatLong and Date Collected fields] Figure 2.14: The Properties View. 2.8.1 Editing Meta-Data To edit meta-data fields, simply click on the field and enter your data. Some fields may have constraints which you can edit in the Edit Meta-Data Types dialog (see 2.8.2). If the data you have entered does not conform to the constraints of the field, it will be displayed in red and you will be shown the field's constraints in a tooltip. Tip: To enter a new line in a text field, press Shift+Enter or Ctrl+Enter. When multiple documents are selected, the Properties view displays all of the fields and meta-data belonging to the selected documents. When all documents have the same value for a field, it is displayed in the viewer. If the documents have different values, or some of the selected documents do not have a value, then the field will show that it represents multiple values. Changes made to the fields will apply to all selected documents. 2.8.2 Editing Meta-Data Types To edit meta-data types, click the Add Meta-Data button on the viewer toolbar and select Edit meta-data types. This wil
233. stead, Entrez Gene focuses on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. Entrez SNP: In collaboration with the National Human Genome Research Institute, the National Center for Biotechnology Information has established the dbSNP database to serve as a central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms. The scope and depth of these databases make them critical information sources for molecular biologists and bioinformaticians alike. However, a library is only as good as its librarian. Geneious is your librarian, allowing you to search for, filter and store only the data that you care about. 2.4.4 Accessing NCBI BLAST through Geneious BLAST [1] stands for Basic Local Alignment Search Tool. It allows you to query the NCBI sequence databases with a sequence in order to find entries in the public database that contain similar sequences. When BLASTing, you are able to specify either nucleotide or protein sequences, and nucleotide sequences can be either DNA or RNA sequences. The result of a BLAST query is a table of hits. Each hit refers to a GenBank accession number and the gene or protein name of the sequence. Each hit also has a Bit-score which provides information about how similar the h
234. sting entirely of gaps moves the gaps to the new location. To quickly select a single residue, double-click on it. Triple-clicking will select a block of residues within a single sequence. Quadruple-clicking selects a block of residues in multiple sequences. The Shift and Ctrl (Alt/Option on a Mac) keys can be combined with the keyboard arrow keys to select sequence and alignment regions. The Shift key extends the current selection, and holding down the Ctrl (Alt/Option on a Mac) key while pressing the keyboard arrow is equivalent to pressing it 10 times. These can be used together. For example, in an alignment, if you have a region of one sequence selected and would like to select the same region in all sequences, then you could press Ctrl+up until you reach the first sequence and then press Ctrl+Shift+down a few times until all sequences are selected. Sequences can be reordered within an alignment by clicking the sequence name and dragging. Sequences can be removed from an alignment by right-clicking (Ctrl-click on Mac OS X) on the sequence name and choosing the remove sequence option. Alternatively, select the entire sequence by clicking on the sequence name and press the delete key. To delete a region of an alignment, select the region and press the delete or backspace key. Normally this will move residues on the right into the deleted area. By holding down the Shift key while deleting, residues on the left will be moved into the dele
235. t per generated product (Insert each fragment separately). • The minimum overlap length of the complementary sequence can be specified, as well as the minimum melting temperature of the complementary sequence (Min Overlap Tm). • To calculate the Tm, a collapsible field is available showing the options provided and required by Primer3. The operation will remove 5′ overhangs and fill up 3′ overhangs that are eventually present on the sequences (e.g. when they are derived from a restriction digestion). The possible different sequence combinations are created, and complementary extremities that might be already present will be considered when primers get designed. For sequences without complementary overlaps, a pair of primers will be generated. If both or only one of the ends are complementary, primers for both ends will be created, since the sequence will still have to be modified by a Primer Extension PCR to make it compatible at the opposing end. If both are complementary, no primers will be generated. extensions will be added to the primer corresponding to the neighboring sequence. Thereby, modifications that have been manually introduced at the extremities (annotations with the tag editing history) will be automatically added as part of the extension so that they get introduced to the sequence during the PCR. The melting temperature is calculated for the primer binding sequence and the extension part, including the modified bases. For bot
236. t region and allows you to choose a small region on either side of the target in which primers must lie. • Target Region: Specifies which region of the sequence you wish to amplify and, unless the advanced options allow otherwise, the forward and reverse primers must fall somewhere outside this region. • Product Size: Specifies the range of sizes which the product of a primer pair can have. The product size is the distance in bp between the beginning of the forward primer and the end of the reverse primer. • Optimal Product Size: Specifies the preferred size of the product. Setting this will mean primer pairs that have a product size close to this will be chosen over those that do not. Warning: Setting these options can cause the primer design process to take considerably longer to complete. The final option in this section is Number of Pairs to Generate, which specifies how many candidate pairs of primers and DNA probes to generate and is compulsory. Setting this to 1 will give you only the primer pair which was considered best by the set parameters. Cloning primers: This option allows you to design primers to amplify a specific region. Only the included region can be set, and the primers will be designed to the very ends of this region so that the entire region is included in the PCR product. This option is useful for amplifying an entire CDS for creating an insert for cloning. Tm calculation: This section giv
237. t your BROWSER environment variable to the name of your browser. The details depend on your browser and type of shell. For example, if you are using Mozilla and bash, put export BROWSER=mozilla in your .bashrc file. When using a csh shell variant, put setenv BROWSER mozilla in your .cshrc file.
[Figure 15.5: General Preferences]
15.3 Geneious is slow
Geneious has fairly high memory and CPU requirements. It is becoming increasingly impractical to run it on 32-bit hardware, since the realistic upper limit for memory that can be dedicated to Geneious is 1 GB on those machines. That said, there are things that can be done to improve the performance of Geneious even on limited machines.
15.3.1 Memory
Geneious runs in a Java Virtual Machine (JVM). When this JVM starts, it is allocated a certain amount of RAM; the program can use less than that but never more. In the Preferences → Appearance and Behavior tab th
238. tart
15.8.2 Activation issues
Geneious has a FLEXnet-based licensing system that requires online activation. The main issue with activation arises when the program cannot access the licensing website. The address of this website is http://licensing.biomatters.com, so if that site is blocked by a firewall, Geneious will be unable to register the license. Since this server is on port 80 it should be reachable, but you may need to configure the proxy settings to enable access.
15.8.3 Admin license activation
Installing the license service requires administrator privileges. However, the admin should not activate the license, because personal licenses are only available to one user on a machine; if the admin activates it, the actual non-admin user will not be able to use the license. If you must verify that the license works, make sure you release it using the Help menu item. Note: there is a limit on the number of times a license can be released, to prevent license sharing. FLEXnet licensing only needs to be installed once by the administrator, after which the user can upgrade Geneious as a non-admin.
15.8.4 Downgrading versions
When Geneious upgrades, it offers to create a new folder with the new version name and copy the data from the old data folder into this new one. This means you can downgrade if you prefer to use the earlier version, or if your license isn't able to run the latest version due to support expi
239. tation information. You can use a VCF file to annotate existing sequences in your local database, import entirely new sequences, or import the annotations onto blank sequences.
Vector NTI formats
Geneious supports the import of several Vector NTI formats:
• .gb and .gp formats: These formats are used in Vector NTI for saving single nucleotide and protein sequence documents. They are very similar to the GenBank formats with the same extensions, although they contain some extra information.
• .apr format: This format is used for storing alignments and trees made with AlignX, Vector NTI's alignment module.
• .ma4, .pa4, .oa4, .ea4 and .ca6 formats: These are the archive formats which Vector NTI uses to export whole databases.
• .cep format: This format is produced by the ContigExpress module. Geneious will import sequences including the positions of the base calls, traces, qualities, trimmed regions, annotations and editing history for individual reads and contigs.
2.2.3 Where does my imported data go?
The above formats can all be imported into Geneious from local files. Geneious also enables you to download certain types of documents directly from public databases such as NCBI and EMBL. The method used to retrieve a particular piece of data determines where in Geneious it is stored.
Data imported from local files: This is imported directly into the currently selected local folder wi
240. ted area instead. Similarly, holding down the Shift key when inserting will push residues to the left instead of right. Shift-click on two restriction site annotations in the sequence view to select the region between their cut sites on the forward strand. After editing is complete, click Save to permanently save the new contents.
3.2.15 The Pop-up menu in the sequence viewer
The toolbar actions are available via a pop-up menu as well. Right-click (Ctrl+click on Mac OS X) on any sequence, partly highlighted sequence or annotation to show the various options. The pop-up menu contains the Copy residues action (keyboard: Ctrl+C) to copy the selected residues to the system clipboard.
3.2.16 Printing a sequence view
To print a sequence view, go to File → Print and click OK. The view is printed without the options panel. It is recommended to turn on Wrap sequence and deselect Colors before printing. Wrapping prints the sequence as seen in the sequence viewer, and the font size is chosen to fill the horizontal width of the page.
3.3 Annotation Viewer
The Annotations tab appears whenever sequences containing annotations are selected. It displays each annotation as a row in a table, with columns corresponding to the qualifiers for the annotations. Selection of annotations is synchronised with other viewers such as the sequence viewer and dotplot.
3.3.1 Menu
• Ty
241. [Figure 6.2: Configuring a COGs BLAST]
Chapter 7 Pfam
Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. The data for Pfam is taken from sequences in UniProt. Pfam can be found online at the following locations:
• Sanger Institute (UK)
• Washington D.C. (USA)
• Karolinska Institutet (Sweden)
• Institut National de la Recherche Agronomique (France)
7.1 Setting up the Pfam databases
At the time of release of Geneious 3.5 there was no public online interface to the Pfam database, although there is one in the works at the Sanger Institute. For this reason, if you want to search the Pfam databases you will need to download them first. As of Pfam 22 (July 2007), the subset of the Pfam databases used by Geneious totalled about 4 GB in size, so it is recommended that you download them somewhere with a fast connection. You can use Geneious to search five of the Pfam databases:
1. Pfam-A seed (29 MB) contains records on the manually curated domains in Pfam-A and the seed alignment (an alignment of a representative subset of all occurrences of this domain in UniProt sequences) for each domain.
2. Pfam-A full (392 MB) contains records for the manually curated domains in Pfam-A and the full alignment (an alignment of all occurrences of this domain in UniProt sequences) for each domain.
242. than just free end gaps in one alignment. If you are aligning nucleotide sequences, you will also have the option of doing your alignment by translation and back. To view the options for translation alignment, click the More Options button at the bottom of the alignment dialog. The translation alignment options will appear. Here you can set the genetic code and translation frame for the translation, as well as the cost matrix, gap open penalty and gap extension penalty for the alignment. If you want to set the alignment type (global or local) or choose to automatically determine the sequences' direction, do it in the main section of the dialog.
[Figure 4.2: Options for nucleotide translation alignment]
4.4.2 Multiple sequence alignments
A multiple sequence alignment is a comparison of multiple related DNA or amino acid sequences. A multiple sequence alignment can be used for many purposes, including inferring the presence of ancestral relationships between the sequences. It should be noted that prot
243. that it recognizes all 151 MID sequences provided by 454 and uses their names when appropriate. The 454 Adapter B sequence is trimmed from the end of the MID sequences. For further information on splitting barcode data, hover the mouse over any of the settings in the Separate Reads by Barcode options window.
4.7.6 Viewing Contigs
Contigs in Geneious are viewed and edited in exactly the same way as alignments. There are several features in the sequence viewer which are worth taking special note of when viewing contigs:
• The consensus sequence is normally of particular interest, and it is always displayed at the top of the sequence view, labeled Consensus.
• When all sequences in a contig or alignment have quality information attached, you can select the Highest Quality consensus type. This almost removes the need for manually editing the contig, because this consensus chooses the base with the highest total quality at each position.
• There is a Quality color scheme which is selected by default for alignments of all chromatograms. This assigns a shade of blue to each base based on its quality: dark blue for confidence < 20, blue for 20 to 40, and light blue for > 40. The consensus is also colored with this scheme, where the confidence of a given base in the consensus is equal to the maximum confidence from the bases at that site in the alignment.
• The sequence logo graph has an opt
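The quality-related rules in this snippet (the three confidence bands of the Quality color scheme, the consensus confidence taken as the per-site maximum, and the Highest Quality consensus picking the base with the highest total quality) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the data layout are hypothetical, tie-breaking between bases is left to Python's `max`, and nothing here is taken from the Geneious source code.

```python
from collections import defaultdict


def quality_shade(confidence: int) -> str:
    """Map a base-call confidence to the shade described in the text:
    below 20 is dark blue, 20 to 40 is blue, above 40 is light blue."""
    if confidence < 20:
        return "dark blue"
    if confidence <= 40:
        return "blue"
    return "light blue"


def consensus_confidence(column: list[int]) -> int:
    """Confidence shown for a consensus base: the maximum confidence
    among the bases at that alignment site."""
    return max(column)


def highest_quality_base(column: list[tuple[str, int]]) -> str:
    """Pick the base whose summed quality across reads is highest.

    `column` holds (base, quality) pairs for one alignment position.
    How ties are resolved is an assumption of this sketch.
    """
    totals: dict[str, int] = defaultdict(int)
    for base, quality in column:
        totals[base] += quality
    return max(totals, key=totals.get)
```

With a column of ("A", 30), ("G", 20), ("G", 15), the summed qualities are 30 for A and 35 for G, so the sketch returns G even though the single best read says A.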
244. that you can paste or type in.
• Extract Region, Reverse Complement, Translate: Sometimes a selection in the sequence viewer is required before performing these.
• Back Translate creates an ambiguous nucleotide version of the selected protein document.
• Circular Sequences sets whether the currently selected sequences are circular. This affects the way the sequence view displays them, as well as how certain operations deal with the sequences (e.g. digestion).
• Free End Gaps Alignment sets whether the currently selected alignment has free end gaps. This affects calculation of the consensus sequences and statistics.
• Change Residue Numbering changes the original residue numbering of the selected sequence. On a linear sequence this is used to indicate that a sequence is a subsequence of some larger sequence. On a circular sequence this is used to shift the origin of the sequence.
• Convert between DNA and RNA changes all T's in a sequence to U's or vice versa, depending on the type of the selected sequence. Once this is performed, click Save in the Sequence View to make the change permanent.
• Set Paired Reads sets up paired reads for assembly. See section 4.7.4.
• Set Read Direction marks sequences as forward or reverse reads so the correct reads are reverse complemented by assembly.
• Separate Reads by Barcode separates multiplex or barcode data (e.g. 454 MID data).
• Group Sequences into a List
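Two of the operations listed above, Reverse Complement and Convert between DNA and RNA, are simple enough to sketch directly. The Python below is an illustration only: the function names are hypothetical, and it handles just the four unambiguous bases (IUPAC ambiguity codes are passed through unchanged), which is a simplification compared with what a sequence editor needs.

```python
# Translation table for complementing unambiguous DNA bases (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def dna_to_rna(seq: str) -> str:
    """Change every T to U, preserving case."""
    return seq.replace("T", "U").replace("t", "u")


def rna_to_dna(seq: str) -> str:
    """Change every U to T, preserving case."""
    return seq.replace("U", "T").replace("u", "t")
```

A quick sanity check is a palindromic restriction site such as GAATTC, which should be its own reverse complement.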
245. the files have finished downloading and setting up, you will need to close the dialog. If you shut down Geneious with a file partially downloaded, you will need to start downloading it again from the beginning. Files completely downloaded will not need to be downloaded again.
6.2 BLASTing COGs
Select any sequence in the document table, right-click it and select Sequence Search. Select the COGs database from the database drop-down box and Geneious will give you several options for your BLAST (see Figure 6.2). Number of hits to fetch allows you to fetch results for the best n hits for your sequence. You can choose to download COGs sequences from NCBI with full annotations, or to load them without annotations from the COGs database file. Finally, you have the option of retrieving the sequences for your hits, retrieving the entire COG for each hit, or just displaying information about the hits. Once you have made your choices, click OK. If you have selected a nucleotide sequence, Geneious will give you options to translate it at this point.
[Figure: Sequence Search dialog with the COGs database selected]
246. the sequence, or the average pI when multiple sequences are being viewed.
Identity: This is available for sequence alignments. It displays the identity across all sequences for every position. Green means that the residue at the position is the same across all sequences, yellow is for less than complete identity, and red refers to very low identity for the given position (Figure 3.6).
Sliding window size: This calculates the value of the graph at each position by averaging across a number of surrounding positions. When the value is 1, no averaging is performed. When the value is 3, the value of the graph is the average of the residue value at that position and the values on either side.
Quality: This is available when chromatogram traces are enabled. It displays a quality measure (typically Phred quality scores) for each base, as assessed by the base-calling program. The quality is shown as a shaded bar graph overlaid on top of the chromatogram. Note that these scores represent an estimate of error probability and are on a logarithmic scale: the highest bar represents a one-in-a-million (10^-6) probability of a calling error, while the middle represents a probability of one in a thousand (10^-3).
3.2.8 Annotation Types
Some protein and nucleotide sequences come with annotations, and these can be viewed within the Geneious sequence viewer. Annotations can either be annotated directly on a sequence
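Both the sliding-window averaging and the logarithmic quality scale lend themselves to a short worked example. The sketch below assumes an odd window size and truncates the window at the sequence ends, which is one plausible choice and not necessarily what Geneious does. The Phred conversion uses the standard definition Q = -10 log10(p), so Q30 corresponds to 10^-3 and Q60 to 10^-6. Function names are hypothetical.

```python
def sliding_window_average(values: list[float], window: int) -> list[float]:
    """Average each position with its neighbours.

    A window of 1 returns the values unchanged; a window of 3 averages
    each value with the one on either side. Assumption: windows are
    truncated at the ends rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    averaged = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        neighbourhood = values[lo:hi]
        averaged.append(sum(neighbourhood) / len(neighbourhood))
    return averaged


def phred_to_error_probability(quality: float) -> float:
    """Convert a Phred quality score to the estimated probability of a wrong base call."""
    return 10 ** (-quality / 10)
```

Because the scale is logarithmic, every additional 10 quality points means a tenfold lower estimated error rate.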
247. thin Geneious. If no folder is selected, Geneious will open a dialog which lets you specify a folder.
Data from an NCBI/EMBL/Contacts search: Data downloaded from public databases within Geneious will appear in the Document Table when that database is selected, and can be dragged from there into a local folder of your choice.
Important: if you don't drag the documents from a database search into your local folders, the results will be lost when Geneious is closed.
2.2.4 Data output formats
Each data type has several export options. Any set of documents may be exported in Geneious native format.
Data type: Export format options
DNA sequence: FASTA, Genbank XML, Genbank flat, Geneious
Amino acid sequence: FASTA, Genbank XML, Genbank flat, Geneious
Chromatogram sequence: ABI, Geneious
Sequence with quality: FastQ, Qual, Geneious
Annotation: GFF, BED, Geneious
Multiple sequence alignment: Phylip, FASTA, NEXUS [13], MEGA3 [12], Geneious
Assembly: Phrap ACE, Geneious, SAM/BAM
Phylogenetic tree: Phylip, FASTA, NEXUS [13], Newick, MEGA3 [12], Geneious
PDF document: PDF, Geneious
Publication: EndNote 8.0, Geneious
Graphs: CSV, WIG
Document Properties: CSV, TSV, Geneious
Additionally, documents imported in any chromatogram or molecular structure format can be re-exported in that format as long as no changes have been made to the document.
2.2.5 Export to comma separated
248. tion such as Maximum Likelihood and Bayesian MCMC, we recommend specialist software such as MrBayes [19] and PhyML [7], which are available as plugins to Geneious. These can be downloaded from the plugins page on our website. Geneious implements the Neighbor-joining [20] and UPGMA [15] methods of tree reconstruction.
4.5.1 Phylogenetic tree representation
A phylogenetic tree describes the evolutionary relationships amongst a set of sequences. Trees have a few commonly associated terms that are depicted in Figure 3.12 and described below.
Branch length: A measure of the amount of divergence between two nodes in the tree. Branch lengths are usually expressed in units of substitutions per site of the sequence alignment.
Nodes (or internal nodes) of a tree represent the inferred common ancestors of the sequences that are grouped under them.
Tips (or leaves) of a tree represent the sequences used to construct the tree.
Taxonomic units: These can be species, genes or individuals associated with the tips of the tree.
A phylogenetic tree can be rooted or unrooted. A rooted tree has a root, the common ancestor of all the taxonomic units of the tree. An unrooted tree is one that does not show the position of the root. An unrooted tree can be rooted by adding an outgroup (a species that is distantly related to all the taxonomic units in the tree). A common format for representing phylogenetic trees is t
249. u or by selecting the same option from the Collaboration submenu.
9.1.1 Add New Account
In this dialog you are given the options of creating a new account on the server or entering the details for an existing account (e.g. if you want to access an account from an additional computer). If you choose to create a new account, Geneious will attempt to automatically register your account on the server at the end of this process.
[Figure 9.1: Add New Account dialog box]
Choose a username and password now. Enter your password twice for a new account. You can also optionally add an email address; Biomatters will need this if you require support, e.g. to reset a password or delete an account.
More Options: You can change some of the defaults for new and existing accounts.
• Account Name is the name displayed in the Services Panel for this account. It defaults to your username if nothing is entered.
• Server is the server your account connects to (default: talk.geneious.com).
• Jabber Service Name is required by some other Jabber service providers, such as Google Talk. Don't enter anything here unless you know what you are doing.
• Port N
250. u see are only summaries. To view the whole document, select the summary or summaries of the documents you would like to view and then click the Download button inside the document view or just above it. There are also Download items in the File menu and in the popup menu shown when a document summary is right-clicked (Ctrl+click on Mac OS X). The size of these files is not displayed in the Documents Table. You can cancel the download of document summaries by selecting Cancel Downloads from any of the locations mentioned above.
9.5 Chat
You can either chat with a single contact or invite several contacts to join you in a new chat.
9.5.1 Chatting with One Contact
To start chatting with a particular contact (who may be online using Geneious or another chat client which uses the Jabber protocol), click on that contact and select New Chat Session, either from the Collaboration submenu or from the popup menu (right-click on the contact, or Ctrl+click on Mac OS X). Type your messages into the text field at the bottom of the window that pops up, and click Send or press the Enter key to send.
9.5.2 Chatting with Multiple Contacts
Starting a Chat Session with Multiple Contacts: To invite several contacts to join you in a new chat session, click on your account (not the contacts) and then select New Chat Session from either the Collaboration submenu or the context menu (right click on the accou
251. u set rather than the normal default. Examples of features you can change:
• Turn off automatic updates
• Set default custom BLAST location
• Set up a shared Database
• Set up a proxy server default
• Turn off particular plugins
Any users who have already run Geneious should click the Reset All Preferences button in the Geneious Preferences to load these defaults.
14.2.2 geneious.properties file
Any preferences which can be set within Geneious can also be set from the geneious.properties file, which can be found in the Geneious installation directory. Some examples are present in the file already; remove the hashes from the start of the lines and modify the values to use them. If you need to find out how to set other preferences using this file, please use the Support button in the Geneious toolbar to request help.
14.3 Specify license server location
Create a plain text file in the Geneious installation directory called server.txt that has the hostname on the first line and the port on the second line.
14.4 Deleting plugins
Features of Geneious can be turned off in preferences, so the section on changing default preferences describes the simplest solution. However, if you really want to delete a feature completely so your users can't reinstate it, shut down Geneious, go to the bundledPlugins directory inside the installation directory, and delete the desired plugin jar files or folders.
14
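For administrators who script their installations, the server.txt layout from section 14.3 (hostname on the first line, port on the second) can be written with a few lines of Python. This is a sketch, not an official deployment tool: the function name is hypothetical, and the installation directory, hostname and port must be supplied for your own site.

```python
from pathlib import Path


def write_license_server_file(install_dir: Path, hostname: str, port: int) -> Path:
    """Write server.txt with the hostname on line 1 and the port on line 2.

    `install_dir` is assumed to be the Geneious installation directory.
    """
    target = install_dir / "server.txt"
    target.write_text(f"{hostname}\n{port}\n", encoding="utf-8")
    return target
```

Writing the file as plain UTF-8 text with one value per line keeps it identical to what you would produce by hand in a text editor.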
252. uction of the contig where the reads extend beyond the length of the reference, then you have two options. With iterative fine tuning, reads can extend a bit further past the ends of the reference sequence on each iteration, so make sure you set the number of iterations high enough. Alternatively, you could select all sequences, including the reference, and use the De Novo assembler.
4.7.3 Trimming
Trimming low-quality ends of sequences is normally performed before assembling a contig. This is because the noise introduced by low-quality regions and vector contamination can produce incorrect assemblies. The easiest way to trim sequences is at the assembly step. Select the trim options you wish to use in the Assembly options and click OK. The sequences will be trimmed and assembled in one operation. This means you cannot view the trimming that Geneious uses before assembly is performed, but the trimmed regions will still be available and adjustable after assembly is complete. Trimmed annotations on the ends of sequences are ignored when calculating the consensus sequence for a contig, so although the trimmed regions are visible they do not affect the results of assembly at all. Sequence trimming can also be performed before assembly by selecting the sequences you wish to trim and selecting Annotate & Predict → Trim Ends. This will add Trimmed annotations to the sequences, which are ignored in the construction of a contig. When performing
253. umber for Jabber servers running on a non-standard port (default: 5222).
[Figure 9.2: Add New Account dialog box with More Options]
9.1.2 Edit Account Details
This option, from the Collaboration submenu or your account's context menu, allows you to change the configuration you made when creating the account. If you change your password, Geneious will attempt to change it on the server the next time you connect. For this purpose, Geneious internally remembers your previous password as well, so that it can still connect if you entered your new password while disconnected.
9.1.3 Connect/Disconnect
As with all other collaboration-related commands, options for connecting to or disconnecting from your account are available both in the Collaboration submenu and in your account's context menu (right-click, or Ctrl+click on Mac OS X, on your account).
9.1.4 Delete Account
This option deletes your account configuration from Geneious. Currently there is no option for deleting an account on the server.
9.2 Managing Your Contacts
254. uments
• Cloning → Digest into Fragments
• Cloning → Insert into Vector
• Cloning → Ligate sequences
• Cloning → Gateway
• Primers → Extract PCR product
• Sequence Viewer → Extract
• Sequence Viewer → Translate
Note: Extract and Translate will not create active links by default. To do so, you must select the Actively link source and extracted documents checkbox in the relevant dialog (see Figure 3.14); otherwise they will be created with permanently inactive links.
[Figure 3.14: Extract dialog with active link checkbox]
3.9.1 Editing Linked Documents
When you make changes to a document that is the parent of another document, you will be given the opportunity to either propagate the changes, deactivate the link (which can later be reactivated, see Lineage View, Section 3.9.2), or save the changed document as a new copy (Figure 3.15). You may also simply back out of this process by choosing to cancel, which will return you to your unsaved changes. Note that if you choose to deactivate the link, this dialog will not be displayed upon subsequent saves of the parent document unless the link is reactivated at some future time.
[Figure residue: Actively Linked Descendants dialog]
255. ut if there is a firewall preventing direct access, then it will have to go via a proxy. Find out what the machine address and port are, plus any user name and password necessary, and put those into the network settings in the Preferences General tab (Figure 15.9).
[Figure 15.9: Proxy settings in Preferences]
The implementation of "use browser settings" may not work, depending on the platform. On Windows, if the proxy is set in Internet Explorer, it should work. Also, if a PAC file is specified, Geneious will just grab the host address and port settings it specifies and use them to fill in the fields automatically.
15.5.2 Setting up BLAST for multiple users
The correct solution is to set up a WWWBLAST NCBI mirror locally, mirror all the BLAST databases, and add some of your own. This will replace access to the NCBI service itself, though. This may be too much for some people, so they consider using CustomBLAST to achieve something similar. One approach is to provide users with a set of sequences in FASTA format from which they can create a CustomBLAST database, keep that set up to date, and have them replace their local copies. This has the advantage that it is essentially purely parallel, so it will scale indefinitely, but it has the disadvantage t
256. ve all your files together, put the contents of the folder in a zip file with the extension .tutorial.zip. Be careful not to put subfolders in your zip file, as these are not supported.
8.2 Answering a tutorial
Import the tutorial document into Geneious (use File → Import → From file). The tutorial document and any associated Geneious documents will be imported into the currently selected folder. The tutorial itself will be displayed in the help pane on the right-hand side of the Geneious window. If you accidentally close the help pane, you can display it by choosing Help from the Help menu. If the tutorial requires you to enter answers, click the edit button at the top of the tutorial window and type your answer into the space provided. Click the Save button when you are done. If the tutorial has a link to a Geneious document, clicking the link opens the document in the document viewer. Any changes you make to this document will be preserved when you export the tutorial. When you have finished the tutorial, export it by selecting the tutorial document and choosing File → Export → Selected Documents from the main menu. Make sure that Geneious Tutorial File is selected as the file type, then give it a name and click Export.
Chapter 9 Collaboration
Collaboration is an external plugin. Collaboration all
257. ved primers may not function correctly or at all.
[Figure 15.7: Pausing the indexer]
It is possible for certain very large documents to cause the indexer to crash. If you hover your mouse over the indexing indicator, a tool tip will identify the document that caused the problem. Delete that document (export it to a safe place first if you want to keep it), then restart Geneious; the indexer should finish and go quiet.
15.3.3 Alignments take a long time
Although this is an operation, it can be seen as a "Geneious is slow" issue, because users often choose the wrong alignment tool and then complain about performance. The standard Geneious aligner is based on dynamic programming and will be slow when presented with long sequences or large sets of sequences. For large multiple alignments you should look at MUSCLE or MAFFT rather than the standard Geneious aligner. These are much faster and still quite accurate in most cases. Some users have also tried to align genomes, but this is a bad idea: it will be extremely slow, use a huge amount of memory and usually crash as a result, and the end result is likely to be very poor, simply because genomes tend to have inverted and duplicated regions which a traditional pairwise aligner won't cope well with. The Mauve Genome Alignment plugin exists for this purpose. Figure 15.8
258. vigate through Geneious without using the mouse, and also allows you to redefine shortcuts to ones you may be familiar with from other programs. Double-click on a function to bring up a window in which to enter your new keyboard shortcut. If you use one that is already assigned, Geneious will tell you what function currently has that shortcut.
2.9.5 Sequencing
This tab has options for the management of trace files and assemblies.
• Confidence: Set the threshold values of base-call confidence used to determine if a base call is low, medium or high quality. This affects the binning parameters described below as well as the Confidence color scheme in the Sequence View.
• Sequence binning options: Specifies the requirements for individual traces to be binned as medium or high quality overall. To see the Bin for a trace, turn on the Bin column under Table Columns in the View menu.
• Assembly binning options: Specifies the requirements for assembly documents to be binned as medium or high quality overall. To see the Bin for an assembly, turn on the Bin column under Table Columns in the View menu.
• Track binning history in meta-data: When turned on, meta-data will be added to traces when they are trimmed (see the Properties view tab). This meta-data will then be updated every time the trace is re-trimmed, maintaining a history of the trimming.
• Enable per-folder document binning: When turned on, the Set Binning Paramete
259. wnload or otherwise acquire the NCBI BLAST binary files outside of Geneious. You can download them from here: ftp://ftp.ncbi.nih.gov/blast/executables/release/LATEST. Choose the appropriate file for your operating system, then download and extract it. Once you have done this, you will need to let Geneious know where to look for the files. To do this, go to Tools → Add/Remove Databases → Set Up Search Services and select Custom BLAST from the Service drop-down box. Enter your data location, or click Browse to browse to the location of the files.
Note: If you decide to use executables for another version of BLAST, make sure to use the legacy executables and not the newer BLAST executables, which are not compatible with Geneious.
5.1.2 Setting up the Custom BLAST files through Geneious
Geneious provides a download manager to help you download and extract the Custom BLAST files. To use it, go to Tools → Add/Remove Databases → Set Up Search Services and select Custom BLAST from the Service drop-down box. Make sure Let Geneious do the setup is checked, then click OK. After a few seconds, the compressed file containing all the files needed to run Custom BLAST will start downloading. You can click Pause to pause the download. You can add and search Custom BLAST databases as soon as it has finished downloading and extracting. If you shut down Geneious with
260. ws you to directly download information from nine important NCBI databases and perform NCBI BLAST searches (Table 2.1).

Table 2.1: NCBI databases accessible via Geneious

Database     Coverage
Genome       Whole genome sequences
Nucleotide   DNA sequences
PopSet       Sets of DNA sequences from population studies
Protein      Protein sequences
Structure    3D structural data
PubMed       Biomedical literature citations and abstracts
Taxonomy     Names and taxonomy of organisms
SNP          Single Nucleotide Polymorphisms
Gene         Genes

The Entrez Genome database
The Entrez Genome database has been retired. For backwards compatibility, Geneious simulates searching of the old Genome database by searching the Entrez Nucleotide database and filtering the results to include only genome results.

The Entrez Nucleotide database
This database in GenBank contains 3 separate components that are also searchable databases: EST, GSS, and CoreNucleotide. The core nucleotide database brings together information from three other databases: GenBank, EMBL, and DDBJ. These are part of the International collaboration of Sequence Databases. This database also contains RefSeq records, which are NCBI-curated, non-redundant sets of sequences.

The Entrez PopSet database
This database contains sets of aligned sequences that are the result of population, phylogenetic, or mutation studies. These alignments usually describe evolution and population variat
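For readers who want to see what a search against one of the databases in the excerpt above looks like outside Geneious, the sketch below builds a request URL for NCBI's public E-utilities `esearch` endpoint. It only constructs the URL and performs no network access. The function name and the default of 20 results are assumptions for illustration, and the excerpt does not say how Geneious itself queries NCBI, so this should not be read as Geneious's implementation.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=20):
    """Build an esearch URL for an Entrez database such as 'nucleotide'.

    `db`, `term` and `retmax` are standard esearch parameters; the
    default of 20 results is an arbitrary choice for this sketch.
    """
    query = urlencode({"db": database, "term": term, "retmax": max_results})
    return f"{ESEARCH_URL}?{query}"
```

For example, `build_esearch_url("nucleotide", "cytochrome b")` yields a URL whose query string is `db=nucleotide&term=cytochrome+b&retmax=20`.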
261. ws you to enter how many megabytes of your computer's memory (RAM) you wish to allow Geneious to use. Specifically, this sets the maximum Java heap size. You should never set this to the total memory of your computer, as you need to leave some RAM available for your operating system. For example, if you have 4GB available, you should set Geneious to have no more than 2GB so the operating system will have enough room to perform its tasks. Even on machines with a lot more memory, it is still a good idea to leave 2GB or more for the operating system to keep your computer running smoothly.

Connection settings: These are described in the troubleshooting section of the manual.

2.9.2 Plugins and Features

The Plugins and Features tab (Figure 2.18) lets you manage downloadable plugins and change the features available in Geneious.

• Available Plugins: Lists all plugins that are currently available for download from the Geneious website and aren't already installed. Each plugin is listed with a status, which can be a star (for exciting plugins), New, or Beta. Click the Info button to read more about the plugin, or click the Install button to download and install it.
• Installed Plugins: Lists all plugins you currently have installed. Click the Uninstall button next to a plugin to remove it.
• Install plugin from a .gplugin file: If you have downloaded a plugin from our website or obtained one from another source, usually in .gplugin
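The memory guideline at the start of the excerpt above (leave 2GB or more for the operating system) can be written as a small calculation. The sketch below is only an illustration: the function names and the fixed 2048 MB reserve are assumptions taken from the example in the text. `-Xmx` is the standard JVM option for maximum heap size, but the excerpt does not say how Geneious stores or passes the value, so the second helper is purely a formatting convenience.

```python
def max_heap_mb(total_ram_mb, os_reserve_mb=2048):
    """Largest heap, in megabytes, that still leaves `os_reserve_mb` free.

    The 2048 MB default mirrors the "leave 2GB or more" advice in the text.
    """
    if total_ram_mb <= os_reserve_mb:
        raise ValueError("not enough RAM to leave the requested OS reserve")
    return total_ram_mb - os_reserve_mb


def as_jvm_option(heap_mb):
    """Format a heap size in megabytes as a standard JVM -Xmx option."""
    return f"-Xmx{heap_mb}m"
```

With 4096 MB of RAM this gives 2048 MB, which matches the 4GB example in the text; the result is an upper bound, and a smaller value is equally valid.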
262. you have your account configured on the server, you'll need to install the necessary Geneious Server plugins. Many of your normal Geneious plugins are already server-aware, but there are other plugins which are different from the standard plugins or are Geneious Server exclusive, as they offer features unique to Geneious Server.

Your administrator can provide you with a download location for Geneious Server plugins. You can get them either from the Geneious Server itself, or they may be hosted on a network location with the .gplugin files. If you have the plugin files, just drag them all into Geneious. If you have to go to the web interface, get the URL from your administrator and you should see a page like Figure 13.1.

[Figure 13.1: text of the geneious SERVER download page]
• Download the client plugin file: Install this file into your copy of Geneious to access Geneious Server.
• Download the BWA plugin file: Install this file into your copy of Geneious to install BWA plugin.
• Download the Bowtie plugin file: Install this file into your copy of Geneious to install Bowtie plugin.
• Download the LASTZ plugin file: Install this file into your copy of Geneious to install LASTZ plugin.
• Download the Mafft plugin file: Install this file into your copy of Geneious to install Mafft plugin.
• Download the Maq plugin file: Install this file into your copy of Geneious to install Maq plugin.
• Download the Mauve plugin file: Install this file into your copy of Ge