The GeneSpring User Manual for version 4.1
Contents
1. Outputs. Figure 1-2: A Sample Script. Everything in the browser must be correctly connected to another object. Check the bottom of the screen for error or warning messages. If any error messages are present, the script cannot be run. You can run a script with warning messages active, but it may not function as intended. When you are ready to save your script, see Saving Scripts on page 8-18. Building Scripts: A typical script is built by arranging building blocks in the browser. Building blocks can be other scripts, script building blocks, or external program building blocks. To build a script: Select a building block and click in the browser to place it. Click and drag the edges of the building block to make it larger or smaller. Details about the building block appear in the Block section on the right side of the main ScriptEditor window as long as it is selected. You can click anywhere else in the browser to deselect the building block. Set knobs if necessary. These may be pull-down menus or text fields, and they appear in the lower-right section of the ScriptEditor window. Create a line from the input (top) to the first building block. Select a new building block from the folders in the navigator. Connect the first building block to the second. Repeat steps 2 through 6 as necessary. 7. When you are close to done
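The connection rule above can be illustrated with a small graph check. This is a hypothetical sketch, not ScriptEditor code. It assumes a script is modeled as named blocks plus the lines drawn between them, and the block names `input`, `normalize`, `cluster`, and `orphan` are invented for illustration. The check reports any block that cannot be reached from the input, which is the kind of problem that would likely keep a script from running.

```python
from collections import deque


def unreachable_blocks(blocks, edges, start="input"):
    """Hypothetical check: return blocks that cannot be reached from `start`.

    blocks: iterable of block names (must include `start`).
    edges:  iterable of (source, target) pairs, one per drawn line.
    """
    adjacency = {name: [] for name in blocks}
    for source, target in edges:
        adjacency[source].append(target)

    # Breadth-first search from the input block along the drawn lines.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    return sorted(name for name in adjacency if name not in seen)


blocks = ["input", "normalize", "cluster", "orphan"]
edges = [("input", "normalize"), ("normalize", "cluster")]
problems = unreachable_blocks(blocks, edges)  # ['orphan']
```

An empty result means every block is wired into the chain that starts at the input.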
2. … 6-13; Making Lists from Classifications 6-14; Making Lists from Selected Genes 6-14; Creating Expression Profiles 6-15; Pathways 6-16; Regulatory Sequences 6-18; The Homology Tool 6-24; Annotation Tools 6-27; Updating your Master Gene Table with GeneSpider 6-27; Genome Databases 6-30; Building a Simplified Ontology 6-31; To Make Gene Lists From Properties 6-32; Building Homology Tables 6-32; Statistical Analysis: ANOVA 6-33; 1-Way ANOVA 6-34; Post Hoc Tests 6-39; Viewing Post Hoc Test Results 6-41; 2-Way ANOVA 6-43; Details on 2-Way ANOVA 6-45; The Filtering Menu 6-51; The Basic Anatomy of a Filtering Window 6-51; Basic Filters 6-54; Filter on Expression Level 6-54; Filter on Fold Change 6-55; Filter on Error 6-58; Filter on Confidence 6-59; Filter On Flags
3. To save your results: 1. Enter a name in the Name field at the top of the screen. Names may not exceed 80 characters. 2. To save the results as a classification, select the Classification radio button; to save the results as a group of gene lists, select the Gene Lists radio button. 3. From the navigator, select a folder in which to save the new classification or gene lists. To create a new folder, navigate to the desired parent folder and enter a new folder name in the Folder field. 4. Enter any additional information in the Notes field, if desired. 5. Click Save. Principal Components Analysis: Principal components analysis (PCA) is a decomposition technique that produces a set of expression patterns known as principal components. Linear combinations of these patterns can be assembled to represent the behavior of all of the genes in a given data set. PCA is not a clustering technique; it is a tool to characterize the most abundant themes, or building blocks, that recur in many genes in your experiment. You can run PCA on genes or on conditions. By default, for PCA on genes, PC scores are calculated by computing the standard correlation between each gene's expression profile vector and each principal component vector (eigenvector). For PCA on conditions, this means calculating the standard correlation between each condition vector and each principal component
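The following is a minimal NumPy sketch of the scoring described above. It assumes the data are a genes × conditions matrix of (log) values. Two further assumptions are worth stating. First, the components come from an SVD of the column-centered matrix. Second, "standard correlation" is treated here as the uncentered correlation (cosine) between a profile and a component, and GeneSpring's own centering may differ. The function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(data, n_components=3):
    """Hypothetical PCA scoring of the rows of `data` (rows = genes, columns = conditions).

    Returns (components, scores): components[c] is the c-th principal
    component (a unit-length pattern across columns) and scores[g, c] is
    the uncentered correlation between row g and component c.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are eigenvectors of the column covariance matrix,
    # ordered by decreasing variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]

    # Uncentered correlation = cosine of the angle between each row and
    # each unit-length component vector (assumed meaning of "standard").
    norms = np.linalg.norm(data, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows get a score of 0
    scores = (data / norms) @ components.T
    return components, scores


rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 8))  # random demo data: 200 genes x 8 conditions
gene_components, gene_scores = pca_scores(expr)
condition_components, condition_scores = pca_scores(expr.T)
```

For PCA on conditions, the sketch simply transposes the matrix so that each condition vector is scored against the components. The sign of each eigenvector is arbitrary, so a component and its scores may come out flipped relative to another tool.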
4. Figure 3-1: The Define File Format and Genome window. 3. If the file format displayed in the Choose File Format box is correct, go to the next step. If not, select the correct file format from the pull-down menu. 4. From the Select Genome list, choose the genome in which to save the experiment data. To save the data in a new genome, select the Create a New Genome radio button and enter a name in the text box. For more information on creating a genome in this way, see Creating a Genome from Experiment Data on page 2-9. If your data is in a known format, the Genome Browser window appears; for more information on the Genome Browser, see Using the Genome Browser on page 4-2. If your data is in a custom format, the Column Editor appears, and you must set up columns before continuing; see Using the Column Editor on page 3-9 for information on using the Column Editor. 5. Click Next. The Select Files window appears. [Figure: the Import Data window]
5. Figure 6-8: 2-way ANOVA results window (a table of results for each gene, with Copy to Clipboard, Save Lists, Display in Venn Diagram, Cancel, and Help buttons). Details on 2-Way ANOVA: Let A and B be the two factors (parameters) chosen by the user. Assume we are looking at a single gene, and use the following notation throughout. Factor A has a le
6. Figure 7-15: PCA Scatter Plot (X, Y, and Z axes: PCA components 1, 2, and 3). You can change the components that are represented by each axis by right-clicking in the browser and selecting Display Options. Regenerating the PCA Scores Scatter Plot: If you have closed the PCA scatter plot window and have saved the PCA scores as a set of gene lists or expression profiles, you can reproduce the initially displayed scatter plot as follows. PCA on Genes: 1. Open View > Scatter Plot or 3D Scatter Plot. 2.
7. Figure 4-27: The 3-D Scatter Plot View. In the 3D scatter plot above, each dot represents a gene. The vertical position of each gene represents its expression level in the current condition, and the horizontal position represents its control strength (in this case, the median expression level of this gene in all conditions). Pressing the x, y, or z key rotates the graph on the specified axis. Hold down the Shift key to speed this rotation; hold down the Alt key to reverse the direction of rotation. Note: Genes with no data cannot be displayed in this view. 3D Scatter Plot Display Options: The following display options are available for this view: X Axis, Y Axis, and Z Axis (see X, Y and Z Axes on page 4-50); Features (see Changing Labels and Features on page 4-47); Lines to Graph (see Adding Lines on page 4-46); Coloring (see Coloring on page 4-47); Error Bars (see Error Bars on page 4-28); Legend (see Legend on page 4-28). X, Y and Z Axes: The most critical option to set is the type of data that is displayed on the thre
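The two positions described above can be computed per gene, as in the following sketch. It assumes profiles are stored as a dictionary of per-condition values, with `None` or NaN marking missing data. The function name and data layout are hypothetical. Only the horizontal coordinate (control strength, taken as the median across conditions) and the vertical coordinate (the current condition) from the text are modeled.

```python
import math
from statistics import median


def scatter_coordinates(profiles, current):
    """Hypothetical mapping of each gene to (control strength, expression in `current`).

    profiles: {gene: {condition: value or None}}
    current:  name of the condition plotted on the vertical axis.
    Genes without usable data are skipped, mirroring the note that such
    genes cannot be displayed.
    """
    points = {}
    for gene, values in profiles.items():
        observed = [v for v in values.values() if v is not None and not math.isnan(v)]
        y = values.get(current)
        if not observed or y is None or math.isnan(y):
            continue
        # Control strength: median expression of this gene in all conditions.
        points[gene] = (median(observed), y)
    return points


profiles = {
    "g1": {"t0": 1.0, "t10": 2.0, "t20": 4.0},
    "g2": {"t0": None, "t10": None, "t20": None},
    "g3": {"t0": 0.5, "t10": float("nan"), "t20": 1.5},
}
points = scatter_coordinates(profiles, "t20")
```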
8. $F_B = MSS_B / MSS_{error}$ and $F_{AB} = MSS_{AB} / MSS_{error}$. Each of these should be compared to the upper tail probability of an F distribution with numerator and denominator degrees of freedom given by the corresponding denominators above. Case II: Proportional Replication. We first check for proportional cell sizes. Let $n_{ij}$ = number of replicates at level $i$ of A and level $j$ of B, and let $N = \sum_i \sum_j n_{ij}$ be the total number of replicates in all groups. Writing $n_{i\cdot} = \sum_j n_{ij}$ and $n_{\cdot j} = \sum_i n_{ij}$, if $n_{ij} = n_{i\cdot} n_{\cdot j} / N$ for each $i, j$, then we have proportional replication. In this case, all computations are the same as before, with appropriate changes. In particular, the index $k$ in all summations will now go from 1 to $n_{ij}$ instead of to $r$. Let $A_i$ = sum of all observations in level $i$ of factor A $= \sum_j \sum_k y_{ijk}$. Let $B_j$ = sum of all observations in level $j$ of factor B $= \sum_i \sum_k y_{ijk}$. Let $AB_{ij}$ = sum of all observations in level $i$ of factor A and level $j$ of factor B $= \sum_k y_{ijk}$. We can now compute the various sums of squares terms. Total sum of squares: $SS_{total} = \sum_{i,j,k} y_{ijk}^2 - C$. Factor A sum of squares: $SS_A = \sum_i A_i^2 / n_{i\cdot} - C$. Factor B sum of squares: $SS_B = \sum_j B_j^2 / n_{\cdot j} - C$. Interaction sum of squares: $SS_{AB} = \sum_{i,j} AB_{ij}^2 / n_{ij} - C - SS_A - SS_B$. Error: $SS_{error} = SS_{total} - SS_A - SS_B - SS_{AB}$. Compute mean sums of squares using the following degrees of freedom: Factor A: $df = a - 1$; Factor B: $df = b - 1$; Intera
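The Case II computations can be checked with a short NumPy sketch. The correction term C is defined before this excerpt, so the sketch assumes the usual form: (sum of all observations)^2 / N. The excerpt also stops at the interaction degrees of freedom, so the sketch assumes the standard values (a - 1)(b - 1) for the interaction and N - ab for error. The function name and the nested-list input layout are hypothetical.

```python
import numpy as np


def two_way_anova_proportional(cells):
    """Hypothetical two-way ANOVA for proportional replication.

    cells[i][j] holds the replicate values for level i of factor A and
    level j of factor B; every cell must be non-empty.
    """
    a, b = len(cells), len(cells[0])
    n = np.array([[len(cells[i][j]) for j in range(b)] for i in range(a)], dtype=float)
    if np.any(n == 0):
        raise ValueError("every cell needs at least one replicate")
    N = n.sum()
    n_i, n_j = n.sum(axis=1), n.sum(axis=0)  # n_{i.} and n_{.j}
    if not np.allclose(n, np.outer(n_i, n_j) / N):
        raise ValueError("cell sizes are not proportional")

    AB = np.array([[sum(cells[i][j]) for j in range(b)] for i in range(a)], dtype=float)
    A, B = AB.sum(axis=1), AB.sum(axis=0)  # level totals A_i and B_j
    C = AB.sum() ** 2 / N                  # assumed correction term
    sum_sq = sum(x * x for row in cells for cell in row for x in cell)

    ss_total = sum_sq - C
    ss_a = np.sum(A ** 2 / n_i) - C
    ss_b = np.sum(B ** 2 / n_j) - C
    ss_ab = np.sum(AB ** 2 / n) - C - ss_a - ss_b
    ss_error = ss_total - ss_a - ss_b - ss_ab

    df_a, df_b = a - 1, b - 1
    df_ab, df_error = df_a * df_b, N - a * b  # assumed standard values
    mss_error = ss_error / df_error
    return {
        "SS_A": ss_a, "SS_B": ss_b, "SS_AB": ss_ab, "SS_error": ss_error,
        "F_A": (ss_a / df_a) / mss_error,
        "F_B": (ss_b / df_b) / mss_error,
        "F_AB": (ss_ab / df_ab) / mss_error,
        "df": (df_a, df_b, df_ab, df_error),
    }


cells = [[[1.0, 2.0], [3.0, 5.0]],
         [[2.0, 4.0], [7.0, 9.0]]]  # 2 x 2 design, two replicates per cell
result = two_way_anova_proportional(cells)
```

To turn the F statistics into upper-tail probabilities, one option is SciPy's `scipy.stats.f.sf(F, dfn, dfd)`, using the degrees of freedom returned in `df`.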
9. Figure 6-14: The Conditions to Filter window. To specify conditions to filter, choose an experiment in the navigator. The conditions in that experiment appear in the upper panel to the right of the navigator. To add a condition to the filter, select it in the upper panel and click Add; the condition is added to the Selected Conditions list in the lower panel. To add all conditions from an experiment, click Add All. To remove a selected condition, select it in the lower panel and click Remove. To remove all selected conditions, click Remove All. To view a condition in the Condition Inspector, select it in either list and click Inspect, or double-click the condition. When you are done selecting conditions, click OK. Select the appropriate data type from the Choose Data Type menu. For more information on data types for filtering, see Data Types for Restrictions on page 6-53. From the Choose Comparison menu, choose whether you want the signal in the first sample or condition to be greater than, less than, equal to, or not equal to (greater than or less than) that in the second sample. Specify a fold factor using the slider or by entering a value in the Fold Difference field. Enter a value in the Difference must appear in at least out of
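The comparison options above can be sketched as a per-gene test. This is an interpretation, not GeneSpring's code. The sketch makes the following assumptions:

- "Greater than" means first/second ≥ fold.
- "Less than" means first/second ≤ 1/fold.
- "Not equal to" means either of those.
- "Equal to" means the ratio lies strictly within the fold factor in both directions.
- Pairs with non-positive signals are skipped.

The function and argument names are hypothetical.

```python
COMPARISONS = {"greater", "less", "equal", "not_equal"}


def passes_fold_filter(pairs, comparison, fold, min_count):
    """Hypothetical fold-change test for one gene.

    pairs:      (first, second) signals, one per sample/condition pair.
    comparison: "greater", "less", "equal" or "not_equal".
    fold:       fold factor (>= 1), e.g. 2.0 for a two-fold change.
    min_count:  the comparison must hold in at least this many pairs.
    """
    if comparison not in COMPARISONS:
        raise ValueError(f"unknown comparison: {comparison}")
    if fold < 1:
        raise ValueError("fold factor should be at least 1")

    def satisfied(ratio):
        if comparison == "greater":
            return ratio >= fold
        if comparison == "less":
            return ratio <= 1.0 / fold
        if comparison == "not_equal":  # greater than OR less than
            return ratio >= fold or ratio <= 1.0 / fold
        return 1.0 / fold < ratio < fold  # "equal": within the fold factor

    hits = sum(
        1 for first, second in pairs
        if first > 0 and second > 0 and satisfied(first / second)
    )
    return hits >= min_count


# Passes a two-fold "greater than" filter in at least 2 out of 3 pairs.
passes = passes_fold_filter([(4, 1), (1, 1), (3, 1)], "greater", 2.0, 2)
```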
10. … 4-69; Showing/Hiding Window Display Elements 4-7; Normalizing Data: Experiment Normalizations 5-2; Using the Experiment Normalizations Window 5-2; Normalization Types 5-6; Start with Pre-Normalized Values 5-6; Data Transformation 5-6; Per-Spot Normalization 5-7; Per-Chip Normalizations 5-10; Per-Gene Normalizations 5-13; Normalization Strategies for Specific Technologies 5-17; Normalization of Affymetrix Data 5-17; Normalization of Two-color Microarray Data 5-17; Region Normalization 5-17; Dealing with Repeated Measurements 5-18; Negative Control Strengths 5-20; References 5-21; Analyzing Data: Creating and Editing Gene Lists 6-2; Filtering Methods 6-3; Working with Gene Lists 6-6; The Find Similar Command 6-6; Making Lists by Applying Filters 6-11; Making Lists from Properties 6-12; Making Lists with the Venn Diagram
11. Figure 8-3: The Remote Execution Queue. The Remote Execution Queue displays the status of all jobs, both pending and completed. The following information is available for each job. Job: a unique identifier assigned by GeNet to each job. Job Name: the name of the script sent to the server. Genome: the genome from which the data to be analyzed originates. Time Submitted: the time that the user launched the script from GeneSpring. Time Started: the time that the remote execution server began executing the script; if the script is still waiting to be executed, this column reads Pending, and it reads Suspended if the script was paused. Time Finished: the time that the execution server finished running the script. To pause the execution of a pending script so that another script can run first, select the row of the desired job and click Pause. To resume running the script, select the desired row and click Resume. You cannot pause a script once it has begun executing. This screen does not refresh automatically; to view the current status, click Refresh. Each script in the queue can be viewed by clicking View. Once the script has finished running on the remote server, the View Results button becomes available. Click this button to retrieve and save the results of your script. A series of file dialog windows appears, one for each of the outputs that your script generates.
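The queue rules above can be modeled as a small state object. These rules are: Pending versus Suspended, no pausing once a script has started, and results available only after the script finishes. This is a hypothetical sketch of the rules as stated, not GeNet's API, and the class and field names are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical model of one row in the Remote Execution Queue."""
    job_id: int
    name: str
    genome: str
    submitted: str
    started: Optional[str] = None
    finished: Optional[str] = None
    paused: bool = False

    @property
    def time_started_column(self) -> str:
        # Shows the start time, or Pending/Suspended while still waiting.
        if self.started is not None:
            return self.started
        return "Suspended" if self.paused else "Pending"

    def pause(self) -> None:
        # Only scripts that have not begun executing can be paused.
        if self.started is not None:
            raise RuntimeError("cannot pause a script once it has begun executing")
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    @property
    def results_available(self) -> bool:
        # View Results becomes available once the script has finished.
        return self.finished is not None


job = Job(1, "cluster script", "Yeast", "10:00")
job.pause()  # time_started_column now reads "Suspended"
```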
12. Select Inspect from the pop-up menu. Figure 4-11: The Condition Inspector window. The Samples and Parameters Tab: This tab contains two sections. Samples Combined in this Condition lists the samples in the selected condition; select a sample and click View Sample to invoke the Sample Inspector for that sample (see The Sample Inspector on page 4-13 for more information). Experimental Parameters Defining this Condition lists the parameters associated with this condition; to edit parameters, click Change Parameters. For more information on the Change Parameters window, see Experiment Parameters on page 3-29. The Similar Conditions Tab: This tab contains a list of similar conditions in the experiment, with columns corresponding to their associated values.
13. Setting Preferences on page 1-18 for more details. The Gene Inspector Window: Double-click a gene to bring up the Gene Inspector window. This window contains specific information about the selected gene; see The Gene Inspector on page 4-10 for details. Information presented in the Gene Inspector might include knowledge you have about your selected gene (typically text), graphs of the selected gene's expression profile from the current experiment, and links to internet or intranet databases for the selected gene. Making Lists: There are many ways to create a list of genes; see Chapter 6, Analyzing Data, for more details. From the Gene Inspector window you can do the following. Making Lists with the Find Similar Command: The Find Similar button in the Gene Inspector allows you to create a list of genes whose expression profiles are similar to that of the gene being displayed. See The Find Similar Command on page 6-6 for more details. Making Lists with the Complex Correlation Command: The Complex Correlation button in the Gene Inspector allows you to make a list of all the genes satisfying various conditions you define. See The Find Similar Genes Window on page 6-7 for more details. Making Lists with the Venn Diagram: Select Colorbar > Color by Venn Diagram to begin. Right-clicking over lists in the navigator allows you to fill the diagram. This function allows you to make lists based on the membership of genes i
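The Find Similar idea can be sketched as a threshold on correlation. The sketch assumes profiles are equal-length lists of values over the same conditions. It uses Pearson correlation as the similarity measure, which is only one of several measures GeneSpring offers. The 0.95 default is illustrative. Function and gene names are hypothetical, and the reference gene is left out of its own result.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def find_similar(profiles, target, min_corr=0.95):
    """Hypothetical helper: genes (other than `target`) correlating >= min_corr with it."""
    reference = profiles[target]
    return sorted(
        gene for gene, values in profiles.items()
        if gene != target and pearson(reference, values) >= min_corr
    )


profiles = {
    "CLN1": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 2.0, 3.0, 5.0],
}
similar = find_similar(profiles, "CLN1")  # ['geneA', 'geneC']
```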
14. Figure 6-1: The Gene List Editor window. The left side of the screen contains tabs for each filtering method. Click a tab to view options for that method. These methods are: Show All (display all available genes without applying a filter); Filter on Annotation (display genes based on a specified annotation); Type a List (manually enter a list of genes); and Filter on Gene List (display genes from a selected gene list). The right portion of the screen contains two tables. The upper table contains all of the genes resulting from the current filtering method. The lower table contains the genes you have selected to add to your list. Between the two tables are six buttons. Figure 6-2: Gene Tables. The buttons are as follows. Add: Add a selected gene in the Filter Results table to the New Gene List table. Add All: Add all genes in the Filter Results table to the New Gene List table. Remove: Remov
15. Normalize to a median or percentile: This option allows you to divide all of the measurements on each chip by a specified percentile value. By default this value is 50.0 (the median); to change it, enter a new value in the text box. You do not have to restrict the measurements used in calculating the percentile, but you can limit them based on a specified cutoff or by flag values. If measurements are limited by flag values, the percentile is calculated using only the genes that pass the flag restriction. To limit measurements by flag values, check the Use only measurements flagged box and select the appropriate option from the pull-down menu. The available options are Present Only, Present or Marginal, and Anything but Absent. If measurements are limited by a cutoff, the percentile is calculated from all measurements above the cutoff. This cutoff can be in either raw or partially normalized units. The Raw Signal option means that the cutoff is applied to the raw measurements in the original data file. These measurements are back-calculated based on the previous normalization steps, so rounding errors may be introduced in this process. Partially normalized means that the cutoff is applied to the gene values resulting from the previous normalization steps, which may or may not be equivalent to the raw measurements. To limit by a cutoff: 1. Check the Use only measurements with box. 2. Select whether to limit by raw signal or c
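The following NumPy sketch shows per-chip percentile normalization with the optional flag and cutoff restrictions described above. It assumes the following:

- Flags are single letters: P (Present), M (Marginal), and A (Absent).
- The cutoff is applied to the values passed in, so it models the partially normalized mode rather than back-calculated raw signal.
- NumPy's default linear interpolation is used for the percentile, which may differ from GeneSpring's.

The function, rule, and argument names are hypothetical.

```python
import numpy as np

# Hypothetical rule names for the three pull-down options
# (P = Present, M = Marginal, A = Absent).
FLAG_RULES = {
    "present_only": lambda flag: flag == "P",
    "present_or_marginal": lambda flag: flag in ("P", "M"),
    "anything_but_absent": lambda flag: flag != "A",
}


def normalize_to_percentile(values, percentile=50.0, flags=None, flag_rule=None, cutoff=None):
    """Divide every measurement on one chip by a percentile of a subset of them."""
    values = np.asarray(values, dtype=float)
    keep = np.ones(values.shape, dtype=bool)
    if flag_rule is not None:
        if flags is None:
            raise ValueError("flag_rule needs per-measurement flags")
        keep &= np.array([FLAG_RULES[flag_rule](f) for f in flags], dtype=bool)
    if cutoff is not None:
        # Only measurements above the cutoff contribute to the percentile.
        keep &= values > cutoff
    if not keep.any():
        raise ValueError("no measurements pass the restriction")
    divisor = np.percentile(values[keep], percentile)
    # The divisor comes from the subset, but every measurement is divided by it.
    return values / divisor


chip = [1.0, 2.0, 3.0, 4.0, 100.0]
chip_flags = ["P", "P", "A", "P", "M"]
normalized = normalize_to_percentile(chip, flags=chip_flags, flag_rule="present_only")
```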
16. Please note the baseDirectory attribute value in the above example: the second slash takes the place of the <GenomeMappingSpec> tag. Notes: Required. <GetSampleIDs>: This element specifies the location from which to upload samples. There are three accepted values for the location attribute: database (perform a database search), directory (locate files in a directory), and java (upload files based on the result of a Java call). These values are case-insensitive. Contents: <DatabaseQuery>, <DataDirectory>, <FileNameMaskIDFromFileName>, <JavaQuery>. Attributes: location. Usage: <GetSampleIDs location="database"></GetSampleIDs>. Notes: Required for <PhysicalDatabase>. <GetSampleAttributes>: Specifies parameters for retrieving attributes associated with samples. These can include the name, value, or units of the sample attribute, and possibly a flag specifying whether the attribute is numeric. The cacheable attribute defines whether to cache sample attribute values for previously uploaded samples. This can greatly improve performance for uploads from external databases. It does not affect automatic upload performance, since in that case sample attributes are already retrieved only once. Acceptable values are true and false. The numeric attribute indicates whether the retrieved values should be considered numeric. Acceptable values
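The location rules for <GetSampleIDs> and the cacheable rule for <GetSampleAttributes> lend themselves to a quick validation pass with Python's standard `xml.etree.ElementTree`. This sketch is illustrative only. The wrapper elements in the usage (`Config`, `PhysicalDatabase`) are assumed, and the sketch checks only the attribute values stated above. It treats `location` case-insensitively, as the text says, and requires `cacheable` to be exactly true or false.

```python
import xml.etree.ElementTree as ET

VALID_LOCATIONS = {"database", "directory", "java"}


def check_sample_source(xml_text):
    """Hypothetical validator: return a list of problems in a configuration fragment."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter("GetSampleIDs"):
        location = elem.get("location")
        # The accepted location values are case-insensitive.
        if location is None or location.lower() not in VALID_LOCATIONS:
            problems.append(f"<GetSampleIDs>: unsupported location {location!r}")
    for elem in root.iter("GetSampleAttributes"):
        cacheable = elem.get("cacheable")
        if cacheable is not None and cacheable not in ("true", "false"):
            problems.append(f"<GetSampleAttributes>: cacheable must be true or false, got {cacheable!r}")
    return problems


config = ('<Config><PhysicalDatabase><GetSampleIDs location="Database"/>'
          '<GetSampleAttributes cacheable="true"/></PhysicalDatabase></Config>')
issues = check_sample_source(config)  # []
```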
17. Show Horizontal Labels; Show Vertical Labels; Use Rotated Text for Vertical Labels; Force all Text to Show. You can also specify the text size and font for the labels. 8. Specify the color scheme to use. You can choose either your current color scheme or any of the presets in the pull-down menu. 9. Click Save. A Save As window appears. 10. Choose a directory, enter a file name, and click Save. You may need to save your file at a large custom size, such as 150 x 150 inches, to ensure all data are included in the saved image. Images are saved as vector graphics, which are expandable: data that are too small to view in the genome browser are saved in most cases and reappear when you expand the image. Note: Images containing a very large number of genes can require an exceptional amount of memory; the fewer genes included in an image, the smaller the image file. To Save the Colorbar or Venn Diagram: 1. Display the colorbar or Venn diagram to save in the display window. 2. Select File > Save Image and choose Colorbar or Venn Diagram. A Save As window appears. 3. Choose a directory and file name and click Save. To Save the Entire Window: On Windows, press the Alt and Print Screen keys simultaneously to copy a picture of the current active window, then paste the image into any program that accepts graphics and save it. On Macintosh, press Command-Shift-4-Caps Lock simultaneously. The cursor cha
18. Standard Deviation: The standard deviation (the square root of the variance) of the raw data values for each gene. t-test p-value: The p-value of a t-test for differential expression in a specific condition. Natural Logarithm of Normalized Data. Average: The mean of any normalized replicates in the experiment. Minimum: The minimum normalized signal value for each gene. Maximum: The maximum normalized signal value for each gene. Standard Error: The standard error of the normalized values for each gene. Standard Deviation: The standard deviation (the square root of the variance) of the normalized values for each gene. Raw Data. Average: The mean of any raw data replicates in the experiment. Minimum: The minimum raw data signal value for each gene. Maximum: The maximum raw data signal value for each gene. Standard Error: The standard error of the raw data values for each gene. Standard Deviation: The standard deviation (the square root of the variance) of the raw data values for each gene. Control Value. Average: The mean of any control value replicates in the experiment. Minimum: The minimum control signal value for each gene. Maximum: The maximum control signal value for each gene. Standard Error: The standard error of the control values for each gene. Standard Deviation: The standard deviation (the square root of the variance) of the control values for each gene. Annotations: Map Position
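The per-gene statistics listed above can be computed from a gene's replicates with the standard library. This sketch assumes the standard deviation is the sample (n - 1) version and that the standard error is the standard deviation divided by the square root of the number of replicates. GeneSpring's exact conventions are not stated here, and the function name is hypothetical.

```python
import math
import statistics


def replicate_summary(values):
    """Hypothetical summary statistics for one gene's replicate measurements."""
    n = len(values)
    if n == 0:
        raise ValueError("no replicates")
    # Sample standard deviation (square root of the n - 1 variance);
    # undefined for a single replicate.
    sd = statistics.stdev(values) if n > 1 else float("nan")
    return {
        "average": statistics.fmean(values),
        "minimum": min(values),
        "maximum": max(values),
        "standard_deviation": sd,
        "standard_error": sd / math.sqrt(n),  # SD / sqrt(number of replicates)
    }


summary = replicate_summary([2.0, 4.0, 6.0])
```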
19. ory and Free Memory listed in the System Monitor window, and contact the Silicon Genetics Technical Services Department at 1-866-SIG-SOFT or support@sigenetics.com. Updating GeneSpring: To update an existing GeneSpring installation, select Help > Update GeneSpring and follow the on-screen instructions to obtain the current GeneSpring jar. Learning To Use GeneSpring. The Help Menu: The Help menu is located on the right of the menu bar. Tutorial: This command opens your default browser and takes you to the GeneSpring Basics Instructional Manual in PDF format. You can save this file to your local machine and print it. The tutorial covers many basic topics of GeneSpring. User Manual: Select the User Manual command to open the manual installed on your hard drive during installation or updating. The GeneSpring User Manual is a PDF document you can save or print. Version Notes: Select this option to view notes for your version of GeneSpring. These are located in C:\Program Files\SiliconGenetics\GeneSpring\docs\Version Notes.html. Update GeneSpring: Select this option to download the latest version of GeneSpring. You must have an active license key to update your software. You can also automatically update the manuals that accompany GeneSpring; the manuals are published in HTML or PDF formats. It is important to download updated documentation when you update the GeneSpring software. Technical Support: Select this option to con
20. toggles between the default view and splitting the graph by component. Show Bar/Line Graph: toggles between the bar and line graph views. Change Colors: allows you to change the colors used to display the components. Save Scores: saves gene lists whose associated values are the component scores for each gene. Save Profiles: saves the shape of each principal component as an expression profile. PCA on Conditions: To run PCA on conditions: 1. Select Tools > Principal Components Analysis. Figure 7-12: PCA on Conditions tab. 2. If it is not already selected, click the PCA on Conditions tab. 3. Select a gene list from the navigator and click Set Gene List. 4. Select an experiment from the navigator and click Set Conditions. 5. Click Exclude Conditions to specif
21. Figure 4-5: The Search GeNet Results window. To resize columns in this window, click between the column headers and drag to the desired position. Click the Configure Columns button to select which columns to display. To view details on a particular item in the search results, select its row in the table and click the Display button. Selecting Genes: Frequently you must select a gene or group of genes in order to identify gene names or quickly access genes you are working with. Selecting a Single Gene: 1. Click once on any line or square representing a gene. The name of the selected gene appears in the legend at the bottom of the genome browser. 2. Double-click a gene to bring up the Gene Inspector window (see The Gene Inspector on page 4-10), or press Ctrl+I for a selected gene. This works on genes represented graphically in the genome browser and on gene names found in lists. It is much easier to select a gene in the genome browser if you zoom in on it. Selecting Multiple Genes: Click once on any line or square representing a gene. Hold down Shift to add more genes. Clicking a
22. Figure 7-3: The Name New Gene Tree window. To save the new gene tree: 1. Enter a name in the Name field at the top of the screen. Names may not exceed 80 characters. 2. From the navigator, select a folder in which to save the new gene tree. To create a new folder, navigate to the desired parent folder and enter a new folder name in the Folder field. 3. Enter any additional information in the Notes field, if desired. 4. Click Save. Condition Tree: Complex trees can be made from multiple conditions or by tightly defining the types of data to use. Select a gene list in the navigator to reduce the number of genes to be made into a tree. Condition Tree Options: The following options are available for condition tree clustering. Similarity Measure: available options are Standard Correlation, Smooth Correlation, Change Correlation, Upregulated Correlation, Pearson Correlation, Spearman Correlation, Spearman Confidence, Two-sided Spearman Confidence, and Distance. For more information on measures of similarity, see Equations for Correlations an
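Two of the listed similarity measures, Pearson and Spearman correlation, are sketched below. Spearman correlation is computed as the Pearson correlation of the ranks, with tied values given their average rank. This is the textbook definition and is not taken from GeneSpring's own equations. For a condition tree, each argument would presumably be one condition's values across the selected genes. Function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


rho = spearman([1, 2, 3, 4], [1, 4, 9, 16])  # monotonic, so rho is 1.0
```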
23. ANOVA: If an ANOVA was performed, for comparing group means $\bar{X}_a$ and $\bar{X}_b$ compute $SE = \sqrt{WMS / n}$ if the groups are of equal size $n$, or $SE = \sqrt{\frac{WMS}{2}\left(\frac{1}{n_a} + \frac{1}{n_b}\right)}$ if their sizes are unequal, where $n$, $n_a$, and $n_b$ are the corresponding group sizes (number of samples) and $WMS$ is the within-group mean square from the ANOVA calculations. Then compute $q = (\bar{X}_a - \bar{X}_b) / SE$. Compare this value to the critical value $q_{\alpha, df, k}$, where $df$ is the error degrees of freedom from the ANOVA calculation and $k$ is the total number of groups. If $q$ is larger, consider the two means significantly different. If any of the groups for the gene has only one sample (i.e., there are not replicate samples for every group), do not perform post hoc tests for that gene. Nonparametric Test (Kruskal-Wallis): If the nonparametric option was chosen, GeneSpring performs a nonparametric Tukey test with $SE = \sqrt{n (nk)(nk + 1) / 12}$, where $n$ is the group size and $k$ is the number of groups. Rank-order all the data and compute rank sums for each group. Order the rank sums and compute $q$ as before, using the rank sums instead of means. Compare $q$ to the critical value $q_{\alpha, \infty, k}$. Student-Newman-Keuls: All calculations here are the same as for Tukey. The only difference is the critical value against which $q$ is compared. When testing group a vs. group b, compare $q$ to $q_{\alpha, df, p}$, where $p$ is the number of means (inclusive) in the range being tested. For example, if comparin
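The Tukey computation above translates directly into code. This sketch computes WMS, the q statistic with the equal-size and unequal-size standard errors, and the nonparametric standard error. It skips a gene when any group lacks replicates, as the text instructs. It does not look up critical values, and all function names are hypothetical.

```python
import math
from statistics import fmean


def within_group_ms(groups):
    """Within-group mean square (WMS) from a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    ss_within = sum(sum((x - fmean(g)) ** 2 for x in g) for g in groups)
    return ss_within / (n_total - len(groups))


def tukey_q(group_a, group_b, wms):
    """q statistic comparing the means of two groups."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == n_b:
        se = math.sqrt(wms / n_a)
    else:
        se = math.sqrt((wms / 2) * (1 / n_a + 1 / n_b))
    return (fmean(group_a) - fmean(group_b)) / se


def nonparametric_se(n, k):
    """SE for the nonparametric Tukey test: n samples per group, k groups."""
    return math.sqrt(n * (n * k) * (n * k + 1) / 12)


def pairwise_q(groups):
    """All pairwise q values, or None if any group lacks replicates."""
    if any(len(g) < 2 for g in groups):
        return None
    wms = within_group_ms(groups)
    return {
        (a, b): tukey_q(groups[a], groups[b], wms)
        for a in range(len(groups)) for b in range(a + 1, len(groups))
    }


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
q_values = pairwise_q(groups)  # WMS is 1.0 for these groups
```

With equal group sizes, the two SE formulas agree, since (WMS/2)(2/n) = WMS/n. Critical values of the studentized range are available, for example, from `scipy.stats.studentized_range` (SciPy 1.7 and later). For Student-Newman-Keuls, the number of means p in the tested range takes the place of k.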
24. Attributes: n/a. Usage: <Organization>Cures R Us</Organization>. Notes: Required for Header. <GenomeNames>: This setting allows you to associate samples with genomes. One database may have several genomes. Within this element there must be at least one <GenomeMappingSpec> tag. Contents: <GenomeMappingSpec>. Attributes: n/a. Usage: <GenomeNames></GenomeNames>. Notes: Required; can appear only once in a configuration file. <GenomeMappingSpec>: This element specifies the name of the genome in your sample source, the target genome on GeneSpring to upload it into, and the base directory to use for <GetSampleIDs> or <GetFile> elements. This tag must appear at least once. However, if the useGenomeName attribute is set to false in the <DatabaseQuery> tag for <GetSampleIDs>, this tag must appear only once. The attribute values for this tag are as follows. targetName: the genome on GeneSpring into which the samples will be uploaded. sourceName: the name of the database or directory from which to retrieve samples. baseDirectory: the directory to use for <GetSampleIDs> or <GetFile> elements if no data directory is specified. Contents: n/a. Attributes: targetName, sourceName, baseDirectory. Usage: <GenomeMappingSpec targetName="Yeast" sourceName="YeastDB" baseDirectory
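The cardinality rules for <GenomeNames> and <GenomeMappingSpec> can be checked mechanically. This sketch uses Python's standard `xml.etree.ElementTree`. It assumes the configuration has some root element (shown here as a hypothetical `Config`) and that <DatabaseQuery> appears inside <GetSampleIDs>. It also treats a missing `useGenomeName` attribute as not false, which is an assumption.

```python
import xml.etree.ElementTree as ET


def check_genome_names(xml_text):
    """Hypothetical validator for the <GenomeNames>/<GenomeMappingSpec> rules."""
    root = ET.fromstring(xml_text)
    problems = []
    if len(list(root.iter("GenomeNames"))) != 1:
        problems.append("<GenomeNames> must appear exactly once")
    specs = list(root.iter("GenomeMappingSpec"))
    if not specs:
        problems.append("at least one <GenomeMappingSpec> is required")
    # useGenomeName="false" on a <DatabaseQuery> inside <GetSampleIDs>
    # limits the configuration to a single genome mapping.
    single_genome = any(
        query.get("useGenomeName") == "false"
        for ids in root.iter("GetSampleIDs")
        for query in ids.iter("DatabaseQuery")
    )
    if single_genome and len(specs) > 1:
        problems.append("only one <GenomeMappingSpec> is allowed when useGenomeName is false")
    return problems


config = ('<Config><GenomeNames>'
          '<GenomeMappingSpec targetName="Yeast" sourceName="YeastDB"/>'
          '</GenomeNames></Config>')
issues = check_genome_names(config)  # []
```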
25. Click the Exclude Data button to view a summary of all the data to be uploaded. From this screen you can select data objects to be excluded from the bulk upload. Only checked items are uploaded. To exclude a file or data object, uncheck the box next to it in the table.

Electronic Signatures
If you are interacting with a regulatory-compliant GeNet server, you must provide an electronic signature each time you upload a data object to GeNet or delete an object from a GeNet folder. When you perform an action requiring a signature, the Electronic Signature dialog appears.

Figure 9-7. The Electronic Signature dialog

Your GeNet password serves as your electronic signature. Enter your password in the field provided. Optionally, you may enter a brief text message describing the purpose of your action. For additional information about electronic signatures and other regulatory compliance features in GeNet, see the GeNet Administration Manual or the GeNet User
26. Deletes a pathway. A confirmation dialog box appears.
Export as Zip: Allows you to export the pathway as a GeneSpring zip file.

Regulatory Sequences
The Find Potential Regulatory Sequences window allows you to find common regulatory sequences within the genes in a gene list, or to search for a known sequence. It also compares the frequency of occurrence against all other gene lists in the genome. This feature is useful for finding genes that share similar regulatory sequences or that have a particular regulatory sequence in common.
When the regulatory sequences tool compares genes to the remainder of the genome, it uses the all genes list rather than the all genomic elements list, which includes non-gene elements that are not expressed.
In GeneSpring version 4.0 and later, the sequence information is loaded automatically.
Note: To change the load-automatically feature, select Edit > Preferences > Data Files and uncheck the Load Sequence box.

[Figure: The Find Potential Regulatory Sequences window, with the Find New Sequence and Enter a Specific Sequence tabs]
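The comparison this tool reports, how often a sequence occurs in a gene list versus the all genes list, can be sketched as below. This is an illustration under stated assumptions: upstream regions are supplied as a hypothetical dictionary from gene name to sequence, matching is exact and on one strand only, and GeneSpring's own search options (such as how far before the ORF to search) are not modeled.

```python
def fraction_with_motif(upstream, genes, motif):
    """Fraction of genes (among those with sequence) whose upstream region contains motif.

    upstream: hypothetical dict mapping gene name -> upstream DNA sequence.
    Genes without a sequence entry are left out of the denominator.
    """
    with_sequence = [g for g in genes if g in upstream]
    if not with_sequence:
        return 0.0
    motif = motif.upper()
    # exact, single-strand, case-insensitive substring match
    hits = sum(1 for g in with_sequence if motif in upstream[g].upper())
    return hits / len(with_sequence)
```

Calling it once with the gene list and once with the all genes list gives the two frequencies being compared; a motif such as ACGCGT that is much more frequent in the list than in the genome is a candidate regulatory sequence.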
27. Edit Gene List: Opens the Gene List Editor window. For more information, see Creating and Editing Gene Lists on page 6-2.

The Similar Lists Tab
This tab displays the names of lists that resemble the selected list or that contain a statistically significant number of overlapping genes. There are two ways to view these lists:
• List View: Displays a simple two-column list. In this view, statistical significance is listed as the p-value for each of the similar lists.
• Navigator View: Displays a navigator-style listing.
Right-click a list to print or copy it. Double-click a list to view a Gene List Inspector window for that list.

The Associated Files Tab
This screen lists any files that may be associated with the selected gene list, including data files, array images, sample images, etc. From this screen you can do the following:
• Add File: To add a file, click Add File and select the desired file from the Browse menu that appears. You can also drag and drop a file directly from the desktop into the Associated Files list.
• Extract File: To save (extract) a file in the list to another location, select the desired file from the list and click Extract File. Choose a location from the Browse menu and click Save. This does not remove the file from your list; it simply places a copy of the file in a new location.
• Delete File: To remove an associated file, select it in the list and click Delete.
• View File: Select a file name
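This section does not say which statistic GeneSpring uses for the p-value of an overlap. A common choice, assumed here, is a one-sided hypergeometric test: the probability that two randomly drawn lists of the same sizes would share at least as many genes as observed. The function below is a hypothetical sketch of that test using only the standard library.

```python
from math import comb


def overlap_p_value(total_genes, size_a, size_b, overlap):
    """One-sided hypergeometric p-value: P(shared genes >= overlap) for random lists.

    total_genes: genes in the genome; size_a, size_b: list sizes;
    overlap: number of genes the two lists have in common.
    """
    upper = min(size_a, size_b)
    # ways to draw list B so that it shares exactly k genes with list A, summed over k >= overlap
    ways = sum(comb(size_a, k) * comb(total_genes - size_a, size_b - k)
               for k in range(overlap, upper + 1))
    return ways / comb(total_genes, size_b)
```

Small values mean the overlap is unlikely to be chance, which is the sense in which a list "resembles" the selected one.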
28. Figure 7-16. PCA in the Ordered List view

References for Principal Components Analysis
Alter, O., Brown, P. O., and Botstein, D. Singular value decomposition for genome-wide expression data processing and modeling. PNAS 97:10101-6, 2000. http://www.pnas.org/cgi/content/full/97/18/10101
Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis. John Wiley & Sons, Inc., New York, 1971.
Gnanadesikan, R. Methods for Statistical Data Analysis of Multivariate Observations. John Wiley & Sons, Inc., New York, 1977.
Holter, N. S., et al. Fundamental patterns underlying gene expression profiles: Simplicity from complexity. PNAS 97:8409, 2000. http://www.pnas.org/cgi/content/abstract/97/15/8409
Hotelling, H. Analysis of a Complex of Statistical Variables into Principal Components. Journal of Educational Psychology 24:417-441, 498-520, 1933.
Kshirsagar, A. M. Multivariate Analysis. Marcel Dekker, Inc., New York, 1972.
Mardia, K. V., Kent, J. T., and Bibby, J. M. Multivariate Analysis. Academic Press, London, 1979.
29. Figure 3-2. The Select Files window

From this screen you can select more files of the same type to add to your experiment. To select files:
a. Using the Drives and Directories menus, navigate to the folder containing the files you want to add. When you select a folder, the files it contains are listed in the Files section of the window.
b. In the Files section, select the file or files to be added. To select multiple files, hold down the Ctrl key while clicking on the file names. You can use the same method to deselect one or more selected files.
c. Click Add. The selected files are now listed in the Selected Files list.
To add all files in the selected directory at once, without selecting them individually, click Add All. To remove a file or files from the Selected Files list, select them and click Remove. To remove all files from the list, click Remove All.
Note: These files must all be in the same format. GeneSpring verifies whether the format is correct; if it is not, GeneSpring does not add the files to your experiment.
When you are done adding files, click Next. If your signal and control data are in separate files, the Select Corresponding Files window appears; proceed to Step 6. If you have selected samples with multiple files (subchips), sel
30. No output to GeneSpring.

Name | Input | Knob | Description
Send Experiment to Directory in GeNet | One experiment interpretation | Directory | Publishes an experiment interpretation to a chosen directory in GeNet. No output to GeneSpring.
Send Condition Tree to Directory in GeNet | One condition tree | Directory | Publishes a condition tree to a chosen directory in GeNet. No output to GeneSpring.
Send Gene List to Directory in GeNet | One gene list | Directory | Publishes a gene list to a chosen directory in GeNet. No output to GeneSpring.
Send Gene Tree to Directory in GeNet | One gene tree | Directory | Publishes a gene tree to a chosen directory in GeNet. No output to GeneSpring.

Look Up
Open Scripts > Basic Scripts > Look Up

Name | Input | Description
Is Gene in Gene List | One gene and one gene list | Returns true if the gene list contains the input gene.
Number Associated with Gene in Condition | One gene and one condition | Returns the number (0 if none) associated with a gene in a condition. There is a knob for Type.
Number Associated with Gene in Gene List | At least one gene and one gene list | Returns the number (0 if none) associated with a gene in a gene list.

Merge/Split Groups
Open Scripts > Basic Scripts > Merge/Split Groups

Name | Input | Description
Merge Genes | One gene group | Outputs a list containing all lists in the group input.
Merge Genes and Numbers | One gene group and one number group | Outputs a list of genes
31. Once you have set up a genome, you are ready to begin loading samples and building experiments. These genome files include the following key information:
• a list of annotations (what the scientific community knows about each gene), including information used for building ontologies
• a list of gene hypertext links (URLs) from which you can find more information about each gene in public databases
• mapping information about where each gene appears on a given chromosome
• identifiers (accession numbers) for the genes in various public databases

The New Genome Installation Wizard
The Genome Wizard guides you through the steps of creating a new GeneSpring genome. Most of its screens are self-explanatory. Which screens you see as you proceed through the Genome Wizard varies depending on the information you provide.
Select File > New Genome Installation Wizard. The New Genome Installation Wizard window appears.

[Figure: The New Genome Installation Wizard welcome screen, which asks for the common name of the organism under study (the name that will appear in the New Genome selection of GeneSpring's File menu), with the steps Genome Data Directory, Overall Genome Properties, GenBank Data Files, Master Gene Table, and Genome Sequence File]
32. Select a measure of confidence from the Measure of Confidence menu. Available options are t-test p-value and Number of Replicates. For details on t-test p-values, see The Data Table on page 4-11.
Select a multiple testing correction from the Choose Multiple Testing Correction menu. Available options are:
• Bonferroni
• Bonferroni Step-down (Holm)
• Benjamini and Hochberg False Discovery Rate
• None
Specify the following values for the filter:
• Minimum: the smallest gene value to allow in your list, also known as the cut-off value.
• Maximum: the largest gene value to allow in your list.
• Values must appear in at least __ out of __ conditions: the number of conditions in the total experiment in which genes must meet the specified requirements. This line can refer to the whole experiment.

Filter on Flags
GeneSpring allows you to find genes based on the data quality flags in the original data files. This option is available only if a flag column was specified in the data file when it was loaded into GeneSpring.

[Figure: The Filter on Flags window, with 7 samples selected from Demonstration Experiment]
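For reference, the three corrections on the Choose Multiple Testing Correction menu have standard textbook definitions, shown below as adjusted p-values. This is a sketch of those published procedures, not GeneSpring's source code; the function names are hypothetical. A gene could then be kept when its adjusted p-value lies between the Minimum and Maximum values of the filter.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def holm(pvals):
    """Bonferroni step-down (Holm): the i-th smallest p-value is multiplied by (m - i)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min((m - rank) * pvals[i], 1.0)
        running_max = max(running_max, value)  # keep adjusted values non-decreasing
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg false discovery rate: p * m / rank, made monotone from the top."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
        i = order[rank]
        value = min(pvals[i] * m / (rank + 1), 1.0)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted
```

Bonferroni is the most conservative of the three, Holm is uniformly less conservative than Bonferroni, and Benjamini and Hochberg controls the false discovery rate rather than the chance of any false positive, so it usually keeps the most genes.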
33. Split Classification | One classification | Splits the classification into a group of gene lists.
Split Conditions | One experiment interpretation | Splits the experiment interpretation into a group of conditions.
Split Gene List | One gene list | Splits the gene list into a collection of individual genes.
Split Gene List With Numbers | At least one gene list | Splits the gene list into a group of genes and an associated group of numbers.

Make Groups
Open Scripts > Basic Scripts > Merge/Split Groups > Make Groups

Name | Input | Description
Make Classification Group | A folder of classifications | Produces a group of classifications.
Make Experiment Group | A folder of experiments | Outputs a group of interpretations. There is one knob to select whether to include all the interpretations or only the defaults.
Make Gene List Group | A folder of gene lists | Outputs a group of gene lists.
Make Gene Tree Group | A folder of gene trees | Outputs a group of gene trees.

Select Groups
Open Scripts > Basic Scripts > Merge/Split Groups > Select Groups

Name | Input | Description
Filter Boolean Group | Two Boolean groups | If true, pass the second argument for each Boolean through the corresponding
34. The bottom half of the Classification Inspector contains a table with three columns:
• Class: the name given to each class.
• Genes: the number of genes in each class.
• Average Radius: the root mean square of the Euclidean distances between each gene and the centroid of its class. Classes with large radii are spread out, and classes with small radii are tightly grouped.

Percent Explained Variability
The Classification Inspector uses an improved formula for calculating the percentage of explained variability. This formula properly weights classes by the number of genes in each class. Let:
• $c$ be the number of classes (including unclassified, but not no data)
• $n$ be the total number of genes with data
• $n_i$ be the number of genes with data in class $i$
• $D_i$ be the distance of the centroid of class $i$ from the overall data centroid
• $d_{ij}$ be the distance of gene $j$ from the centroid of its class $i$
Calculate

$$B = \sum_{i=1}^{c} n_i D_i^2, \qquad W = \sum_{i=1}^{c} \sum_{j=1}^{n_i} d_{ij}^2$$

and from them $E$, the percent variability explained. If $c \ge n$, then $E = 0$.
Note that the percent explained variability depends on the selected experiment and the selected gene list. It is calculated using the Euclidean distance between the gene expression profiles over the conditions in the interpretation, in the current interpretation mode (ratio, log, or fold). Details about the number of genes in each class matching the selected gene list and the number of thos
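The Average Radius column and the between-class and within-class sums B and W can be computed directly from the class memberships. This sketch assumes each class is given as a list of expression profiles (a hypothetical input format) and uses Euclidean distance as the section describes; it stops short of computing E itself.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length profiles."""
    dims = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dims)]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_statistics(classes):
    """classes: dict mapping class name -> list of expression profiles.

    Returns (radii, B, W): radii[name] is the root mean square distance of the
    class's genes from their centroid; B and W are the between- and within-class sums.
    """
    all_points = [p for pts in classes.values() for p in pts]
    overall = centroid(all_points)
    radii, between, within = {}, 0.0, 0.0
    for name, pts in classes.items():
        center = centroid(pts)
        squared = [sq_dist(p, center) for p in pts]
        radii[name] = math.sqrt(sum(squared) / len(pts))
        between += len(pts) * sq_dist(center, overall)  # n_i * D_i^2
        within += sum(squared)                          # sum over j of d_ij^2
    return radii, between, within
```

For any classification, B + W equals the total sum of squared distances of all genes from the overall centroid, which makes a convenient sanity check.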
35. These conditions are listed on the far right side of the tree view. If one of the parameters has been designated as a continuous parameter, it is shown directly beneath the genome browser.

Tree Display Options
The following display options are available in tree view:
• Gene Tree/Condition Tree: The available options for this view are listed below.
• Features: The available options for this view are listed below.
• Coloring: See Color on page 4-30.
• Legend: See Legend on page 4-28.
To modify the appearance of your tree, select View > Display Options.

Gene Tree/Condition Tree Tab
The Display Options window includes a Gene Tree or Condition Tree tab, depending on whether you have selected a gene tree, a condition tree, or both. This tab contains the following options:
• Draw Genes Horizontally: Orients your tree so that the genes appear as horizontal bars on the right, extending from tree branches on the left.
• Show Tree Structure: Specifies whether to show or hide the tree structure.
• Show Gene Name Labels: If genes are displayed vertically, shows the name of each gene to its right if there is space. You must be at a very high magnification for these labels to be visible. This option is available only if a gene tree is selected.
• Show Tree Annotation Labels: Displays annotations for tree nodes if they are available and space permits. This option is available only if a gene tree is selected.
• Show E
36. Tree: A hierarchical tree in XML format.
Genome: An XML representation of the genome, which will be automatically saved to disk during Import to GeneSpring.
Experiment Condition Statistics: One gene per line, one column per statistical quantity per condition, with labels for the statistics across the top in the form N, R, C, AVERAGE, MIN, MAX, STDERR, STDDEV, N.
The Values Included in Input panel is active only if you selected the Experiment Condition Statistics radio button.
Click Show Example for an example of the selected output. Examples can be viewed as plain text, as hex code, or as a spreadsheet. Non-displayable characters, such as tabs, are displayed as boxes.
3. Click OK to return to the New External Program window.
If desired, check the Debug Input box. This option writes to the console window when input is sent from GeneSpring to the external program.
To edit an existing input type, select it in the list box and click Edit Input. To remove an input type, select it in the list box and click Remove Input.

The Outputs Tab
The output is what GeneSpring receives from the external program. On this tab, specify the desired output of the external program. You may add as many outputs as you like. If the external program does not send any data back to GeneSpring, you do not need to enter anything on this tab.
To add an external program output:
1. Click Add Output. The Choose Type of Output window appears.
37. Using the ScriptEditor
To open the ScriptEditor, select Tools > ScriptEditor.

ScriptEditor Concepts
• Script building block: Performs a simple function. This is the most basic element of a script.
• Script: A more complex program made up of script building blocks and/or scripts and external program building blocks.
• Building block: Any of the blocks used to create scripts, i.e., a script primitive, a script, or an external program building block.
• External program building block: A building block that is created for each external program defined within GeneSpring. A script with an external program building block can only be run on a copy of GeneSpring with that external program installed. For more information on GeneSpring's External Program Interface, see External Programs on page 8-35.
• Socket: A part of a building block that sends or receives a connection from another building block. Inputs and outputs are sockets.
• Building block input: A part of a building block that receives information. Inputs are located on the top of a building block and receive connections only from outputs.
• Building block output: A part of a building block that sends the results of the operation performed. Outputs are located on the bottom of a building block and connect only to inputs.
• Script input: The socket at the top of a script. Script inputs are created when you drag a building block input
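The connection rules above (inputs sit on the top of a block, outputs on the bottom, and a line always joins an output to an input) amount to a small directed graph. The classes below are a purely hypothetical model of those rules for illustration; GeneSpring does not expose such an API, and the block and socket names are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Socket:
    block: str  # name of the building block the socket belongs to
    name: str   # socket label
    kind: str   # "input" (top of a block) or "output" (bottom of a block)


@dataclass
class ScriptGraph:
    connections: list = field(default_factory=list)

    def connect(self, source: Socket, target: Socket) -> None:
        """Record a line from an output socket to an input socket; reject anything else."""
        if source.kind != "output" or target.kind != "input":
            raise ValueError("a connection must run from an output to an input")
        self.connections.append((source, target))

    def unconnected_inputs(self, sockets):
        """Inputs that nothing feeds yet, e.g. to flag blocks that still need a connection."""
        fed = {target for _, target in self.connections}
        return [s for s in sockets if s.kind == "input" and s not in fed]
```

Keeping the direction check in `connect` means an invalid line is rejected at the moment it is drawn, rather than being discovered later when the script runs.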
38. comparisons field.

Figure 6-15. The Filter on Error window

To filter on errors:
1. Select an experiment or condition from the navigator and click Set Experiment. You can also select a subset of conditions within an experiment.
2. Select the error type to filter on. Available options are Standard Deviation, Standard Error, and Range of Replicates.
3. Specify the following values for the filter:
• Minimum: the smallest gene value to allow in your list, also known as the cut-off value.
• Maximum: the largest gene value to allow in your list.
• Values m
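The three error types and the "values must appear in at least N out of M conditions" rule can be sketched as follows. Assumptions, not taken from the manual: each condition's replicates arrive as a list of numbers, the standard deviation is the sample (n − 1) form from Python's `statistics.stdev`, the standard error is that value divided by the square root of the replicate count, and conditions with a single replicate are skipped because no spread can be measured. Function names are hypothetical.

```python
import statistics


def error_value(replicates, kind):
    """Spread of one condition's replicate measurements for a single gene."""
    if kind == "standard deviation":
        return statistics.stdev(replicates)  # sample standard deviation (n - 1)
    if kind == "standard error":
        return statistics.stdev(replicates) / len(replicates) ** 0.5
    if kind == "range of replicates":
        return max(replicates) - min(replicates)
    raise ValueError(f"unknown error type: {kind}")


def passes_error_filter(conditions, kind, minimum, maximum, at_least):
    """conditions: one list of replicate values per condition for a single gene.

    The gene passes when the error lies in [minimum, maximum] in at least
    `at_least` conditions; single-replicate conditions never count.
    """
    hits = sum(1 for reps in conditions
               if len(reps) > 1 and minimum <= error_value(reps, kind) <= maximum)
    return hits >= at_least
```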
39. Figure 6-20. The Arbitrary File Restrictions window

1. Click Choose File and select the desired file from the browse menu. This file must have a column of gene identifiers. During the loading process, GeneSpring analyzes the file to determine which column contains the gene identifier and colors that column blue. In addition, it attempts to guess whether the file has column titles and colors that row red.
2. If GeneSpring did not select the correct column for the gene identifier, specify it in the Column Containing Gene Identifier field.
3. Use the Match Gene Identifier To menu to
40. [Table: sample parameter values for the parameters Disease (no, hepatitis, syphilis, osteoporosis, arthritis, cancer), Infectious Disease (y/n), Hepatitis (y/n), Type Hepatitis (a, b, or n), Cancer (y/n), and Type Cancer (combinations of liver, brain, breast, and kidney)]
41. Name | Input | Knobs | Description
K-means | One gene list, one experiment interpretation | Number of groups, Similarity measure, Maximum iterations, Additional tries, and Discard bad | Outputs a k-means classification.
K-means with Starting Classification | One gene list, one experiment interpretation, and one classification | Similarity measure, Number of iterations, and Discard bad | Outputs a k-means clustering starting from an existing classification.
Self-Organizing Map | One gene list, one experiment interpretation | Iterations, Discard bad, Rows, Columns, and Radius | Outputs a self-organizing map.

Correlations
Open Scripts > Basic Scripts > QC & Analysis > Correlations

Name | Input | Knobs | Description
Condition Correlation | Two conditions, one gene list | Correlation | Compares two conditions, looking only at the genes in the specified gene list, and outputs a p-value.
Find Similar Conditions in Experiment | One condition, one experiment interpretation, one gene list | Correlation, Cut-off value | Compares the specified condition to every condition in the specified experiment, using only the genes in the specified gene list. Outputs an array of conditions and an associated array of p-values.
Find Similar | One gene, one | Similarity mea | Outputs a list of genes whose exp
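A bare-bones k-means loop shows what the K-means block computes. This is a generic textbook version under assumptions, not GeneSpring's implementation: Euclidean distance stands in for the similarity measure, initial centers are drawn at random from the profiles, and the Additional tries and Discard bad knobs are not modeled.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(profiles, k, max_iterations=100, seed=0):
    """Plain k-means on expression profiles; returns a class index for each gene."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]  # random initial centers
    labels = None
    for _ in range(max_iterations):
        # assign each gene to its nearest centroid
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a class empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The Maximum iterations knob corresponds to `max_iterations`; re-running with different seeds and keeping the tightest result is one common way to realize "additional tries".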
42. Figure 6-17. The Filter on Flags window

1. To use all the samples from an experiment, select an experiment from the navigator and click Choose Samples. To select individual samples from the selected experiment, click Add/Remove. The Samples to Filter window appears.

Figure 6-18. The Samples to Filter window

This window behaves exactly like the Sample Manager window. For more information, see The Sample Manager on page 3-23.
2. Select the desired flag value from the pull-down menu. The available options are:
• Anything
• Present
• Present or Marginal
• Present or Unknown
• Present, Marginal, or Unknown
• Marginal
• Absent
• Unknown
3. Enter a value in the Difference must appear in at least __ out of __ samples
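The flag filter reduces to counting, for each gene, the samples whose flag is in the allowed set. In this sketch the menu labels follow the options listed above, the mapping of each label to a set of flag values is an assumption, and the function name is hypothetical.

```python
# Assumed mapping from each menu choice to the flag values it accepts (None = no restriction).
FLAG_OPTIONS = {
    "Anything": None,
    "Present": {"Present"},
    "Present or Marginal": {"Present", "Marginal"},
    "Present or Unknown": {"Present", "Unknown"},
    "Present, Marginal, or Unknown": {"Present", "Marginal", "Unknown"},
    "Marginal": {"Marginal"},
    "Absent": {"Absent"},
    "Unknown": {"Unknown"},
}


def passes_flag_filter(sample_flags, option, at_least):
    """sample_flags: one flag value per sample for a single gene.

    The gene passes when at least `at_least` samples carry an allowed flag.
    """
    allowed = FLAG_OPTIONS[option]
    matches = sum(1 for flag in sample_flags if allowed is None or flag in allowed)
    return matches >= at_least
```

With the settings shown in Figure 6-17 (Present or Marginal, at least 1 out of 7 samples), a gene flagged Present or Marginal in any single sample would pass.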
43. the result of the calculation for genes A and B
• $n$: the number of samples being correlated over
• $a$: the vector $(a_1, a_2, a_3, \ldots, a_n)$ of expression values for gene A
• $b$: the vector $(b_1, b_2, b_3, \ldots, b_n)$ of expression values for gene B
Normal mathematical notation for vectors is used. In particular:

$$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n, \qquad |a| = \sqrt{a \cdot a}$$

Common Correlations

Standard Correlation
Standard correlation measures the angular separation of the expression vectors for genes A and B around zero. Because almost all normalized values for genes are positive, you find mostly positive correlations between genes when you use the Standard correlation. This metric is designed to answer the question "Do the peaks match up?", or, to put it another way, "Are the two genes expressed in the same samples?" Since these are the questions a biologist most frequently wants answered, GeneSpring calls it the Standard correlation.
It is important to note that what mathematicians and statisticians refer to as correlation usually means the Pearson correlation. Mathematicians and statisticians would call the Standard correlation the Pearson correlation around zero.
This is how to compute a Standard correlation:

$$\text{Standard correlation} = \frac{a \cdot b}{|a|\,|b|}$$

Or, in summation notation:

$$\text{Standard correlation} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

Figure 1-1. Summation Notation for the Standa
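The Standard correlation is the cosine of the angle between the two expression vectors. The sketch below computes it directly and, for contrast, computes the Pearson correlation by applying the same formula to mean-centered vectors; the function names are hypothetical.

```python
import math


def standard_correlation(a, b):
    """Cosine of the angle between profiles a and b (correlation 'around zero')."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pearson_correlation(a, b):
    """The same formula applied after centering each profile on its own mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return standard_correlation([x - mean_a for x in a], [y - mean_b for y in b])
```

For a = (1, 2, 3) and b = (3, 2, 1), the Standard correlation is 10/14 ≈ 0.71 even though the Pearson correlation is −1, which illustrates why mostly positive values appear when all normalized values are positive.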
44. Figure 3-11. Example of parameter arrangements and values

Name
The first line must be the unique name of the experiment.

Parameters
The second line must be the first parameter. You can have an unlimited number of parameters. The first column must contain the parameter name. Subsequent columns contain the values for the parameter in each sample. Each parameter must have its units in parentheses in the same column as the name. For example, the parameter time should be immediately followed by (minutes). If your parameters have no units, you must follow the name with an empty set of parentheses, or GeneSpring does not recognize it as a parameter.
By default, GeneSpring assumes that the parametric values that follow are numeric and are to be displayed in numeric order. If the parametric values for a parameter are non-numeric, enter an asterisk after the unit-indicating parentheses (which are empty if there are no units). There must be a space between the right parenthesis and the asterisk. This tells GeneSpring to expect non-numeric parametric values and to treat the data appropriately.
The default setting for interpretation of parameters is as a continuous element. See Continuous Element on page 3-31 for details. To have the parameters treated differently, enter
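The header rules for a parameter row (name, units in parentheses, and an optional asterisk separated by a space) can be expressed as a regular expression. This is a hypothetical parser written for illustration; the exact tokenization GeneSpring applies is not documented here.

```python
import re

# name, then "(units)" (possibly empty), then an optional " *" marking non-numeric values
PARAM_HEADER = re.compile(r"^(?P<name>.+?)\s*\((?P<units>[^)]*)\)(?P<star> \*)?$")


def parse_parameter_header(cell):
    """Return (name, units, is_numeric), or None if the cell is not a parameter header."""
    match = PARAM_HEADER.match(cell.strip())
    if match is None:
        return None  # no unit parentheses: not recognized as a parameter
    return match.group("name"), match.group("units"), match.group("star") is None
```

For example, "time (minutes)" parses as a numeric parameter with units "minutes", and "strain () *" as a non-numeric parameter with no units, while "strain ()*" is rejected because the space before the asterisk is missing.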
45. A gene's mapping information.
• Chromosome: The chromosome on which a gene is located, if known.
• User Notes: Any additional notes you may have associated with a gene.
• EC: A gene's EC (Enzyme Commission) number, if known.
• Description: A gene's description, if known.
• Product: The protein product coded for by a gene, if known.
• Phenotype: A description of a gene's phenotype, if known.
• Function: A description of the function of a gene's product, if known.
• Keywords: Keywords associated with a gene, if known.
• PubMed ID: A gene's PubMed identifier.
• Custom Field 1, Custom Field 2, Custom Field 3: Any information you choose to place here for your own use.
• Type: The feature type from the GenBank file.
• DB id: A reference used to identify a gene within GeNet.
• GO Biological Process: The Gene Ontology Biological Process classification.
• GO Molecular Function: The Gene Ontology Molecular Function classification.
• GO Cellular Component: The Gene Ontology Cellular Component classification.
• RefSeq: The gene's NCBI Reference Sequence project identifier.
• UniGene: The gene's UniGene cluster identifier.

Exporting MAGE-ML Data
MAGE-ML (Microarray Gene Expression Markup Language) is a markup language based on XML and designed to describe and communicate information about microarray experiments. Using MAGE-ML, your experimental data can include information a
46. Click Start to begin updating annotations. While the GeneSpider runs, a number of informational fields are visible:
• Status: the level of completion the GeneSpider has reached.
• Processed: the number of genes in the genome for which the GeneSpider has finished querying the database.
• Found: the number of processed genes for which the GeneSpider has found a useful record in the database.
• Enhanced: the number of genes for which information has been found that was not in the master gene table.
• To Go: the number of genes in the genome that have not yet been processed.
The available options are slightly different when updating from Silicon Genetics.
The master gene table is not updated until you click Save and Close. This button is inactive while the GeneSpider is running. You must wait until the GeneSpider is finished, or click the Stop button, before clicking Save and Close.

The GeneSpider Errors Window
While the GeneSpider is running, the GeneSpider Errors window may appear. This window lists any errors the GeneSpider encountered, with a brief description of each problem. For example, if no match was found for some genes on your system, the Errors window displays the gene identifier and the text "Gene not found."
The most common reason for no genes being updated is that you did not select the annotation column containing the GenBank accession numbers.

Problems with the Map Location Annotations
This window appears when the information the
47. Spearman Correlation
The Spearman correlation is a nonparametric correlation similar to the Pearson correlation, except that it replaces the data for genes A and B with the ranks of the data, i.e., the lowest measurement for a gene becomes 1, the second lowest 2, and so forth. The Spearman correlation calculates the correlation of the ranks of genes A and B's expression data around the mean of the ranks, using the same formula as the Pearson correlation. In the Spearman correlation only the order of the data is important, not the level; therefore, extreme variations in expression values have less influence on the correlation. If there are ties in the data, all of the tied values are assigned the average of their ranks; e.g., if the 5th, 6th, and 7th lowest values are tied, all three data points are assigned a rank of 6.
To compute a Spearman correlation:
1. Order all the elements of vector $a$.
2. Use this order to assign a rank to each element of $a$.
3. Make a new vector $a'$ where the $i$th element in $a'$ is the rank of $a_i$ in $a$.
4. Now make a vector $A'$ from $a'$ in the same way as $A$ was made from $a$ in the Pearson correlation.
5. Similarly, make a vector $B'$ from $b$.

$$\text{Spearman correlation} = \frac{A' \cdot B'}{|A'|\,|B'|}$$

Spearman Confidence
Spearman confidence is a measure of similarity, not a correlation. Spearman confidence is one minus the p-value for the statistical test that the Spearman correlation is zero, versus the alternative that it is larger than zero. There is a high Spearman c
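The ranking step, including the average-rank rule for ties, and the resulting Spearman correlation can be sketched as below. The helper names are hypothetical; the Pearson step uses the centered-vector form described in this section.

```python
import math


def average_ranks(values):
    """Rank values from 1 (lowest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_correlation(a, b):
    """Pearson correlation (center, then cosine) applied to the ranks of a and b."""
    ra, rb = average_ranks(a), average_ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    centered_a = [x - mean_a for x in ra]
    centered_b = [y - mean_b for y in rb]
    dot = sum(x * y for x, y in zip(centered_a, centered_b))
    norm = math.sqrt(sum(x * x for x in centered_a)) * math.sqrt(sum(y * y for y in centered_b))
    return dot / norm
```

With the example from the text, `average_ranks([1, 2, 3, 4, 9, 9, 9, 10])` gives the three tied values (the 5th, 6th, and 7th lowest) a shared rank of 6.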
48. Dealing with Repeated Measurements on page 5-18 for more information. The only difference is that averaging of repeated parameters is done after the raw data has been normalized.

Continuous Element
In a continuous variable, each parameter value exists in series on a continuum with the other values of that parameter, rather than as a discrete point. Each value is related to the values on either side of it, and adjacent data points are connected by lines. Continuous variables are typically numeric. This requires that the values be in a particular order. GeneSpring automatically orders numerical parameters from highest to lowest and non-numerical parameters in alphabetical order. When graphing by a continuous parameter, each value is placed on the X axis in order from left to right. You can change this default order; see Set Value Order on page 3-34 for details.

Non-Continuous Element
In a non-continuous (or set) variable, each parameter value exists independently of the others as a discrete point. When a non-continuous element is graphed, each parameter value is placed on the horizontal axis in order from left to right. GeneSpring automatically orders numerical parameters from highest to lowest and non-numerical parameters in alphabetical order. See Set Value Order on page 3-34 if you need non-numerical parameter values to be graphed in a particular non-alphabetical order. When displaying data from a non-conti
49.

| Name | Required | Allowed | Description |
|---|---|---|---|
| Error Model | | | These numbers are the standard deviation of the measured signal around the true expression-level signal for that sample, as expressed by the scanner software. See Cross-gene Error Models on page 3-44 for details. |
| Control Channel | Optional | Any | If you have control channels (i.e., a two-color experiment), you must have the same number of control channel columns as signal columns. |
| Control Channel Background | Optional | Any | If you are using control channel backgrounds, the number of columns must be the same as the number of Control Channel columns. |
| Description | Optional | One | A description of the gene, if known. This information is included in the new master table of genes and is accessible with the Find Gene command and the Gene Inspector. This field applies only to new genomes created through the Column Editor. |
| GenBankID | Optional | One | The GenBank identifier for the gene, if known. If the GenBank identifiers for your genes are not used as their systematic or common names, including the GenBank accession number in this field allows you to update information about the gene directly from GenBank. See Updating Annotations with GeneSpider on page 6-27 for more information. This field is included in the new master table of genes and applies only |
50. [Figure 4-32: The Array view.]

In Figure 4-32, each solid circle represents an oligonucleotide on the array. If you zoom in, the gene names become visible. Circles are numbered from left to right and top to bottom; for example, a 3×3 array is numbered:
1 2 3
4 5 6
7 8 9

Viewing an Array Layout
1. Select View > Array Layout.
2. Select an array from the navigator.

Array Layout Display Options
The following display options are available in the Array Layout view:
• Features: The available options for this view are listed below.
• Coloring: See Color on page 4-30.
• Legend: See Legend on page 4-28.

The only feature that can be changed is the Show Unclassified Group When Splitting the Window option within the Features panel. When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser.

Pathway View
The Pathway view lets you display and place genes on an imported GIF or JPEG image. For information on downloading and importing pathways, see Pathways on page 6-16.
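The numbering rule above (left to right, then top to bottom) maps a grid position to a circle number with simple arithmetic. This is a minimal sketch assuming a rectangular array with `n_cols` columns; the helper names `circle_number` and `circle_position` are hypothetical and not part of GeneSpring.

```python
def circle_number(row, col, n_cols):
    """1-based circle number for a 0-based (row, col), counting left to right, then top to bottom."""
    return row * n_cols + col + 1

def circle_position(number, n_cols):
    """Inverse of circle_number: the 0-based (row, col) of a 1-based circle number."""
    return divmod(number - 1, n_cols)

# Print the numbering of a 3x3 array, one row per line.
for r in range(3):
    print(" ".join(str(circle_number(r, c, 3)) for c in range(3)))
```

Running the loop prints the same 3×3 layout shown in the text, and `circle_position(5, 3)` returns `(1, 1)`, the center circle.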
51. File: Save the list of samples to a text file.
• Create New Experiment: Create a new experiment from selected samples. For details, see Creating New Experiments on page 3-16.

8 Scripts and External Programs

Scripts

What is a Script?
Scripts are time-saving tools that allow a long series of data analysis steps to be performed at once. Scripts are reusable and can be applied to any data set. You can create your own scripts using the ScriptEditor.

Scripts in GeneSpring
Eleven predefined scripts are included with GeneSpring:
• 2-fold Expression Change: This script makes a gene list of all genes in a selected experiment that are 2-fold overexpressed or 2-fold underexpressed in at least one condition.
• 2-fold Expression Change AND Filter on Noise NOT Input: This script combines the 2-fold Expression Change List and Filter on Noise scripts to produce a single gene list that passes both filters but does not contain any genes on the input gene list.
• Best k-means: Given an experiment and a gene list, this script creates four k-means classifications with three, five, eight, and 15 clusters, respectively, and selects the classification with the highest explained variability. The selected k-means classification appears in a results window.
• Clustering 2-fold Change List: This script creates a
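The logic of the first two predefined scripts can be sketched as plain list operations. This is a minimal illustration, not the scripts themselves: it assumes values are already normalized so that 1.0 means no change, and it assumes "2-fold" means a value of at least 2 or at most 0.5; the function names and the sample data are made up.

```python
def two_fold_change(expression, fold=2.0):
    """Genes with a normalized value >= fold or <= 1/fold in at least one condition.

    Assumes values are normalized so that 1.0 means no change.
    """
    return [gene for gene, values in expression.items()
            if any(v >= fold or v <= 1.0 / fold for v in values)]

def both_filters_not_input(first, second, input_list):
    """Genes on both filtered lists that are not on the input gene list (A AND B NOT input)."""
    keep, exclude = set(second), set(input_list)
    return [gene for gene in first if gene in keep and gene not in exclude]

# Made-up normalized values for three hypothetical genes across three conditions.
normalized = {"geneA": [1.0, 2.4, 1.1], "geneB": [0.9, 1.2, 1.0], "geneC": [1.0, 0.45, 0.8]}
print(two_fold_change(normalized))  # ['geneA', 'geneC']
```

Here geneA passes because it rises above 2 in one condition and geneC passes because it falls below 0.5; geneB never changes 2-fold.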
52. Filter Results table.

Working with Gene Lists

The Find Similar Command
Similar lists in the Gene List Inspector window are gene lists that share a significant number of genes with the selected list. The p-value is calculated using the hypergeometric probability. This equation calculates the probability of an overlap of k or more genes between a gene list of n genes and a gene list of m genes, when both are randomly sampled from a universe of u genes:

$$p = \sum_{i=k}^{\min(n,m)} \frac{\binom{m}{i}\binom{u-m}{n-i}}{\binom{u}{n}}$$

The Standard List checkbox in the Gene List Inspector window allows you to define a newly created list as a standard list. If a list is defined as standard, it is included in the search for similar lists. Some lists, such as those created using the Simplified Gene Ontology tool, are automatically defined as standard lists. You can also change the GeneSpring Preferences to allow the program to search through all lists in your genome for similar lists: open Edit > Preferences, select the Miscellaneous tab, and change the settings under Restrict Gene List Searches.

Each gene expression profile must reach the set minimum correlation to be considered similar. The higher you set the minimum correlation (maximum 1), the closer the gene expression profiles must be.

To make a list with the Find Similar command:
1. Double-click a gene to bring up the Gene Inspector.
2. Click Find Similar. The New
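The hypergeometric tail described above can be computed exactly with integer binomial coefficients. This is a minimal sketch under the definitions in the text (k, n, m, u), not GeneSpring's own code; `overlap_p_value` is a hypothetical name, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def overlap_p_value(k, n, m, u):
    """Probability of k or more shared genes between an n-gene list and an m-gene list
    drawn at random from a universe of u genes (upper tail of the hypergeometric distribution)."""
    # Sum the exact integer counts first, then divide once, to avoid accumulating rounding error.
    tail = sum(comb(m, i) * comb(u - m, n - i) for i in range(k, min(n, m) + 1))
    return tail / comb(u, n)
```

As a small check, with u = 4 genes and two 2-gene lists, the chance that the lists are identical (k = 2) is 1/6, and with k = 0 the function returns exactly 1.0 because every possible overlap is counted. A smaller p-value means the observed overlap is less likely to have arisen by chance.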
53. [Figure 1-3: The main GeneSpring window. Callouts: the genome browser allows you to visualize your data and analysis results; the colorbar provides a visual key to the current coloring scheme; the navigator allows you to select which data to work with; the slider moves to different points within your experiment; the picture area displays images corresponding to the various points in an experiment.]

Below are some basic procedures for navigating the GeneSpring interface.

Changing the genes displayed
Open the Gene Lists folder in the navigator. GeneSpring initially displays the all genes list. You can change the genes shown in the display by choosing another list.

Views
You can change the view in the genome browser using the View menu. GeneSpring initially displays
54. [Figure 4-33: The Pathway view, showing the mitosis pathway colored by time (0 minutes).]

Viewing a Pathway
1. Select a pathway from the Pathways folder in the navigator. You must have already created a pathway; see Pathways on page 6-16.
2. Select a gene list. If a pathway contains a gene on the selected gene list, the gene is colored according to its expression level in the selected experiment. See the example of the mitosis pathway in Figure 4-33.

To add a gene to the pathway, hold Ctrl and drag the mouse over the desired placement area. Type a gene name or keyword. If a keyword is used, select the gene from the resulting
55. [Figure 4-15: Example of a k-means clustering. Panels: Set 3 (1,397 genes), Set 4 (1,264 genes), Unclassified (1,089 genes); Y axis: Normalized Intensity (log scale); X axis: Calculated time (minutes).]

Display Options

[Figure: GeNetViewer, Rat Genes (all genes), split into panels for neuro/glial markers, neurotransmitter receptors, peptide signalling, and diverse; Y axis: Normalized Intensity (linear scale); X axis: Adult, Embryonic, Postnatal.]
56. [Figure 3-17: The Filter on Attributes tab.]

This method allows you to filter samples based on their attributes. Attributes are very similar to parameters but are associated with individual samples rather than entire experiments. Attributes can also be paragraphs long, since they do not appear as labels on a graph. They contain sample-specific information that would not be used as a basis for analysis. For example, the following might be attributes of a sample:
• Patient Biography
• Lab Technician
• Date
• Ambient Temperature

Attributes may also contain the same information as parameters and can be imported as parameters when creating an experiment.

To select an attribute, click its name in the Select a Sample Attribute of Interest list. To select attribute values, click the desired values in the Select Attribute Values list. To select all values in the list, click Select All. To clear your selections, click Clear All. The Filter Results list is updated dynamically as you make your selections.

Filter on Keyword
This method allows you to filter based on whet
57. [Figure 8-1: The Run Script window.]

To execute a script:
1. In the navigator, open the Scripts folder.
2. Select one of the demo scripts. The Run Script window appears, displaying various elements of the selected script.
3. Select a data object from the navigator panel and click the appropriate button in the Inputs box. The example script in Figure 8-1 requires one data object, a condition; some scripts need no input. Select a condition in the navigator and click Set Condition >>.
4. Set any specified knobs. In this particular script, you must specify a cutoff value. In other scripts, you might select a value from a pull-down menu to direct the execution of the script. The number of knobs can vary greatly, but they all appear in the Knobs box. Scroll down to make sure that all the text boxes are filled in. Not all knobs require user input.
The ScriptEditor does not recognize numbers with spaces or commas. Use periods as your decimal markers.
If your script requires an array of numbers (for example, the weights associated with a complex correlation), a table appears in which you can enter these numbers. Enter one number per line. You can also paste numbers into this table in tab-delimited format from a text or Excel file. The order of these numbers must match the order of the inputs that they describe.
5. Specify w
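The number-entry rules for knobs can be illustrated with a small validator. This is a sketch under stated assumptions, not the ScriptEditor's parser: it accepts only a plain decimal with an optional sign and a period as the decimal marker (no exponents), and it treats pasted text as one number per line or tab-delimited cells. The names `parse_knob_number` and `parse_knob_array` are hypothetical.

```python
import re

# Assumption: a plain decimal written with a period, optional sign, no spaces or commas.
_KNOB_NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)")

def parse_knob_number(text):
    """Convert one knob entry to a float, rejecting spaces, commas, and other separators."""
    if not _KNOB_NUMBER.fullmatch(text):
        raise ValueError(f"not a valid knob number: {text!r}")
    return float(text)

def parse_knob_array(pasted):
    """Read numbers entered one per line or pasted tab-delimited, keeping their order."""
    cells = [cell for line in pasted.splitlines() for cell in line.split("\t") if cell != ""]
    return [parse_knob_number(cell) for cell in cells]
```

With these rules, `"0.05"` is accepted while `"1,5"` and `"1 000"` are rejected, and pasting `"0.25\n0.5\t1.0"` yields the three weights in their original order, which matters because the order must match the inputs they describe.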
58. Manual.

Appendix A: Installing from a Database

Custom Databases and GeneSpring
You can load experiments into GeneSpring from your company's database. To do this, you must set up a dataloader.xml file before starting GeneSpring.

Databases
A database is an organized collection of information; essentially, it is a collection of records. In database terms, a record consists of all the useful information you can gather about a particular item. Each piece of information making up a record is called a field. An example of a non-computerized database is your address book: each record represents one of your contacts, and each record consists of many fields, such as name, address, number, and so on.

Computer databases automatically keep records organized and enable you to search for or pull out particular records based on any field in the record. The software that allows you to create and maintain databases is called a Database Management System, or DBMS. In database terminology, a file is called a table. Each record in the file is called a row, and each field is called a column.

A relational database is the most common type of database in client-server systems. Simply stated, in this type of database, relationships are established between tables based on common information
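To make the table, row, column, and relationship terms concrete, here is a small relational example using Python's built-in `sqlite3` module. It is purely illustrative: the `samples` and `measurements` tables, their columns, and the values are made up and do not represent the schema GeneSpring expects or the dataloader.xml format.

```python
import sqlite3

# Hypothetical tables: each row of 'samples' is one record; 'measurements' refers back to it
# through the shared sample_id column, the "common information" that relates the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, name TEXT, technician TEXT);
    CREATE TABLE measurements (sample_id INTEGER REFERENCES samples(sample_id),
                               gene TEXT, signal REAL);
    INSERT INTO samples VALUES (1, 'time 0', 'Lee'), (2, 'time 10', 'Kim');
    INSERT INTO measurements VALUES (1, 'geneA', 120.5), (2, 'geneA', 98.0),
                                    (2, 'geneB', 45.2);
""")

# Join the two tables on their common column to pull out related records for one gene.
rows = conn.execute("""
    SELECT s.name, m.gene, m.signal
    FROM samples AS s JOIN measurements AS m ON s.sample_id = m.sample_id
    WHERE m.gene = ?
    ORDER BY s.sample_id
""", ("geneA",)).fetchall()
print(rows)  # [('time 0', 'geneA', 120.5), ('time 10', 'geneA', 98.0)]
conn.close()
```

The join returns each measurement of geneA together with the name of the sample it came from, even though that name is stored only once, in the other table.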
59. Note that if you have any rows selected, you must first click Clear Selection.

Locating a Particular Gene
1. Type Ctrl+F.
2. Enter the gene name.
3. Click OK.

Inspect Found Gene
To bring up the Gene Inspector for your found gene, type Ctrl+I.

Condition Scatter Plot
The condition scatter plot displays a fundamentally different type of information than any other view, with the possible exception of the condition tree. Unlike in other GeneSpring views, each colored point (dot, circle, square, etc.) represents a condition, not a gene.

This view is the most common way to visualize the results of principal components analysis performed on conditions. It is also useful for presenting complex multidimensional data in the context of conditions. For example, a 3D condition scatter plot can be configured to display a principal component score on one axis, a parameter value on a second axis, and the normalized expression level of a given gene on the third axis. A simpler possibility is to plot the expression values for two genes on two axes. Such a plot is useful for demonstrating whether the expression patterns of the genes are correlated or anti-correlated.

[Figure: Condition Scatter Plot for Yeast cell cycle time series (no 90 min), Default Interpretation; X axis: YAL001C (TFC3) ratio; Y axis: YAL002W (VPS8).]
60. Parameter

To create a new parameter:
1. Click New Parameter. A dialog appears.
2. To add a standard parameter, select it from the pull-down menu and click OK. To add a custom parameter, select the Custom radio button and click OK. A new column appears in the Experiment Parameters window. If you want to accept the default values for a standard parameter, simply click Save.
3. Fill in the Parameter Name and Parameter Units (the latter only if applicable) in the new column.
4. In the Numeric and Logarithmic rows, select Yes or No from the pull-down menus. Click in a cell in either row to make the pull-down menu appear. You can also paste data into the Sample cells.
5. Click Save to change the parameters in your current experiment, or Save As to save this parameter setup as a new experiment.

You can paste in columns of information by clicking the cells of the Sample section. For example, if you had an Excel spreadsheet of data and wanted to copy a column from it, you could copy a large section of the column and paste it into the new column. You can also copy information out. You can only add columns (parameters) and parameter values; you cannot add rows (samples) to this table.

Delete Parameter
To delete a parameter, click the gray bar above the column you want to delete and click Delete Parameter.

Replace
To replace many entries at once, select the entries to change and click Replace. Enter the appropriate text in the dialog b
61. [Figure 9-4: The Upload to GeNet window.]

Note: If you are uploading to a regulatory-compliant GeNet server, you are prompted to enter your password in the electronic signature dialog. For more information, see Electronic Signatures on page 9-14.

The upload status box notifies you when the upload is complete. If you are having trouble uploading, ask your administrator to confirm that you have access to the target directory on GeNet.

Deleting Data Objects from GeNet
You can delete data objects that you own in GeNet using the GeneSpring interface. Right-click the object in the GeneSpring navigator and select Delete. A confirmation dialog appears. Click Yes to continue.

Note: If you are deleting objects from a regulatory-compliant GeNet server, you are prompted to enter your password in the electronic signature dialog. For more information, see Electronic Signatures on page 9-14.

Uploading Genomes to GeNet
The Bulk Upload feature allows you to upload entire genomes and large amounts of data to GeNet at once. To perform a bulk upload, select File > Bulk Upload to GeNet.

Note: If you are uploading to a regulatory-compliant GeNet server, you are prompted to enter your password in the electronic signature dialog. For more information, see Electronic Signatures on page 9-14.
62. Right-click over the scatter plot and select Display Options.
3. Select the first desired gene list from the navigator and assign it to the X axis. Repeat this step for the remaining axes.
4. When you are done assigning the gene lists to the desired axes, click OK.

PCA on Conditions
1. Open View > Condition Scatter Plot.
2. Right-click over the scatter plot and select Display Options.
3. Select the first desired expression profile from the navigator and assign it to the X axis. Repeat this step for the remaining axes.
4. When you are done assigning expression profiles to the desired axes, click OK.

Viewing Principal Components in an Ordered List
The best way to visualize the genes that exhibit the highest levels of an individual component is to use the ordered list view. Select View > Ordered List and choose one of the PCA gene lists from the navigator panel. Genes exhibiting the highest levels of the selected principal component are displayed on the left side of the genome browser and have the longest lines extending upward from them. For more details, see Ordered List View on page 4-58.
63. [Figure 4-4: The Search GeNet for Data Objects window.]

To search for data objects:
1. In the Data Type section, specify the type of data to search for.
2. Enter search terms in the appropriate fields in the Search Fields section. Use the pull-down menus to specify how the search term appears in that field. Available options are:
• Contains
• Equals
• Starts with
• Ends with
• Does not contain
All entries in the Search Fields section are combined with AND; i.e., the search looks for data objects that match all of the terms entered on this screen.
3. Specify the genome or genomes in which to search. The more genome names you check, the longer the search process will take.
4. Click Start.

Search GeNet Results
When the search has completed, the results screen appears. This screen contains a list of all results matching your query, displayed as a columnar table with associated information about the matching genes or data objects.

[Figure: Search GeNet Results, Data Objects: Search Results for "tetris".]
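The five match options and the AND rule can be sketched as a small filter over records. This is an illustration, not GeNet's search engine: it assumes comparisons are case-insensitive (the manual does not say), represents each data object as a dictionary of field values, and uses the hypothetical names `field_matches` and `search`.

```python
def field_matches(value, op, term):
    """Apply one Search Fields option. Assumption: comparisons are case-insensitive."""
    v, t = value.lower(), term.lower()
    if op == "Contains":
        return t in v
    if op == "Equals":
        return v == t
    if op == "Starts with":
        return v.startswith(t)
    if op == "Ends with":
        return v.endswith(t)
    if op == "Does not contain":
        return t not in v
    raise ValueError(f"unknown option: {op}")

def search(objects, criteria):
    """Keep objects matching ALL (field, option, term) criteria, i.e. the terms are ANDed."""
    return [obj for obj in objects
            if all(field_matches(obj.get(field, ""), op, term) for field, op, term in criteria)]

objects = [{"Name": "tetris genes", "Author": "Smith"},
           {"Name": "cell cycle", "Author": "Smith"}]
print(search(objects, [("Name", "Contains", "tetris"), ("Author", "Equals", "smith")]))
```

Adding a criterion can only narrow the results, which is the practical consequence of every entry being combined with AND.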
64. Show Example for an example of the selected output. Examples can be viewed as plain text, hex code, or a spreadsheet. Non-displayable characters, such as tabs, are displayed as boxes.
3. If the output is a gene list with numbers, enter a label describing what the numbers represent in the Description of Numbers field. This label appears in the Gene List Inspector as the title of the column of numbers.
4. Click OK to return to the New External Program window.

If desired, check the Debug Output box. This option writes to the console window when output is sent from the external program to GeneSpring.

To edit an existing output type, select it in the list box and click Edit Output. To remove an output type, select it in the list box and click Remove Output.

The Delimiters Tab
This tab is necessary only if your external program has multiple inputs or outputs.

[Figure 8-6: The Delimiters tab, with the options Use ASCII 255 as delimiter (default) and Terminate Last Input to Program with ASCII 255 under Delimiter Between Multiple Inputs or Outputs.]

Both GeneSpring and the external program need to know when a new data type is being sent. Certain characters are used to indicate this new data type. This is usually the ASCII 255 character; however, your program may require a different delimiter. By default, GeneSpring uses ASCII 255 as the data type delimiter. T
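On the external program's side, handling multiple inputs comes down to splitting the incoming data on the delimiter byte (value 255, 0xFF). The sketch below assumes the data arrives as one byte stream, that text is Latin-1 encoded, and that a trailing delimiter appears only when Terminate Last Input to Program with ASCII 255 is checked; the manual does not specify these details, and the function names are hypothetical.

```python
DELIMITER = b"\xff"  # byte value 255, the default data type delimiter described above

def split_inputs(raw, delimiter=DELIMITER):
    """Split the bytes an external program receives into its separate inputs."""
    parts = raw.split(delimiter)
    # If the last input is also terminated with the delimiter, the split leaves an
    # empty trailing piece that is not a real input.
    if len(parts) > 1 and parts[-1] == b"":
        parts.pop()
    return [part.decode("latin-1") for part in parts]

def join_outputs(outputs, delimiter=DELIMITER, terminate_last=False):
    """Join several text outputs into one byte stream separated by the delimiter."""
    data = delimiter.join(text.encode("latin-1") for text in outputs)
    return data + delimiter if terminate_last else data
```

If the data did arrive on standard input, a program could call `split_inputs(sys.stdin.buffer.read())`; reading bytes rather than text keeps byte 255 intact instead of letting a text decoder reinterpret it.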
65. Weighting Coefficient box. If you do not wish to weight genes based on their control value, uncheck the Weight genes based on intensity box.
5. Specify whether to run this process locally or on a GeNet Remote Server.
6. Click Start. When the analysis is complete, the Find Similar Samples Results window appears.

The Find Similar Samples Results Window
This window displays the results of the Find Similar Samples operation, both as a bar graph ordered by correlation and as a list of samples.

[Figure: Find Similar Samples Results for yeasttimeseries.txt column 9 (gene list: all genomic elements, 7,216 genes; weighting coefficient 0.25), showing a bar graph of sample correlations and a table with the columns Name, Correlation, Genes Used, and attributes.]
66. When you are done, click OK. You are returned to the Experiment Attributes screen. A new column appears for each attribute you imported.

New Attribute
To create a new attribute:
1. Click New Attribute. A dialog appears.
2. To add a standard attribute, select it from the pull-down menu and click OK. To add a custom attribute, select the Custom radio button and click OK. A new column appears in the Experiment Attributes window. If you want to accept the default values for a standard attribute, simply click Save.
3. Fill in the Attribute Name and Attribute Units (the latter only if applicable) in the new column.
4. In the Numeric and Logarithmic rows, select Yes or No from the pull-down menus. Click in a cell in either row to make the pull-down menu appear. You can also paste data into the Sample cells.
5. Click Save to change the attributes in your current experiment, or Save As to save this attribute setup as a new experiment.

You can paste in columns of information by clicking the cells of the Sample section. For example, if you had an Excel spreadsheet of data and wanted to copy a column from it, you could copy a large section of the column and paste it into the new column. You can also copy information out. You can only add columns (attributes) and attribute values; you cannot add rows (samples) to this table.

Delete Attribute
To delete an attribute, click the gray bar above the column you want to delete and click Delete Attri
67. Wilcoxon two-sample rank test, also known as the Mann-Whitney U test, for two groups, and the Kruskal-Wallis test for multiple groups. This test is most successful if you have more than five replicate samples in each group.
3. Select a p-value cutoff for genes that pass the filter. The p-value, a number between zero and one, is the probability that a difference at least as large as the one observed would arise by chance alone; it therefore reflects the risk of a false positive.
4. Select a type of multiple testing correction. The available options are described below.
5. Select a type of post hoc test to perform, if desired. Post hoc tests are described in more detail below.

Selecting Groups Manually
To make groups that are defined by two or more parameters, or to make groups that correspond to a subset of the values for a given parameter, click Select Groups Manually.

[Figure 6-3: The Select Groups window for 1-way ANOVA testing. On-screen instructions: "Select the groups of interest. To ignore a parameter, uncheck its column."]

This screen features a table with a column for each experimental parameter and a row for each condition to compare. Uncheck the box in a column's header to ignore groupings based on that parameter. When you do this, the table is dynamically updated to reflect the change. The number of rows decreases and the number of sampl
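For the two-group case, the Mann-Whitney U statistic can be sketched from pooled ranks. This is a minimal illustration, not GeneSpring's implementation: it covers two groups only (not Kruskal-Wallis), gives tied values their average rank, and computes the p-value from the large-sample normal approximation without a tie correction. The function names are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """U statistic for group x against group y; tied values share their average rank."""
    pooled = sorted(x + y)
    rank_of = {}
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1] == pooled[start]:
            end += 1
        rank_of[pooled[start]] = (start + end) / 2 + 1  # average of ranks start+1..end+1
        start = end + 1
    rank_sum_x = sum(rank_of[v] for v in x)
    return rank_sum_x - len(x) * (len(x) + 1) / 2

def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (mann_whitney_u(x, y) - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))
```

With three values per group and complete separation, for example `[1, 2, 3]` against `[4, 5, 6]`, U is 0 and the approximate p-value is only about 0.05. This is consistent with the advice above that rank tests work best with more than five replicates per group, and the normal approximation itself also becomes more reliable as groups grow.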
68. acid sequence in a number of ways.

Method 1 (takes immediate effect):
1. Right-click while the cursor is in the black genome browser. A menu appears.
2. Select Options > Load Sequence. A window saying "Please wait while nucleic acid sequence is loaded" appears.
After the loading is complete, you can zoom in and see the nucleic acid sequence of a particular gene. The sequence is shown in the magnified genes. However, this information is not saved, so when you exit and reopen GeneSpring, you must reload the nucleic acid sequence.

If you would like the sequences to always be readily available, you must change the defaults through the Preferences window. You may choose to make the load sequence feature load automatically with the program. Again, note that this applies to version 4.0 and earlier.

Method 2 (takes effect in your next GeneSpring session):
1. Select Edit > Preferences. The GeneSpring Preferences window appears.
2. Select Data Files from the pull-down menu at the top of the window.
3. Select the Load Sequence checkbox.
4. Click OK at the bottom of the window.
5. Close and restart GeneSpring, or select File > New Window.

Changing the defaults in the Preferences window does not initiate the load sequence feature in your current session, but it does change future initial loading practices. The nucleic acid sequence can also be loaded as a side effect of using Tools > Find Regulatory Sequences. For more info
69. 6-60
Filter on Gene List Numbers 6-64
Advanced Filter 6-66
Creating Boolean Filters 6-67
Saving Filters 6-67
Clustering and Characterizing Data
The Clustering Window 7-2
Using the Clustering Window 7-2
Clustering Methods 7-5
Gene Tree 7-5
Condition Tree 7-6
k-Means Clustering 7-8
Self-Organizing Maps 7-11
QT Clustering 7-14
Principal Components Analysis 7-16
PCA on Genes 7-16
PCA on Conditions 7-18
Interpreting your PCA Results 7-20
The Class Predictor 7-24
Interpreting the Results of a Prediction 7-25
Find Similar Samples 7-26
The Find Similar Samples Results Window 7-26
Scripts and External Programs
Scripts 8-2
What is a Script? 8-2
The Run Script Window 8-3
The Script Inspector 8-5
Using the Remote Server 8-5
Using the ScriptEditor 8
70. [Figure 7-17: The Find Similar Samples window, with Choose Target Sample (yeasttimeseries.txt column 8), Choose Sample Pool (All local Yeast samples, 16 samples), Weighting Coefficient (0.25), and Computation Preferences (Compute locally or Compute on a GeNet Remote Server).]

There are two ways of reaching this window:
• Select Tools > Find Similar Samples.
• From the Sample Inspector's Similar Samples tab, click the Find Similar Samples button.

To find similar samples:
1. Select a gene list from the navigator and click Choose Gene List.
2. If the sample to be compared is not preselected, click Choose Target Sample. The Select Target Sample window appears. This window works similarly to the Sample Manager, except that you can select only one sample. For details on using this window, see The Sample Manager on page 3-23.
3. To specify the samples among which to search, click Choose Sample Pool. The Select Sample Pool window appears. This window works exactly like the Sample Manager. For details on using this window, see The Sample Manager on page 3-23.
4. If necessary, change the value in the
71. and for the lines between the genes in the Physical Position view, the Tree lines, the Ordered List lines, etc.
• Background Color: The Background Color defines the color behind the genes and other elements in the genome browser.
• Selected Color: The Selected Color is used for selected genes, gene names, and axes. For this, you will probably want the greatest contrast with the background color.
• Text Color: Defines the color of text displayed in the genome browser window.
• Presets: The Presets pull-down menu allows you to choose from a variety of predefined color schemes.

To create a custom color scheme, modify colors as desired and check the Save as custom color scheme box. This saves your current color scheme in the Presets menu under the name Custom Color Scheme. You can save only one custom color scheme at a time.

Group Colors
This section allows you to set colors used for the following:
• Classifications
• Parameters
• Gene Lists
• PCA
• Gene Inspector
• Find Similar Samples
• Color by Attribute

Each box in the displayed grid indicates a color for that group. Click a box to select it. The Selection area at the bottom of the panel displays that box with the name and color of the selected group. Double-click the selected box, or click Change, to view the Change Color window. To restore the color defaults, click the Defaults button. For more information
72. are yes, no, or guess. This attribute is used only for database queries. If the Java query option is used, this setting is overridden.
Contents: <DatabaseQuery>, <JavaQuery>
Attributes: cacheable, numeric
Usage: <GetSampleAttributes cacheable="true" numeric="guess"> </GetSampleAttributes>
Notes: Optional

<GetFile>
Specifies parameters for retrieving associated files. This element has four attributes.

The type attribute specifies the type of file to be retrieved. This attribute is required and is case-insensitive. There are five possible values:
• Sample Image: a picture or pictures of the biological sample
• Array Image: a picture or pictures of the scanned array(s)
• CEL File: an Affymetrix CEL file (actually stored as a general attachment with MIME type application/x-AffyCELFile)
• Raw Data File: a raw data file or files
• attachment: a general attachment

The location attribute specifies the location of the files to be retrieved. This attribute is required and is case-insensitive. There are four possible values:
• database: returns the contents of the file (typically a blob)
• file: returns a file pathname
• URL: returns a URL
• java: returns com.sigenetics.ext.database.getFile

The deleteAfterwards attribute specifies whether to delete the file once it has been importe
73. are in a sequence, such as before and after a time series or a drug series. The sequence does not have to be continuous, but it must have an order. If your experiment is set up with an experimental point taken at each of before, after, and control, the following correlations will not make sense applied to your data.
Smooth Correlation
To compute a Smooth correlation:
1. Make a new vector A from a by interpolating the average of each consecutive pair of elements of a. Insert this new value between the old values. Do this for each pair of elements that would be connected by a line in the graph screen.
2. Do the same to make a vector B from b.
$\text{Smooth correlation}(a, b) = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$ (the similarity between gene A and gene B)
Example expression values:
Experiment    1    2    3    4    5
Gene A       10    4    1    2    3
Gene B        2    4    3    7    9
Gene C        2    8    6    7    8
Interpolated values inserted between experiments:
Between experiments   1 and 2   2 and 3   3 and 4   4 and 5
Gene A                7         2.5       1.5       2.5
Gene B                3         3.5       5         8
Gene C                5         7         6.5       7.5
Change Correlation
The Change correlation looks for the opposite of what the Smooth correlation looks for. The Change correlation looks only at the change in expression level between adjacent points. However, it is also very similar to the Standard correlation in that it measures the angular separation of the expression vectors for genes A and B around zero
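Both measures can be sketched in a few lines of Python. The sketch rests on two assumptions. First, the Standard correlation is taken to be the cosine of the angle between two expression vectors around zero, as the Change correlation description suggests. Second, the Change correlation is taken to apply that cosine to the differences between adjacent points. GeneSpring's actual reference point may depend on the interpretation mode, and the function names are hypothetical. The interpolate step reproduces the example table above.

```python
import math


def interpolate(values):
    """Build vector A from a: insert the average of each consecutive pair between the pair."""
    out = [values[0]]
    for left, right in zip(values, values[1:]):
        out.extend([(left + right) / 2, right])
    return out


def cosine(x, y):
    """Angular similarity around zero: x.y / (|x| |y|). Undefined for all-zero vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))


def smooth_correlation(a, b):
    return cosine(interpolate(a), interpolate(b))


def change_correlation(a, b):
    # Assumption: compare the steps between adjacent points instead of the levels.
    def steps(v):
        return [right - left for left, right in zip(v, v[1:])]
    return cosine(steps(a), steps(b))


gene_a, gene_b = [10, 4, 1, 2, 3], [2, 4, 3, 7, 9]
print(interpolate(gene_a))  # [10, 7.0, 4, 2.5, 1, 1.5, 2, 2.5, 3]
print(smooth_correlation(gene_a, gene_b), change_correlation(gene_a, gene_b))
```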
74. are not completely sequenced, you can search for cytogenetic band markers. For organisms that are completely sequenced, you can restrict your search to regions between specified bases. Only the fields appropriate for the given genome appear. Any gene that falls even partially within the specified region is identified by the search.
You can restrict your search to genes containing specific sequences. These sequences can include the IUPAC-IUB ambiguity codes (A, K, Y, W, etc.). Note that the symbol X is not allowed; users who want to specify a single wildcard base should use N instead. Searching for NNNN therefore identifies all the genes in the genome and may result in an out-of-memory error.
2. Check or uncheck the annotation fields to search. Searches are limited to the annotations listed on the screen.
3. Specify the genomes in which to search. The more genome names you check, the longer the search process takes.
4. Click Start. When the search is complete, a list of the genes that match the search criteria appears.
Search for Data Objects
[Screenshot: the Search GeNet window, Search for Data Object tab, with Data Type options (Gene Lists, Experiments, Gene Trees, Condition Trees, Classifications, Pathways, Samples) and Search Fields (Name, GeNet Identifier, Author/Owner/Uploader, Notes, Attachments, Sample Attribute, Experiment Parameter, Upload Date MM/DD/YYYY)]
Genomes to
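Translating an IUPAC-IUB pattern into a regular expression shows why a pattern of Ns matches every gene. The sketch below is hypothetical and is not GeneSpring's search code. It uses the standard IUPAC nucleotide codes and Python's re module. As in GeneSpring, X is rejected and N stands for any single base. The gene names and sequences in the example are made up.

```python
import re

# Standard IUPAC-IUB nucleotide codes; X is deliberately absent, as in GeneSpring.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Compile an IUPAC pattern such as 'ACGCGT' or 'RCGNNY' into a regex."""
    try:
        return re.compile("".join(IUPAC[ch] for ch in pattern.upper()))
    except KeyError as err:
        raise ValueError(f"unsupported symbol {err.args[0]!r}; use N for any base") from None


def genes_containing(pattern, sequences):
    """Names of genes whose sequence contains the pattern anywhere."""
    regex = iupac_to_regex(pattern)
    return [name for name, seq in sequences.items() if regex.search(seq.upper())]


# Hypothetical sequences for illustration only.
seqs = {"geneA": "ATGACGCGTTT", "geneB": "ATGCCCAAA"}
print(genes_containing("ACGCGT", seqs))  # ['geneA']
print(genes_containing("NNNN", seqs))    # ['geneA', 'geneB']
```

Any sequence of four or more bases matches NNNN, which is why such a search returns the whole genome.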
75. as described in the following lists.
Affymetrix Pivot Table
• Column 1 interpreted as Gene Name
• Average Difference or Signal interpreted as Signal
• Detection or Abs Call interpreted as Flags
Metrics
• Gene Name or Probe Set or Probe Set Name interpreted as Gene Name
• Signal or Average Difference interpreted as Signal
• Detection or Abs Call interpreted as Flags
• P, M, A interpreted as Flag Designators
• Region interpreted as Experiment Name
dChip
• Probe Set interpreted as Gene Name
• Column to the left of the column that ends in "call" interpreted as Signal
• Description interpreted as Description
• Accession interpreted as GenBank ID
• Column to the right of "call" that is to the right of a Signal column interpreted as Flags
• P, M, A interpreted as Flag Designators
Agilent
• ProbeName interpreted as Gene Name
• rBGSubSignal interpreted as Signal
• gBGSubSignal interpreted as Control
• Description or GeneName interpreted as Description
• GenBank interpreted as GenBank ID
Amersham
• GeneID interpreted as Gene Name
• Signal Mean interpreted as Signal
• Background Mean interpreted as Signal Background
• Flag interpreted as Flags
• Q P 272A 3 M interpreted as Flag Designators
Axon GenePix Pro 2 & 3
Note: The Ratio Formulation entry is used to determine which channel is Signal and whi
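For the Agilent and Amersham formats, the header rules amount to a lookup table. The hypothetical sketch below assumes exact, case-sensitive header matches. It ignores the positional rules used for dChip files, so it approximates the column editor's behavior rather than reproducing it.

```python
# Header -> role rules transcribed from the Agilent and Amersham lists above.
# Exact, case-sensitive matching is an assumption of this sketch.
COLUMN_RULES = {
    "agilent": {
        "ProbeName": "Gene Name",
        "rBGSubSignal": "Signal",
        "gBGSubSignal": "Control",
        "Description": "Description",
        "GeneName": "Description",
        "GenBank": "GenBank ID",
    },
    "amersham": {
        "GeneID": "Gene Name",
        "Signal Mean": "Signal",
        "Background Mean": "Signal Background",
        "Flag": "Flags",
    },
}


def guess_roles(file_format, headers):
    """Map each recognized column header to the role the column editor would assign."""
    rules = COLUMN_RULES[file_format]
    return {header: rules[header] for header in headers if header in rules}


print(guess_roles("agilent", ["ProbeName", "rBGSubSignal", "gBGSubSignal", "LogRatio"]))
```

Unrecognized headers such as LogRatio are simply left out. In GeneSpring you would assign those columns by hand in the column editor.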
76. Figure 7-18: The Find Similar Samples Results window
The top portion of this screen displays a bar graph of your query results. The lower portion of the screen displays a table of the samples returned by the query. By default, the samples are listed in the order of their correlation with the sample to which they were compared. To reorder the table, click the appropriate column header. Double-click a row to view that sample in the Sample Inspector.
The following options are available:
• Change Colors: Specify the colors used in the bar graph display. You can color the graph using a single solid color or by a selected sample attribute. If no sample attributes are defined for the samples, the Attribute option does not appear.
• Configure Columns: Specify what information to display in the table of samples.
• Copy to Clipboard: Copy the information in the table of samples to the clipboard. This data can be pasted into a text file or a spreadsheet application such as Excel.
• Save to
77. correction is applied, negative signal levels may still be present for a few measurements. GeneSpring offers the option, as the last step of normalization, to set these values to zero. Also, when interpreting data in logarithm or fold interpretations, GeneSpring treats all normalized ratio values less than 0.01 (including 0 and negative values) as if they had a ratio of 0.01, to prevent transformation problems.
References
Cleveland, W. S., and S. J. Devlin (1988). Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting. Journal of the American Statistical Association 83: 596-610.
Yang, Y. H., S. Dudoit, P. Luu, and T. P. Speed (2001). Normalization for cDNA Microarray Data. SPIE BiOS 2001, San Jose, California, January 2001.
6 Analyzing Data
Creating and Editing Gene Lists
You can create a gene list by selecting Edit Gene List from the Edit menu. The Gene List Editor screen appears. From this screen you can create a new gene list based on a variety of selection criteria, or add or remove genes from existing gene lists. This screen is very similar to the Sample Manager.
[Screenshot: the Gene List Editor window, with Filter on Gene List, Filter on Annotation, and Filter Results panels]
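The 0.01 floor is easiest to see in a log transform. This sketch uses a base-10 logarithm purely for illustration, because the text does not fix a base. It also shows the optional last step that sets negative signals to zero. The function names are hypothetical.

```python
import math

RATIO_FLOOR = 0.01  # normalized ratios below this, including 0 and negatives, count as 0.01


def zero_negatives(signals):
    """Optional last normalization step: set remaining negative signals to zero."""
    return [max(s, 0.0) for s in signals]


def log_ratios(ratios):
    """Log-transform normalized ratios after applying the 0.01 floor.
    Base 10 is chosen for illustration only."""
    return [math.log10(max(r, RATIO_FLOOR)) for r in ratios]


print(zero_negatives([-3.2, 0.0, 5.5]))  # [0.0, 0.0, 5.5]
print(log_ratios([-0.5, 0.0, 0.005, 1.0, 10.0]))
```

Without the floor, math.log10 raises an error for 0 and negative values. With the floor, all three low values map to the same finite number.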
78. data objects to GeNet:
Method 1: Position your cursor over the data object in the navigator that you want to upload, and right-click. Select Upload to GeNet from the pop-up menu.
Method 2: Select the desired object in the navigator and drag it into the appropriate GeNet folder (displayed in italics).
Figure 9-3: GeNet folders in the GeneSpring navigator (GeNet folders are displayed in italics)
2. When the GeNet Upload window appears, enter a destination directory or accept the default. To create a new destination directory, enter a name. To browse for a directory, click Change and select the desired directory.
3. Click Start to begin uploading to GeNet.
[Screenshot: the Upload to GeNet window. A blank Destination Directory entry uploads to the Gene List folder on GeNet.]
Note: You must have write privileges in the appropriate GeNet genome in order to upload.
79. do Analyses B-8
C Technical Details for the Predictor
Gene Selection C-1
Classifying the Test Samples C-1
Decision Threshold C-1
References for the Predictor C-2
1 Welcome to GeneSpring
Congratulations on selecting the most advanced, flexible tool available for gene expression data analysis. This manual is a guide to GeneSpring features. Chapter 1 covers installing GeneSpring, loading and setting up your data, and GeneSpring basics. The remaining chapters discuss loading, setup, and the various data analysis and visualization tools in detail.
Getting Started
Requirements
Windows
• Windows 98/NT/2000
• Pentium II or better
• 256MB RAM (512MB recommended)
• 1024x768 display
• 40MB of free disk space
Macintosh
• MacOS 9.1 or higher
• PowerPC or better
• MRJ 2.2.5
• 256MB RAM (512MB recommended)
• 1024x768 display
• 40MB of free disk space
Unix
• Most common Unix OSes (Linux and Solaris recommended)
• A JVM that supports JDK 1.1 or later
• 256MB RAM (512MB recommended)
• 1024x768 display
• 40MB of free disk space
Installing from a CD
If you are installing GeneSpring from a CD, you have several options after you place your CD in the drive:
1. Select Install GeneSpring Demo. A splash screen and an Install
80. field.
Filter on Data File
Filter on Data File allows you to filter genes based on values in a specific column of your experiment data files. For example, if you specified a flag column when you loaded your data, you can filter on Present or Marginal calls. If your sample data files are in multiple formats, this screen appears with a separate tab for each data format. The available options on each tab are the same as the options for the standard Data File Restrictions window.
[Screenshot: the Filter on Data File window, which prompts: "Please click on cells in the Search row to select which column(s) to filter on."]
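Filtering on a flag column comes down to keeping the rows whose flag is one of the chosen calls. The sketch below assumes a tab-delimited sample file with hypothetical column names Gene and Flag, and Affymetrix-style P/M/A designators. It illustrates the idea and is not GeneSpring's filter code.

```python
import csv
import io


def filter_on_flags(data_file, gene_column, flag_column, keep=("P", "M")):
    """Return genes whose flag is in `keep` (Present or Marginal by default)."""
    reader = csv.DictReader(data_file, delimiter="\t")
    return [row[gene_column] for row in reader if row[flag_column].strip() in keep]


# Hypothetical tab-delimited sample file with a flag column.
sample = io.StringIO(
    "Gene\tSignal\tFlag\n"
    "YAL001C\t120.5\tP\n"
    "YAL002W\t8.1\tA\n"
    "YAL003W\t40.0\tM\n"
)
print(filter_on_flags(sample, "Gene", "Flag"))  # ['YAL001C', 'YAL003W']
```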
81. first argument.
Name (Input): Description
• Filter Condition Group (one Boolean group and one condition group): If true, pass the second argument for each Boolean through the corresponding first argument.
• Filter Experiment Group (one Boolean group and one experiment interpretation): If true, output an experiment interpretation for each Boolean in the first argument.
• Filter Condition Tree Group (one Boolean group and one condition tree group): If true, output a condition tree for each Boolean in the first argument.
• Filter Gene Classification Group (one Boolean group and one classification group): If true, output a classification for each Boolean in the first argument.
• Filter Gene Group (one Boolean group and one gene group): If true, output a gene for each Boolean in the first argument.
• Filter Gene List Group (one Boolean group and one gene list group): If true, output a gene list for each Boolean in the first argument.
• Filter Gene Tree Group (one Boolean group and one gene tree group): If true, output a gene tree for each Boolean in the first argument.
• Filter Number Group (one Boolean group and one number group): If true, output a number for each Boolean in the first argument.
• Filter Sequence Group (one Boolean group and one sequence group): If true, output a sequence for each Boolean in the first
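Each Filter ... Group block pairs a Boolean group with a group of objects. The minimal sketch below assumes the two groups correspond element by element. The table does not say how groups of unequal length are handled, so the sketch raises an error in that case. filter_group is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def filter_group(flags: Sequence[bool], items: Sequence[T]) -> List[T]:
    """Output items[i] for every position i where the Boolean group is true."""
    if len(flags) != len(items):
        raise ValueError("the Boolean group and the item group must be the same length")
    return [item for flag, item in zip(flags, items) if flag]


genes = ["YAL001C", "YAL002W", "YAL003W"]
print(filter_group([True, False, True], genes))  # ['YAL001C', 'YAL003W']
```

The same function serves every row of the table because only the type of the second group changes.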
82. folder, navigate to that folder in the directory browser in the lower-left portion of the screen and leave the Folder field blank. To save in a new subfolder, navigate to the desired parent folder and enter a name for the new folder in the Folder field.
• If desired, enter notes containing more descriptive information about the experiment.
When you are done, click Save. Your new experiment has been created.
Copying and Pasting Experiments
You can use the copy (Ctrl+C) and paste (Ctrl+V) functions to insert a new experiment or lists from the clipboard into GeneSpring.
Preparing to Paste
You should have normalized data in an Excel file or saved as tab-delimited text (Figure 3-12). To paste correctly into GeneSpring, your data must have all three of the following parts, in this format: Name, Parameters, Data.
[Screenshot: the Excel workbook Diseased Data.xls, with callouts marking the name, the seven parameters, the parameter values for the third sample, and the first gene in the list]
83. for a cluster to be considered valid.
• Minimum Correlation: The minimum correlation for any pair of genes in the same cluster.
• Similarity Measure: Available options are Standard Correlation, Smooth Correlation, Change Correlation, Upregulated Correlation, Pearson Correlation, Spearman Correlation, Spearman Confidence, Two-sided Spearman Confidence, and Distance. For more information on measures of similarity, see Similarity Definitions on page 7-4.
Saving QT Clustering Results
When the operation is complete, the Choose Classification Name window appears.
Figure 7-9: The Choose Classification Name window
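The sketch below shows what Minimum Cluster Size and Minimum Correlation control, using a textbook-style quality-threshold (QT) clustering. It assumes Pearson correlation as the similarity measure and a simple greedy rule for building candidate clusters. GeneSpring's implementation details are not documented here and may differ. The gene names and profiles are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    return sum(p * q for p, q in zip(dx, dy)) / math.sqrt(
        sum(p * p for p in dx) * sum(q * q for q in dy)
    )


def qt_cluster(profiles, min_correlation, min_cluster_size):
    """Quality-threshold clustering: every pair of genes in a cluster must correlate
    at least `min_correlation`; clusters smaller than `min_cluster_size` are not
    reported (those genes stay unclassified)."""
    remaining = dict(profiles)  # gene name -> expression profile
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            while True:
                # Score each outside gene by its worst correlation with the candidate.
                scores = {
                    gene: min(pearson(remaining[gene], remaining[m]) for m in candidate)
                    for gene in remaining
                    if gene not in candidate
                }
                if not scores:
                    break
                gene, score = max(scores.items(), key=lambda kv: kv[1])
                if score < min_correlation:
                    break
                candidate.append(gene)
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_cluster_size:
            break
        clusters.append(best)
        for gene in best:
            del remaining[gene]
    return clusters


profiles = {  # invented profiles
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5], "g3": [1, 2, 3, 5],
    "g4": [4, 3, 2, 1], "g5": [8, 6, 4, 1],
}
print(qt_cluster(profiles, min_correlation=0.9, min_cluster_size=2))
```

Raising the minimum cluster size to 4 in this example returns no clusters, because the largest valid group has only three genes.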
84. have scripts run on your local machine by default. Select Remote to run scripts on a remote execution server by default.
Local Computation Settings
• Check Don't Show Script Result Summary Window to skip the Script Result Summary when a script completes its execution.
• The Current Scale Factor for Time Estimate option allows you to modify the multiplier for GeneSpring's internal estimate of how long an analysis will take. This can be useful if you have significantly changed the hardware or settings on your computer (added more RAM, etc.).
Remote Computation Settings
• Check Automatically check for results to have GeneSpring automatically check whether your script has finished running. Specify how often to check by entering a number of minutes in the Delay between checks box.
Miscellaneous
The Miscellaneous panel contains a variety of settings to customize your GeneSpring installation.
• Default Minimum Correlation: Specifies the default minimum correlation coefficient that appears near the Find Similar button in the Gene Inspector window.
• Restrict Gene List Searches: Allows you to limit the lists GeneSpring examines when searching for similar lists in the Gene Inspector window and during tree building.
• Search Gene Lists Stored: Specify whether to search gene lists stored on your local machine, on a GeNet server, or both.
• Use the Cross-Gene Error Model by Default in Experiment Interp
85. in the list and click View File to view the contents of the file in an external program. The appropriate program is automatically selected if the file type is known.
The Classification Inspector
The Classification Inspector allows you to learn about the method used to construct a classification, or to learn more about the variability explained by each class within a classification. To use the Classification Inspector, right-click a classification in the navigator panel and select the Inspect option.
Figure 4-14: Classification Inspector for a k-means clustering with 10 groups
In Figure 4-14, the Notes field contains information about the method used to make the classification. If the classification is the result of clustering, this field displays information such as the type of clustering, the distance metric, and the number of iterations that were used to perform the clustering. You can save your own comments about the classification here for future reference.
86. larger than that which GeneSpring defines as a genome.
To search for data on GeNet from within GeneSpring, select Edit > Search GeNet. The Search GeNet screen appears. From this screen you can search for either genes or associated data objects. However, searches return only data you have permission to view. If you are logged into more than one GeNet server, a dialog appears that prompts you to select the GeNet server on which to search. You cannot search multiple GeNet servers at the same time.
Search for Genes
Figure 4-3: The Search GeNet for Genes window (search fields: Gene Name, Common Name, Synonym, GenBank Accession No.; options: Case Sensitive, Whole Words Only, Use * as a Wildcard)
To search for genes:
1. Enter a search term in the Search For field.
• Enter multiple terms separated by the word "or" in this field to match any gene containing either of these terms, even if they do not appear in the same field.
• Enter an asterisk (*) in this field to match any gene with an entry in any of the fields you specify in step 2.
• For organisms that
87. later in the process. The neighborhood radius is expressed in terms of Euclidean distance in grid units, relative to the abstract grid of the expression patterns. This is different from the distance between nodes in gene expression space. For instance, point (1, 2) is one unit away from (1, 3). If you make the neighborhood radius very small (less than 1), each point always moves independently and adjacent clusters are not related. If you specify a very large neighborhood radius, initially all the nodes move toward every data point, and the grid behaves as if it is very stiff, with more similarity between node results but less flexibility to explore the variations in the data.
Self-Organizing Map Options
The following options are available for SOM clustering:
• Rows: The number of rows in your grid. The default setting is based on the number of genes and conditions in the selected experiment(s).
• Columns: The number of columns in your grid. The default setting is based on the number of genes and conditions in the selected experiment(s).
• Number of Iterations: How many times each gene is examined. For example, if there are 10,000 genes and 60,000 iterations are specified, each gene is examined six times.
• Neighborhood Radius: How many nodes move toward a data point at the beginning of the iteration, and therefore how similar the profiles are for each node.
• Discard genes wi
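A minimal self-organizing map makes the two distances concrete. Nodes are compared with genes in expression space, but the neighborhood is measured in grid units. The learning rate, the random initialization, and the linear shrinking of radius and rate over the run are assumptions chosen for illustration, not GeneSpring's documented schedule. Cycling through the genes means iterations / number of genes passes, matching the 60,000 / 10,000 = 6 example above.

```python
import math
import random


def train_som(data, rows, cols, iterations, radius, learning_rate=0.1, seed=0):
    """Minimal self-organizing map. Nodes sit on a rows x cols grid, and the
    neighbourhood is measured as Euclidean distance in grid units."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for step in range(iterations):
        x = data[step % len(data)]  # cycle through genes: iterations / len(data) passes
        decay = 1 - step / iterations  # assumed linear schedule
        current_radius = radius * decay
        rate = learning_rate * decay
        # Winning node: the closest node in expression space.
        winner = min(nodes, key=lambda n: sum((w - v) ** 2 for w, v in zip(nodes[n], x)))
        for node, weights in nodes.items():
            # Neighbourhood: distance on the abstract grid, e.g. (1, 2) to (1, 3) is 1.
            if math.dist(node, winner) <= current_radius:
                nodes[node] = [w + rate * (v - w) for w, v in zip(weights, x)]
    return nodes


genes = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25]]
som = train_som(genes, rows=2, cols=2, iterations=6, radius=1.0)  # each gene seen twice
```

With a radius below 1, only the winning node ever moves. This is the "each point moves independently" case described above.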
88. less than the negative of the specified percentile, the tenth percentile is used as a background correction and subtracted from all genes before the specified percentile is taken.
Normalize to a Constant Value
If you are using a technology that calculates its own number for normalization, you will want to use constant values. For instance, Affymetrix's Global Scaling centers your data around 2500; in this case you would need to normalize your data to 2500 to center it around 1.
$\text{normalized value} = \dfrac{\text{signal strength of gene A in sample X}}{\text{hard number in sample X}}$
To normalize to a constant value, enter the desired value in the Per Chip: Normalize to a constant value text box. By default this value is set to 1.0.
Region Normalization
Regions are assigned in the column editor during the data loading process. For more information, see Using the Column Editor on page 3-9. If you have defined regions in your data, all normalization steps are applied on a per-region basis. There are three ways of designating regions:
• Each data file for a sample is assumed to be a separate region.
• Each distinct value in the Region Column is designated as a region.
• A specified list of region codes, which may or may not be suffixes in the region column. This option is included for backwards compatibility.
The Affine Background Correction
If negative values form a large fraction of your data set, GeneSprin
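The sketch below shows normalization to a constant and the per-region application of a normalization step. It is illustrative only. The per-region step divides by the region's median, which is a hypothetical choice; any per-chip step could be plugged in the same way.

```python
import statistics
from collections import defaultdict


def normalize_to_constant(signals, constant=1.0):
    """Per-chip normalization to a constant: divide every signal by the same number."""
    return [s / constant for s in signals]


def normalize_per_region(signals, regions, step):
    """Apply a normalization step separately to the measurements of each region."""
    positions = defaultdict(list)
    for i, region in enumerate(regions):
        positions[region].append(i)
    out = [0.0] * len(signals)
    for idx in positions.values():
        for i, value in zip(idx, step([signals[i] for i in idx])):
            out[i] = value
    return out


def median_step(values):
    # Hypothetical per-region step: divide by the region's median.
    m = statistics.median(values)
    return [v / m for v in values]


print(normalize_to_constant([2500, 5000, 1250], constant=2500))  # [1.0, 2.0, 0.5]
print(normalize_per_region([2, 4, 6, 10, 20, 30], list("aaabbb"), median_step))
```

The second call prints [0.5, 1.0, 1.5, 0.5, 1.0, 1.5]: the two regions end up on the same scale even though their raw signals differ fivefold.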
89. lines are valuable because you can select points that lie above or below them by right-clicking in the appropriate position in the genome browser. In addition to fold lines, you can add lines to the origin of each axis, as well as draw a line of best fit.
To modify the use of lines:
1. Select View > Display Options, or right-click anywhere in the genome browser and select Display Options.
2. Click the Lines to graph tab.
• To use fold change lines, click the Fold Change Lines box. If you want only one pair of fold change lines, select the Set Lines At radio button and enter a number in the fold box. If you would like more than one pair of lines, check the Set Lines at Multiple Intervals box and list the fold values to view, separated by commas.
• To show a trend in your data, check the Line of Best Fit box. Note that the regression is performed on the transformed data, and this line is always linear regardless of how the axes are chosen.
• To make the origin of each axis more visible, check the Lines Through Origin option.
• To see a grid inside the plot area, you can have lines drawn at the major and minor tick intervals of each axis. Check the Horizontal/Vertical Grid Lines checkboxes you want to see. The color of these grid lines is shown in the Grid Color box at the bottom of the window. To modify the grid color, click Change.
Changing Labels and Features
The scatter plot view als
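Fold change lines are the lines y = fold·x and y = x/fold, so selecting the points outside them is a simple comparison. The sketch also fits a least-squares line in transformed space. The log10 transform is an assumption; the text says only that the regression uses the transformed data. The gene values are hypothetical.

```python
import math


def outside_fold_lines(points, fold):
    """Genes lying on or above the y = fold * x line or on or below y = x / fold."""
    return [gene for gene, (x, y) in points.items() if y >= fold * x or y <= x / fold]


def best_fit_on_log(points):
    """Least-squares line fitted to log10-transformed values (assumed transform).
    The fit is linear in the transformed space, whatever the axes show."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx


# Hypothetical control (x) vs. treated (y) values.
points = {"geneA": (100.0, 250.0), "geneB": (100.0, 110.0), "geneC": (200.0, 80.0)}
print(outside_fold_lines(points, fold=2))  # ['geneA', 'geneC']
```

Setting lines at multiple intervals is equivalent to calling outside_fold_lines once for each fold value.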
90. list. To delete a gene from the pathway, right-click the gene and select Delete Pathway Element. Zooming, coloration, movement, and the Find Genes Which Could Fit Here features work in this view. Find Genes Which Could Fit Here suggests genes that might be appropriate in certain areas of the picture. See Pathways on page 6-16 for more details.
Pathway Display Options
The following display options are available in graph view:
• Features: The available options for this view are listed below.
• Coloring: See Color on page 4-30.
• Legend: See Legend on page 4-28.
To modify the appearance of your pathway, select View > Display Options. The Display Options window includes a Features panel with the following options:
• Color by all conditions: Divides the genes into sections representing multiple conditions so that all conditions in the selected interpretation can be viewed simultaneously. Using this feature disables the condition slider at the bottom of the genome browser.
• Show unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser.
Compare Genes to Genes
The Compare Genes to Genes view allows you to observe the similarity between the expression profiles of two genes in one list or in
91. Figure 3-12: Example of a correctly formatted tab-delimited file
Common Mistakes in Pasting
• forgetting the title
• not using parentheses
• not having parameters
• using non-normalized data (data can be normalized within GeneSpring)
• having extraneous columns
• forgetting to indicate parameters
• having non-numeric parametric values with an asterisk
• using more than one type of decimal marker, or the wrong type for your computer's settings
Pasting an Experiment into GeneSpring
1. If you have not done so already, give your experiment a unique name. If the name is already in use, GeneSpring appends a number to distinguish it from other experiments of the same name. Select all or part of a properly formatted Excel or tab-delimited file and click Copy or press Ctrl+C.
Note: Some computers have a limit on the amount of data you can place on the clipboard. If you are consistently crashing at this point, you may need a JVM update.
2. In the main GeneSpring window, select Edit > Paste > Paste Experiment. GeneSpring automatic
92. may or may not be equivalent to the raw measurements.
To limit by a cutoff:
1. Check the Use only measurements with box.
2. Select whether to limit by raw signal or current normalized values from the pull-down menu.
3. Enter the cutoff figure in the text box. The default value is 10.0.
You can choose to apply additional background correction in this step. To apply background correction, check the appropriate box in the Background Correction section of the screen. You have the following options:
• Never apply extra background correction.
• Always apply extra background correction: Prior to taking the specified percentile, the bottom tenth percentile is used as a background correction and subtracted from all genes.
• If needed, apply extra background correction: For samples in which the bottom tenth percentile is less than the negative of the specified percentile, the tenth percentile is used as a background correction and subtracted from all genes before the specified percentile is taken.
Normalization Strategies for Specific Technologies
Normalization of Affymetrix Data
Often data in Affymetrix .chp files are either prescaled or prenormalized. While Affymetrix's scaling and normalizations are designed to meet the same needs as GeneSpring's, they are not equivalent. The Affymetrix global scaling procedure, which is comparable to GeneSprin
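The three background-correction options can be sketched as one per-chip percentile normalization. The percentile uses linear interpolation between ranks, which is an assumption, since the manual does not say how GeneSpring computes percentiles. The cutoff from steps 1-3 is not modeled. Function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Percentile with linear interpolation between ranks (pct from 0 to 100)."""
    s = sorted(values)
    k = (len(s) - 1) * pct / 100
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def normalize_to_percentile(values, pct=50.0, background="if needed"):
    """Divide each measurement by the chip's pct-th percentile. With background
    'always', or 'if needed' when the bottom tenth percentile is below minus the
    specified percentile, first subtract the bottom tenth percentile from all genes."""
    tenth = percentile(values, 10)
    if background == "always" or (
        background == "if needed" and tenth < -percentile(values, pct)
    ):
        values = [v - tenth for v in values]
    scale = percentile(values, pct)
    return [v / scale for v in values]


chip = [-30, -20, -10, 0, 10]  # made-up chip dominated by negative values
print(normalize_to_percentile(chip, background="if needed"))
print(normalize_to_percentile(chip, background="never"))
```

On this chip the median is negative, so the uncorrected result flips the sign of every measurement. The "if needed" rule detects this and corrects the background first.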
93. n/a
<ExperimentWorkedDesignation>   plain text   n/a
<ExperimentAbsentDesignation>   plain text   n/a
<ExperimentMarginalDesignation>   plain text   n/a
<RegionColumn>   plain text   n/a
<TreatNoSignalAsInvalid>   plain text   n/a
<LowerBoundOnSignalColumn>   plain text   n/a
<UpperBoundOnSignalColumn>   plain text   n/a
<StandardDeviationSignalColumn>   plain text   n/a
<ColumnHeaderLine>   plain text   n/a
Tag Definitions
<ExternalDatabaseConfiguration>
The top-level element defining the entire database configuration. This element contains all of the other tags.
Contents: <GeneralConfiguration>, <Database>
Attributes: n/a
Usage: <ExternalDatabaseConfiguration> </ExternalDatabaseConfiguration>
Notes: Required; can appear only once in the configuration file.
<GeneralConfiguration>
The element containing all of the general configuration options for the database.
Contents: <LoadClass>, <ProcessedDataListFile>
Attributes: n/a
Usage: <GeneralConfiguration> </GeneralConfiguration>
Notes: Required; can appear only once in the configuration file.
<Database>
The element containing the specifics of the source or sources of sample data, whether it is in a database or a directory of flat files. You must have one Database section for each source to which you wil
94. need to keep the genes in the view along with the immediate tree branches. Zooming in by dragging a rectangle with the cursor usually produces a magnified view that contains more elements than were in the selected area. The amount of magnification is visible in the parameter specification area just below the genome browser. Use the arrow keys to pan the screen while zoomed. Panning never takes you outside the bounds of the selected subtree, if any. When a subtree is selected, clicking Zoom Fully Out displays the entire subtree, not the entire tree. To return to the top level, right-click anywhere and select View Entire Tree. You cannot zoom in on the thumbnail in the Eisen-like view.
Viewing Nodes
After the genes are clustered according to their expression patterns, all known lists are checked against all subtrees of the new gene tree to assign names to the tree nodes where possible. These labels are taken from the gene lists in the standard lists. Place your cursor as close as possible to a label or intersection to view the text. When the cursor pauses over an intersection, a label appears; it disappears when the cursor is moved. All of the branches intersecting to form a node constitute the subtree defined by that node. A label such as "ribosome [15 1]" means the subtree from that node has a lot in common with the genes in the ribosome list. The numbers in square brackets are a measure of statistical significance. The higher the v
95. not take effect in the currently open window or in your current GeneSpring session. Saved changes in the Preferences window do not take effect until GeneSpring is restarted.
Select Edit > Preferences. To change any options in the Preferences window, click the appropriate tab to view the available settings.
Data Files
On this tab you can set the defaults for what you would like to see when GeneSpring opens. Set the defaults on this tab to have GeneSpring open directly to your chosen genome.
• Data Directory: The directory containing all GeneSpring data, including the genome that opens at startup. Use the browse button or the Navigator to choose the directory.
• Load Sequence: Load nucleic acid sequences with the genome data.
• Suppress warnings about ambiguous gene identifiers when opening experiments: Check this box to suppress the ambiguous gene identifier warning message (not recommended). For more information on ambiguous gene identifiers, see Ambiguous Gene Identifiers on page 2-9.
• Default Genome: The default genome to open when you start GeneSpring. Select No default genome to be prompted for the genome to open each time you start GeneSpring. Select Open the genome that was last used in the previous session to default to the last genome opened. Select Open a specific genome to specify a default genome to open every time you start GeneSpring. To change this value, select the desired genome from the di
96. number in the Weight column of the Experiments to Cluster window and enter a new value. When you are done adding or removing experiments, click OK to return to the main clustering window.
Some Notes on Experiment Weight
Correlations of multiple experiments are performed through a weighted correlation in which you specify the weight of each experiment. You can make one experiment or experiment set more important than another. If all of the experiments or experiment sets are given the same weight, they are averaged equally. The name of the experiment is noted directly after its relative weight. For example, you could give SampleExperiment1 a weight of 2 and Experiment2 a weight of 1. In this example, the correlations found in SampleExperiment1 are twice as influential in creating the tree as the correlations between the genes in the Experiment2 study. The equation used to determine the overall correlation is:
$X = \dfrac{aA + bB + cC}{a + b + c}$
where:
• A is the correlation coefficient between the gene in question in experiment 1 and the gene named in the Experiments to Use box, also from experiment 1.
• a is the weight specified for experiment 1.
• B is the correlation coefficient of the gene in question in experiment 2 to the gene named in the title bar, also from experiment 2.
• b is the weight associated with experiment 2.
• C is the correlation coefficient of
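The weighted-correlation formula generalizes to any number of experiments. The minimal sketch below mirrors the SampleExperiment1 and Experiment2 example. The correlation values 0.9 and 0.6 are made up, and overall_correlation is a hypothetical name.

```python
def overall_correlation(correlations, weights):
    """Weighted correlation X = (aA + bB + cC + ...) / (a + b + c + ...)."""
    if len(correlations) != len(weights):
        raise ValueError("need one weight per experiment")
    return sum(w * r for w, r in zip(weights, correlations)) / sum(weights)


# SampleExperiment1 weighted 2, Experiment2 weighted 1 (correlations are made up).
x = overall_correlation([0.9, 0.6], [2, 1])
print(round(x, 3))  # 0.8
```

With equal weights, the formula reduces to the plain average of the per-experiment correlations.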
97. of the basic filters described in the previous section are available, as well as Filter on Gene List and Filter on Annotations. In addition, you can perform Statistical Analysis ANOVA and Find Similar Genes operations. For more information on these operations, see Statistical Analysis ANOVA on page 6-33 and The Find Similar Command on page 6-6. [Figure 6-22: The Advanced Filtering Window, showing the list of available restrictions with the Add Restriction >> button, the Filter Summary of the restrictions added so far (combined with AND and optionally negated with NOT), and the Statistical Analysis ANOVA and Find Similar Genes buttons] To set up an advanced filter: 1. Select a filter from the list of available filters and click Add Restriction. You can also add a filter by double-clicking its name in the list. The specified Filtering window appears. Note: Each entry in the Advanced Filteri
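The way individual restrictions build up into one filter can be sketched as follows. This is a hypothetical Python illustration, not GeneSpring's implementation: it assumes, as the Filter Summary in Figure 6-22 suggests, that restrictions are combined with AND and that any restriction can be negated with NOT. The field names raw and control, the gene names, and the numeric ranges are invented.

```python
from typing import Callable, Dict, List

Values = Dict[str, float]              # one gene's measurements; field names are hypothetical
Restriction = Callable[[Values], bool]


def between(field: str, low: float, high: float) -> Restriction:
    """Keep genes whose value for `field` lies in the closed range [low, high]."""
    return lambda values: low <= values[field] <= high


def negate(restriction: Restriction) -> Restriction:
    """NOT: keep exactly the genes that the wrapped restriction rejects."""
    return lambda values: not restriction(values)


def apply_restrictions(genes: Dict[str, Values], restrictions: List[Restriction]) -> List[str]:
    """AND (assumed combination rule): a gene survives only if every restriction accepts it."""
    return [name for name, values in genes.items() if all(r(values) for r in restrictions)]


genes = {
    "gene1": {"raw": 50.0, "control": 40.0},
    "gene2": {"raw": 900.0, "control": 5.0},
    "gene3": {"raw": 300.0, "control": 120.0},
}
summary = [negate(between("raw", 0.0, 100.0)), between("control", 10.0, 1000.0)]
print(apply_restrictions(genes, summary))  # ['gene3']
```

Modelling each restriction as a small predicate keeps the filter summary easy to extend: adding a restriction is appending one more function to the list.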
98. of the Major Intervals and Minor Intervals boxes, and click Apply to view your data with grid lines. Features: The Features panel of the display options window contains a column of check boxes that allow you to toggle certain items in the genome browser on or off. Show Horizontal Axis Label: Displays the parameter that is graphed on the horizontal axis. Show Vertical Axis Label: Displays the parameter that is graphed on the vertical axis. Label Vertical Axis on Side: Displays the vertical axis label vertically. If this is unchecked, the vertical axis label sits to the right of the top of the vertical axis. Show Condition Line: Displays the vertical bar that can be moved with the condition slider. 3D Look: Places the bars on a diagonal line to imply that the genes in each condition are stacked in rows perpendicular to the horizontal axis. Show unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser. Physical Position View: The Physical Position display allows you to see an experiment or a set of experiments by organizing the genes according to their physical position within the DNA sequence of the organism, when the gene loci are known and loaded into GeneSpring. Select View > Physical Position. The Physical Position view works for any organism
99. on page 6-16 for more information. Note that on a Macintosh the menu bar is at the top of the screen, not on the individual GeneSpring windows as displayed in this manual. The Navigator: GeneSpring organizes data elements relating to your genome into folders in the navigator. Each folder contains a specific type of information. By default, folders in the navigator are closed, although on startup GeneSpring displays an all genes or all genomic elements gene list. To change the default genome that GeneSpring initially opens, select Edit > Preferences and click the Data Files tab. Enter a genome name in the Default Genome text field and click OK. [Figure: the GeneSpring main window for Yeast Genes (all genomic elements), with the navigator showing the Gene Lists, Experiments, Gene Trees, Condition Trees, Classifications, and Pathways folders]
100. on the LOWESS algorithm. The degree of the polynomial fitted is 1. For efficiency, the regression is not calculated at each data point but on a progressively fitted mesh that adjusts to the sparsity of the data. If you attempt to re-normalize an experiment that has been constructed using the Merge Split Experiment tools, you will be unable to apply intensity-dependent normalization: the control values in merged experiments have already been adjusted and thus do not reflect the intensity of the reference dye. If you perform an intensity-dependent normalization, it is usually not necessary to perform a per-chip normalization. Like normalizing to the distribution of all genes, intensity-dependent normalization should not be used on specialized arrays that contain a small number of genes, or on arrays where a majority of the genes may react similarly to experimental conditions. Per-Chip Normalizations: Per-chip normalizations control for chip-wide variations in intensity. Such variations may be due to inconsistent washing, inconsistent sample preparation, or other microarray production or microfluidics imperfections. GeneSpring does not allow you to perform more than one per-chip normalization, because they all address the same issue. There is no dedicated option for region normalization. However, if you have region designators, all per-chip normalizations are performed on each region independently.
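To make the idea of an intensity-dependent (LOWESS) normalization concrete, the sketch below fits a degree-1 locally weighted regression, as the text specifies, with the common tricube weighting. It is an assumption-laden illustration rather than GeneSpring's algorithm: it fits the log ratio against the log control (reference dye) intensity, the smoothing fraction frac is a made-up parameter, and, unlike GeneSpring, it evaluates the fit at every data point instead of on a progressively fitted mesh.

```python
import math
from typing import List, Sequence


def tricube(u: float) -> float:
    """Tricube kernel: 1 at the centre, falling smoothly to 0 at |u| = 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_degree1(x: Sequence[float], y: Sequence[float], frac: float = 0.3) -> List[float]:
    """Locally weighted linear (degree-1) fit of y on x, evaluated at every x.

    Assumption: a fit at every point; GeneSpring itself uses a fitted mesh.
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbours used in each local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (x0 itself counts as one).
        h = sorted(abs(xj - x0) for xj in x)[min(k, n) - 1] or 1e-12
        w = [tricube((xj - x0) / h) for xj in x]
        sw = sum(w)  # at least 1, because x0 has weight 1
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # weighted least-squares line at x0
    return fitted


def intensity_dependent_normalize(signal: Sequence[float], control: Sequence[float],
                                  frac: float = 0.3) -> List[float]:
    """Divide each ratio by the LOWESS trend of log ratio versus log control intensity."""
    x = [math.log(c) for c in control]                     # assumed x axis
    m = [math.log(s / c) for s, c in zip(signal, control)]  # log ratios
    trend = lowess_degree1(x, m, frac)
    return [math.exp(mi - ti) for mi, ti in zip(m, trend)]


control = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
signal = [2.0 * c for c in control]  # every ratio is 2
print(intensity_dependent_normalize(signal, control, frac=1.0))  # all close to 1.0
```

Because the example ratios are constant, the fitted trend is flat and every normalized ratio comes out close to 1.0; when the log ratio drifts with intensity, the trend absorbs that drift instead.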
101. on the various color options, see Coloring on page 4-47. Specific Color Definition: You can define your own colors to use in the genome browser. If your printer requires exact color definitions, specify them on this screen. To change or adjust a color, click the Change button next to its element in the Preferences Color window. [Figure 1-6: Color creation in the Preferences window, showing the Downregulated Color sliders, a Color Preview box, and Presets] Click any slider and drag it horizontally to adjust the color. Watch the color preview box and stop moving the slider when the desired color is reached. Click OK to accept the new color. The Specify no color checkbox is only available for the Normal Color settings. Gene Labels: On this tab you can specify how you would like to name your genes in the genome browser. The defaults are systematic name and common name. To change the defaults, select a new option from the drop-down menus. To restore the original defaults, click Default Gene Labels. Browser: On this tab, specify default web browser settings if you want to use a particular browser with GeneSpring. You only need to set the Arguments option if you are using an obscure web browser that requires an argument. Firewall: If your company has a firewall, you may need to specify settings to allow GeneSpring to acc
102. option. Working with Saved Scenarios: To load a saved normalization scenario, click Use a Saved Scenario. The Select a Normalization Scenario screen appears. From this screen you can do the following: Load Scenario: Select a scenario from the list and click Load Scenario to load it for use in the current experiment. Delete Scenario: Select a scenario from the list and click Delete Scenario. The scenario is removed from the list. Rename Scenario: Select a scenario from the list and click Rename Scenario. A dialog appears prompting you to enter a new name for the saved scenario. Close: Return to the previous screen without making any changes. Normalization Warnings: Warnings occur under the following circumstances: a normalization step is missing; a normalization step is inappropriate (i.e., there are too few genes or samples); normalizations are applied to only some of the samples. Warnings appear in orange. You can proceed with an active warning, but the results may not be what was intended. Fatal errors appear in red. A fatal error means that the current normalization steps will not produce a usable result; in this case the OK button is disabled until the problem is solved. When a warning or error applies to a specific normalization step, that step is displayed in the list in the appropriate color for the warning or error.
103. possible cutoff points on gene expression level for that gene are considered to predict class membership, either above or below that cutoff. Genes are scored on the basis of the best prediction point for that class. The score function is the negative natural logarithm of the p-value for a hypergeometric test (Fisher's exact test) of predicted versus actual class membership for this class versus all others. A combined list containing the most discriminating genes for each class is produced as the predictor list. Each class is examined in turn, and the gene with the highest score for that class is added to the list if it is not already on the list. Then the genes with the next highest scores for each class are added. This continues in rotation among the classes until the specified number of predictor genes is obtained. If you save the list of predictor genes as a Gene List, the best prediction score of the gene among the classes for which it would have been added to the list is saved as the attached number on the list. Classifying the Test Samples: Based on the selected genes, classifications are then predicted for the independent test data using the k-nearest-neighbors rule. A sample in the independent set is classified by finding the user-specified k nearest neighbors of the sample among the training set samples, based on the Euclidean distance between the normalized expression ratio profiles of the samples. The class memberships of the neighbors a
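The gene-scoring and predictor-selection steps can be sketched in Python. This is one reading of the description, not GeneSpring's code: it assumes a one-sided (upper-tail) hypergeometric p-value for the overlap between predicted and actual class members, treats predicted membership as strictly above, or at-or-below, each observed expression value, and takes the per-class scores as already computed before the round-robin selection. All function names are hypothetical, and math.comb requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K class members, n predicted)."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)


def gene_score(values: Sequence[float], labels: Sequence[str], target: str) -> float:
    """Best -ln(p) over every cutoff, predicting `target` above or below the cutoff."""
    N = len(values)
    in_class = [label == target for label in labels]
    K = sum(in_class)
    best_p = 1.0
    for cutoff in sorted(set(values)):
        for above in (True, False):
            predicted = [(v > cutoff) if above else (v <= cutoff) for v in values]
            n = sum(predicted)
            if n == 0:
                continue  # this cutoff predicts no members at all
            k = sum(p and c for p, c in zip(predicted, in_class))  # correct predictions
            best_p = min(best_p, hypergeom_upper_tail(N, K, n, k))
    return -math.log(best_p)


def select_predictors(scores: Dict[str, Dict[str, float]], n_genes: int) -> List[str]:
    """Round-robin over classes: at each rank, add each class's gene unless already chosen."""
    ranked = {cls: sorted(g, key=g.get, reverse=True) for cls, g in scores.items()}
    chosen: List[str] = []
    depth = max((len(r) for r in ranked.values()), default=0)
    for rank in range(depth):
        for cls, genes in ranked.items():  # each class is examined in turn
            if len(chosen) == n_genes:
                return chosen
            if rank < len(genes) and genes[rank] not in chosen:
                chosen.append(genes[rank])
    return chosen


values = [1, 2, 3, 4, 10, 11, 12, 13]
labels = ["x"] * 4 + ["y"] * 4
print(gene_score(values, labels, "y"))  # ln(70), about 4.25
```

For this perfectly separating profile, the best cutoff predicts exactly the four y samples, so the p-value is 1/70. In the selection step, a gene that is already on the list is simply skipped for that class at that rank, which is the literal reading of "added to the list if it is not already on the list".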
104. sample-to-sample fold comparison, statistical group comparisons, and associated numbers restrictions. All restrictions applied to create a new list are saved in the notes. The ability to restrict a gene list based on the behavior of its genes in experiments or in individual samples is an important quality control tool. You may want to remove genes with low precision (large error values), those that do not vary significantly across multiple samples, or those with expression levels that are too close to the background. Filtering genes also allows you to search for genes that are differentially expressed over two or more conditions. There are eight basic filters and an Advanced Filtering option: Filter on Expression Level, Filter on Fold Change, Filter on Error, Filter on Confidence, Filter on Flags, Filter on Data File, Filter on Arbitrary File, and Filter on Gene List Numbers. To view a filtering window, select it from the Filtering menu. The Basic Anatomy of a Filtering Window: Most of the Filtering windows are organized in the same way. [Figure: the Filter on Error window, with the callout Set filter range using sliders or by entering numbers]
105. sample variation: This is the variation between samples in the same condition. It represents biological or sampling variability, such as variability between multiple subjects in a condition, between multiple physical samples for an experimental subject or patient, or between multiple hybridizations of a physical sample. GeneSpring can represent any one of these kinds of variability, depending on the types of replicate samples you have specified in your interpretation and in the error model dialog. GeneSpring assumes all replicate samples in the same condition correspond to one kind of variability. When you turn the Cross-gene Error Model on, the Error Model is used as the basis for: standard deviation, representing the variability of individual population members; standard error, representing the precision of the mean of the gene expression measurements in the condition with respect to the true condition mean; error bars corresponding to standard deviation or standard error in the Graph view and Gene Inspector; the t-test p-value, representing the statistical test of differential expression for a specific condition; color by significance, coloring according to the t-value from the t-test of differential expression; and finding differentially expressed genes using the Statistical Group Comparison, if the error model option is chosen. To Turn On the Cross Gene Error Mode
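For readers who want to see what the standard deviation, standard error, and t-test quantities measure, here is a plain replicate-based sketch. It deliberately does not reproduce GeneSpring's cross-gene error model, which is not described in this section: it uses the ordinary sample standard deviation of one gene's replicates and a one-sample t statistic against an assumed null mean of 0 (for example, for log ratios). The function name and the replicate values are hypothetical, and turning t into a p-value would additionally need a t-distribution CDF.

```python
import math
import statistics
from typing import Sequence, Tuple


def replicate_summary(values: Sequence[float],
                      null_mean: float = 0.0) -> Tuple[float, float, float, float]:
    """Return (mean, standard deviation, standard error, one-sample t statistic).

    Assumption: plain per-gene replicate statistics, not GeneSpring's cross-gene model.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicate samples")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # spread of individual replicates
    se = sd / math.sqrt(n)          # precision of the condition mean
    t = (mean - null_mean) / se if se > 0 else float("nan")
    return mean, sd, se, t


# Hypothetical log ratios for three replicates of one gene in one condition.
print(replicate_summary([1.0, 2.0, 3.0]))
```

The standard error shrinks with the square root of the number of replicates while the standard deviation does not, which is why the two are offered as separate error-bar choices.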
106.
  secondary, 4-71
annotations, updating, 6-27
API, A-2
arbitrary file restrictions, 6-63
array layout view, 4-60
attributes, 3-35
B
bar graph view, 4-39
blocks view, 4-36
bookmarks, 4-26
browser display picture, 4-37
Build Simplified Ontology, 6-31
C
change correlation, 6-10, B-6
Change Experiment Interpretation, 3-39
changing experiment parameters, 3-32, 3-35
changing experimental data range, 4-32
class predictor, 7-24
classification inspector, 4-22
classifications, color by, 4-34
CLI, A-3
clustering
  k-means, 7-8
  similarity definitions, B-2
Color by Parameter, 3-32
color, 1-20
  background, 1-20
  changing defaults, 1-19
  selected, 1-20
  structure, 1-19
  trust, 4-30
color by classifications, 4-34
color by expression, 4-30
color by parameter, 4-33
color by secondary experiment, 4-35
color by significance, 4-32
color by venn diagram, 4-32
color code, 3-31
color options, 4-30
coloring scheme, 4-30
column assignments, default, 3-12
column editor, 3-9
  advanced options, 3-11
  assigning columns, 3-9
  column assignments, 3-10
compare genes to genes view, 4-64
condition inspector, 4-18
condition scatter plot, 4-68
Conjectured Regulatory Sequence, 6-23
continuous element, 3-31
copying and pasting experiments, 3-17
copying gene lists, 9-5
correlation commands, B-3
correlation equations
  change correlation, B-6
  distance, 6-11
  hange correlation, 6-10
  Pearson correlation, 6-11
  smooth correlation, 6-10, B-6
  Spearman confidence, 6-11
  Spearman correlation, 6-11
107. selected gene while holding Shift deselects that particular gene. Alternatively, hold Shift and drag your mouse across the genes you want to select. A box appears as you drag; when you release the mouse, the selected genes are highlighted. When several genes are selected, the number of genes selected appears in the genome browser. If some selected genes do not appear in the currently displayed gene list, the legend displays the message x genes selected, y genes not in list, where x is the number of selected genes and y is the number of selected genes not in the list. Click anywhere in the browser to deselect the selected genes. List Inspector: Right-click a list icon in the navigator and select Inspect. A Gene List Inspector window appears, displaying the common and systematic names of all the genes in the gene list currently being displayed in the genome browser. You can select one of the listed genes for closer inspection by double-clicking it. For more information on this window, see The Gene List Inspector on page 4-20. Inspectors: The Inspect windows allow you to view the current defaults and available details of any gene, condition, classification, or experiment. The Gene Inspector: The Gene Inspector window allows you to look at all the data associated with a particular gene, see the lists that include your gene, make correlations, and link directly to Internet databases. In the upper left corner of the Gen
108. spider has retrieved for a gene's map location field does not meet the required criteria. This window contains a table in which you can correct the entry; GeneSpring makes a guess as to what the entry should be when it displays the table of all the problem entries. Which Annotations are Retrieved: The following table describes the annotation fields retrieved by GeneSpider from the various databases.
Annotation: oe, GenBank, LocusLink, UniGene
Systematic name
Common
Map: X X X X
GenBank
Synonym
EC number: X X X
Description: X X X
Product: X X X
Phenotype: X X X
Function: X X
Keywords: X X
PubMedID
Type
DBld: X X
GO biological process: T X
GO molecular function: T X
GO cellular component: T X
RefSeq: X
UniGene: X
Sequence: X X
Systematic name, GenBank accession number, Synonym, and PubMedID are not filled in by GeneSpring. These fields can be filled in manually by the user. I have entered a feature request to enable the GeneSpider to fill in PubMedID from GenBank records. Type is filled in by GeneSpring when it reads a genome from a GenBank (.gbk) file. The value is commonly CDS, but mRNA, rRNA, terminator, gene, and other GenBank feature keynames are possible entries. UniGene can also be filled in by the Build Homology Tables feature. When the us
109.
  standard, 6-10
  standard correlation, B-3
  two-sided Spearman confidence, 6-11
  upregulated correlation, 6-11
correlations, 6-10
  weighted, 7-3
creating new parameters, 3-33, 3-36
cytogenetic band markers, 4-5, 4-6
D
data format, 2-10
data loading, 1-8
data types restrictions, 6-53
database, JDBC driver, 1-18
databases, A-2
  installing GeneSpring from, A-2
DBMS, A-2
decimal markers, 8-4
default colors, changing, 4-35
default column assignments, 3-12
default normalizations, 3-21
  one-color experiments, 3-21
  pre-normalized data, 3-21
  two-color experiments, 3-21
default normalizations, applying, 5-4
deleting objects from GeNet, 9-12
deleting parameters, 3-33, 3-36
display elements
  hide all, 4-71
  show all, 4-71
  show/hide, 4-71
display options, 4-25
  3D scatter plot, 4-49
  array layout, 4-60
  bar graph, 4-39
  blocks view, 4-36
  bookmarks, 4-26
  color, 4-30
  color by classification, 4-34
  color by expression, 4-30
  color by parameter, 4-33
  color by secondary experiment, 4-35
  color by significance, 4-32
  color by venn diagram, 4-32
  condition scatter plot, 4-69
  error bars, 4-28
  graph by genes view, 4-65
  graph view, 4-37
  legend, 4-28
  linked windows, 4-25
  no color, 4-34
  ordered list, 4-58
  pathway view, 4-62
  physical position, 4-43
  scatter plot, 4-45
  split windows, 4-25
  tree view, 4-56
  vertical axis, 4-27
distance, 6-11
divide signal by control channel, 5-7
downregulated color, changing, 1-19
drag and drop gene lists, 9-5
E
electronic signatures, 8-4, 9-14
EMBL files, 2-6
  add
110. that allow you to toggle certain items in the genome browser on or off. Show Experiment Name: Displays the name of the current experiment in the upper right-hand corner of the genome browser. Show Horizontal Axis Label: Displays the parameter that is graphed on the horizontal axis. Show Vertical Axis Label: Displays the parameter that is graphed on the vertical axis. Label Vertical Axis on Side: Displays the vertical axis label vertically. If this is unchecked, the vertical axis label sits to the right of the top of the vertical axis. Show Condition Line: Displays the vertical bar that can be moved with the condition slider. Show unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser. Bar Graph View: The Bar Graph view allows you to visualize one experiment or a set of experiments by plotting the relative expression of each gene against experimental parameters such as time or drug concentration. Each gene is represented as a vertical bar. To switch to Bar Graph view, select View > Bar Graph. Note: Genes with no data cannot be displayed in this view. [Figure: the Bar Graph view in the GeneSpring main window (Yeast Genes, all genomic elements)]
111. that section only. [Figure 6-9: A Venn diagram with pop-up menu, offering Make list of these genes, Make list of genes in both lists, Make list of genes in either list, and Set Coloring] 3. Specify the gene list from which to obtain associated numbers and click OK. 4. Name and save your new list. In views where lists can be ordered, such as the Ordered List and Compare Genes to Genes views, lists made from the Venn diagram are ordered according to the values associated with the lists you used to create the Venn diagram. When more than one of these lists has values, genes are ordered according to the values of the last list added to the Venn diag
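The Venn diagram list operations and the ordering rule can be sketched with Python sets. This is a hypothetical illustration: each gene list is modelled as a dictionary from gene name to its associated value (None when the list carries no values), the lists are kept in the order they were added to the Venn diagram, and ordering highest value first is our assumption, since the text does not state the direction.

```python
from typing import Dict, List, Optional, Sequence, Set

GeneList = Dict[str, Optional[float]]  # gene -> associated value, or None if the list has none


def genes_in_all(lists: Sequence[GeneList]) -> Set[str]:
    """'Make list of genes in both lists': the intersection of every list."""
    return set.intersection(*(set(gl) for gl in lists))


def genes_in_any(lists: Sequence[GeneList]) -> Set[str]:
    """'Make list of genes in either list': the union of every list."""
    return set.union(*(set(gl) for gl in lists))


def order_genes(genes: Set[str], lists: Sequence[GeneList]) -> List[str]:
    """Order by the values of the last list (in the order added) that has any values."""
    for gl in reversed(lists):
        if any(v is not None for v in gl.values()):
            ranking = gl
            break
    else:
        return sorted(genes)  # no list carries values: fall back to name order

    def key(gene: str):
        value = ranking.get(gene)
        # Assumption: valued genes first, highest value first; ties broken by name.
        return (value is None, -(value or 0.0), gene)

    return sorted(genes, key=key)


first = {"g1": 0.9, "g2": 0.5, "g3": 0.7}
second = {"g2": None, "g3": None, "g4": None}
print(order_genes(genes_in_all([first, second]), [first, second]))  # ['g3', 'g2']
print(order_genes(genes_in_any([first, second]), [first, second]))  # ['g1', 'g3', 'g2', 'g4']
```

Here the second list has no values, so both results are ordered by the first list's values, and g4, which has no value anywhere, goes last.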
112. the Number of Clusters checkbox, since it automatically uses the number of classes in the current classification. Test x Additional Random Starting Clusters: Enter a number to make the clustering as tight as possible by performing the clustering several times, each time starting from a different random grouping of genes, and choosing the best result. The default value is 5. Discard genes with no data in half the starting conditions: Discard any genes with no data in at least half the conditions in the selected experiment. Animate display while clustering: Show changes in classification assignments in real time. This may slow your analysis slightly. Saving k-means Results: When the k-means operation is complete, the Choose Classification Name window appears. [Figure: the Choose Classification Name window. Its notes read: K-means clustering of gene list all genes based on the following interpretation(s): interpretation Yeast cell cycle time series no 90 min Default Interpretation, mode Log of ratio, weight 1.0; parameters used: Number of clusters 5, Number of iterations 100, Similarity Measure Standard Correlation; Did not test any additional random clusters; Converged. Below the Save Classification As field, a table assigns each gene (YAL001C, YAL002W, YAL003W, ...) to a set (Set 1, Set 2, Set 3, ...)]
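The effect of the additional random starts and of discarding sparse genes can be seen in a small k-means sketch. It is a simplified stand-in, not GeneSpring's implementation: it substitutes Euclidean distance for the Standard Correlation similarity measure shown in the figure, skips missing values when measuring distance, starts each run from a random grouping of genes, and keeps the run with the smallest within-cluster sum of squares as the best result. The function names, the seed, and the reading of the default 5 as five runs in addition to the first are assumptions.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Profile = List[Optional[float]]  # one value per condition; None means no data


def keep_gene(profile: Profile) -> bool:
    """False for genes with no data in at least half the conditions."""
    missing = sum(value is None for value in profile)
    return missing < len(profile) / 2


def distance(profile: Profile, centroid: List[float]) -> float:
    """Euclidean distance (assumed measure) over the conditions where the gene has data."""
    return math.sqrt(sum((v - c) ** 2 for v, c in zip(profile, centroid) if v is not None))


def kmeans_once(profiles: List[Profile], k: int, iterations: int,
                rng: random.Random) -> Tuple[List[int], float]:
    n_cond = len(profiles[0])
    labels = [rng.randrange(k) for _ in profiles]  # random starting grouping of genes
    centroids = [[0.0] * n_cond for _ in range(k)]
    for _ in range(iterations):
        for j in range(k):
            members = [p for p, label in zip(profiles, labels) if label == j]
            for c in range(n_cond):
                observed = [p[c] for p in members if p[c] is not None]
                if observed:  # an empty cluster keeps its previous centroid
                    centroids[j][c] = sum(observed) / len(observed)
        new_labels = [min(range(k), key=lambda j: distance(p, centroids[j])) for p in profiles]
        if new_labels == labels:
            break  # converged
        labels = new_labels
    cost = sum(distance(p, centroids[label]) ** 2 for p, label in zip(profiles, labels))
    return labels, cost


def kmeans(genes: Dict[str, Profile], k: int, iterations: int = 100,
           extra_starts: int = 5, seed: int = 0) -> Dict[str, int]:
    kept = {name: p for name, p in genes.items() if keep_gene(p)}
    names = list(kept)
    profiles = [kept[name] for name in names]
    rng = random.Random(seed)
    best: Optional[Tuple[List[int], float]] = None
    for _ in range(1 + extra_starts):  # one run plus the additional random starts
        labels, cost = kmeans_once(profiles, k, iterations, rng)
        if best is None or cost < best[1]:
            best = (labels, cost)
    return dict(zip(names, best[0]))


genes = {
    "g1": [0.0, 0.0, 0.0], "g2": [0.1, 0.0, 0.1],
    "g3": [10.0, 10.0, 10.0], "g4": [10.1, 10.0, 9.9],
    "g5": [None, None, 5.0],  # no data in two of three conditions: discarded
}
print(kmeans(genes, k=2))
```

Cluster numbers are arbitrary labels, so only the grouping is meaningful: g1 and g2 end up together, g3 and g4 together, and g5 is dropped before clustering.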
113. the argument name, delete the sample text value, and leave the Default Value field blank. 3. Select the separator to place between the argument name and the argument value (one of two choices). 4. Some versions of Windows do not correctly match up argument name-value pairs and read values as new arguments. To avoid this, check the Fill in missing argument values box and enter a filler term in the text box provided. This filler term should be something the external program will ignore, or a character that does not occur normally, such as ASCII 255. 5. Click Save. To remove an argument, select its line in the table and click Remove Argument. Running an External Program: 1. Right-click the program in the GeneSpring navigator and click Run. 2. If your program takes the data from a tree or a classification as input, be sure these are selected and visible as well. 3. Open the external program folder in the navigator panel and click the program to run. Examples: External Program Interface Example: SAS for Windows. This example demonstrates how to use GeneSpring's external program interface. The External Program Interface exports GeneSpring experimental data, runs a SAS™ program to analyze it, and brings the results back into GeneSpring for display. This example was developed with Windows 2000 using SAS™ version 8. It should work with earlier versions of Windows, but earlier versions of SAS™ require some modifications. This par
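Why the filler term helps is easiest to see when the argument list is built explicitly. The Python sketch below is hypothetical and does not reproduce GeneSpring's own mechanism: the argument names -in, -quiet, and -out and the file names are invented, chr(255) stands in for ASCII 255 (code point 255, strictly an extended-ASCII character), and it assumes that a whitespace separator passes the name and the value as separate tokens. Without a filler, a valueless argument collapses to its name alone, which is the misalignment the Fill in missing argument values option is meant to prevent.

```python
from typing import List, Optional, Sequence, Tuple


def build_arguments(pairs: Sequence[Tuple[str, str]], separator: str,
                    filler: Optional[str] = None) -> List[str]:
    """Turn (name, value) pairs into command-line tokens.

    Assumption: a whitespace separator yields the name and the value as separate
    tokens; any other separator joins them into one token such as name=value.
    """
    tokens: List[str] = []
    for name, value in pairs:
        if not value:
            if filler is None:
                tokens.append(name)  # no value and no filler: the name is passed alone
                continue
            value = filler  # keep every name paired with a value
        if separator.isspace():
            tokens.extend([name, value])
        else:
            tokens.append(f"{name}{separator}{value}")
    return tokens


pairs = [("-in", "data.txt"), ("-quiet", ""), ("-out", "results.txt")]
print(build_arguments(pairs, " ", filler=chr(255)))
print(build_arguments(pairs, "="))  # ['-in=data.txt', '-quiet', '-out=results.txt']
```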
114. the change between each pair of elements of $a$. Do this for each pair of elements that would be connected by a line in the graph screen. The value created between two values $a_i$ and $a_{i+1}$ is $\arctan(a_{i+1}/a_i) - \pi/4$. Do the same to make a vector $B$ from $b$. Result: $\frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$. Upregulated Correlation: Make a new vector $A$ from $a$ by looking at the change between each pair of elements of $a$. Do this for each pair of elements that would be connected by a line in the graph screen. The value created between two values $a_i$ and $a_{i+1}$ is $\max\left(\arctan(a_{i+1}/a_i) - \pi/4,\; 0\right)$. Do the same to make a vector $B$ from $b$. Result: $\frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$. Pearson Correlation: Calculate the mean of all elements in vector $a$, then subtract that value from each element in $a$. Call the resulting vector $A$. Do the same for $b$ to make a vector $B$. Result: $\frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$. Distance: Distance is not a correlation at all but a measure of dissimilarity. It is the Euclidean distance between the expression profile for gene A, defined by its expression values as a point in N-dimensional space (where N is the number of conditions with data in your experiment), and the expression profile for gene B. Result: $\lVert a - b \rVert / \sqrt{N}$, that is, the length of $a - b$ divided by the square root of the number of conditions with data. Spearman Correlation: Order all the elements of vector $a$. Use this order to assign a rank to each element of $a$. Make
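The four measures defined above can be written out directly, which also makes the shared Result formula explicit. This Python sketch is based on our reading of the definitions: each adjacent-pair value is taken as atan(a[i+1]/a[i]) - pi/4, which assumes strictly positive expression values such as normalized ratios, and every correlation is the cosine (A . B)/(|A| |B|) of the derived vectors. Function names are hypothetical, and the Spearman measures are omitted because their definition is cut off here.

```python
import math
from typing import List, Sequence


def cosine(A: Sequence[float], B: Sequence[float]) -> float:
    """(A . B) / (|A| |B|); raises ZeroDivisionError if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(A, B))
    return dot / (math.sqrt(sum(x * x for x in A)) * math.sqrt(sum(y * y for y in B)))


def slopes(a: Sequence[float], upregulated: bool = False) -> List[float]:
    """atan(a[i+1] / a[i]) - pi/4 for each adjacent pair; clipped at 0 if upregulated.

    Assumption: all values are positive, so the ratios are defined.
    """
    values = [math.atan(a[i + 1] / a[i]) - math.pi / 4 for i in range(len(a) - 1)]
    return [max(v, 0.0) for v in values] if upregulated else values


def change_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    return cosine(slopes(a), slopes(b))


def upregulated_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    return cosine(slopes(a, upregulated=True), slopes(b, upregulated=True))


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance divided by the square root of the number of conditions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / math.sqrt(len(a))


print(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))       # close to 1.0
print(upregulated_correlation([1.0, 2.0, 4.0], [1.0, 2.0, 1.0]))   # close to 1/sqrt(2)
```

Subtracting pi/4 maps an unchanged value (ratio 1, so atan(1) = pi/4) to 0, so rises and falls between adjacent conditions get opposite signs; the upregulated variant simply discards the falls.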
115. the same experiment can be normalized in different ways. You have the option of applying most normalization steps only to specific samples in your experiment. To do this: 1. Check the Apply Only to Specific Samples box. A list of samples in the current experiment appears. If these samples are named, the names appear as sample identifiers. If they are not named, by default each sample is named for the file it is from, possibly including the column name if there is more than one sample in a file. Samples that cannot be normalized (i.e., samples with no normalized column) appear grayed out in the list and cannot be selected. 2. Check the box of any samples to which you want to apply this step. 3. Click OK to add this step to your normalization scenario, or Cancel to quit without adding this step. Using the Experiment Normalizations Window: To access the Experiment Normalizations window, select Experiments > Experiment Normalizations. [Figure: the Experiment Normalizations window for Yeast cell cycle time series no 90 min, with callouts for the order of normalizations to perform, the button to add a normalization step, the list of normalizations (for example Per Gene: Normalize to median; Start with pre-normalized values; Data Transformation: SAGE transform, Real-Time PCR transform, and Subtract background), and the Use Defaults button]
116. the window. This creates a screenshot of your window (you will hear the sound of a snapshot). The screenshot is saved on your hard drive with the name Picture 3. Open the picture and print it. Exporting Gene Lists: You can make gene lists and annotated gene lists available to another application. An annotated list includes functional descriptions as well as the standard deviation, standard error, and other information associated with the gene list. Dragging Lists out of GeneSpring: You can drag a gene list out of the navigator to your desktop or directly into Microsoft Excel. If dragged to the desktop, the gene list is saved as a zip file. If dragged into Excel, it appears in columnar format. The resulting list contains only the gene identifiers and associated values. Dragging and dropping a gene list does not produce the same list as the Copy Annotated Gene List function. Copying Gene Lists: There are two methods for copying a gene list. Method 1: 1. Select the gene list to copy from the Gene Lists folder in the navigator. 2. Select Edit > Copy > Copy Gene List. 3. Paste the list into another application, such as a spreadsheet program. Method 2: 1. Open the Gene List Inspector (double-click a gene list, or right-click it and select Inspect). 2. Click Copy to Clipboard. 3. Paste the list into a new application. Copying Annotated Gene Lists: 1. Select the gene list
117. to be clustered together. Viewing a Tree: 1. From the navigator, open the Gene Trees or the Condition Trees folder. 2. Select a tree. If there are no trees available for viewing, you must create one; see Gene Tree Clustering Options on page 7-5. Selecting and Viewing Subtrees: A single green line ending in a gene is a branch of the gene tree. Each bar crossing a set of branches forms a node of the intersecting branches. The distance from gene X to the node connecting it to gene Y indicates how closely genes X and Y are correlated: the shorter the distance, the higher the correlation. Select any node by clicking its intersection with your cursor. All the genes associated with that node change to your selected color. To create a new tree from a node of a larger tree, select a node as described above, then right-click in the genome browser and select Make Subtree from the pop-up menu. To make a gene list from a subtree, select a node as described above, then right-click in the genome browser and select Make List from Subtree. Eisen-Like Tree View: [Figure: an Eisen-like tree view in the GeneSpring main window (Yeast Genes, all genomic elements)]
118. to new genomes created through the column editor.
Common Name | Optional | One | Adds a common name column to a genome if it is being newly created.
Flags | Optional | Any | Specifies the letter or number indicating Present, Absent, and Marginal calls. You can have as many Flag columns as you have Signal columns.
Region | Optional | One | If your experiment uses multiple arrays or sections of arrays that must be normalized separately, this column tells GeneSpring the region of the array and/or which array a particular gene reading came from.
2. Click Guess the Rest. GeneSpring attempts to label the remaining columns. If the labels are incorrect, click Clear Guess to remove the column labels and select them yourself. 3. Click Advanced Options if any of the following are true: the gene identifiers in your experiment files have a prefix or suffix that must be stripped; your signal and control values are in separate files; you want to apply a default normalization scheme to your experiment files. From this screen you can select the appropriate options. [Figure: the Advanced Options window. Under Gene Identifier Prefix and Suffix Removal it reads: If the Gene Identifier appearing in the data file is not identical to the Systematic Name column of the Master Table of genes, GeneSpring can remove a prefix or suffix from it. The first option is There is a fixed prefix to be removed.]
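Prefix or suffix removal amounts to trimming each identifier from the data file and checking the result against the Systematic Name column of the Master Table of genes. The following hypothetical Python sketch assumes a fixed prefix and/or suffix; the prefix chip_, the suffix _at, the control identifier, and the function names are invented for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def strip_identifier(identifier: str, prefix: str = "", suffix: str = "") -> str:
    """Remove a fixed prefix and/or suffix (hypothetical values) when present."""
    if prefix and identifier.startswith(prefix):
        identifier = identifier[len(prefix):]
    if suffix and identifier.endswith(suffix):
        identifier = identifier[:-len(suffix)]
    return identifier


def match_to_master(identifiers: Iterable[str], systematic_names: Iterable[str],
                    prefix: str = "", suffix: str = "") -> Tuple[Dict[str, str], List[str]]:
    """Map data-file identifiers to master-table systematic names; collect the misses."""
    known = set(systematic_names)
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for raw in identifiers:
        name = strip_identifier(raw, prefix, suffix)
        if name in known:
            matched[raw] = name
        else:
            unmatched.append(raw)
    return matched, unmatched


matched, unmatched = match_to_master(
    ["chip_YAL001C", "chip_YAL002W", "chip_CTRL1"], ["YAL001C", "YAL002W"], prefix="chip_")
print(matched)    # {'chip_YAL001C': 'YAL001C', 'chip_YAL002W': 'YAL002W'}
print(unmatched)  # ['chip_CTRL1']
```

Identifiers that still do not match after trimming are returned separately, which makes a wrong prefix or suffix setting easy to spot.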
119. to the gene values resulting from the previous normalization steps, which may or may not be equivalent to the raw measurements. To limit by a cutoff: 1. Check the Use only measurements with box. 2. From the pull-down menu, select whether to limit by raw signal or by current normalized values. 3. Enter the cutoff figure in the text box. The default value is 10.0. Note: You cannot perform this normalization and also normalize to the median of each gene, because they address the same issue. Normalize to Median: This per-gene normalization accounts for the difference in detection efficiency between spots. It also allows you to compare the relative change in gene expression levels, as well as display these levels on a similar scale on the same graph. GeneSpring uses the following formula to normalize to the median for each gene: $\frac{\text{signal strength of gene A in sample X}}{\text{median of every measurement taken for gene A throughout your experiment}}$. If the median of the gene's measurements is below the specified cutoff value, the cutoff is used instead. This cutoff can be in either raw or partially normalized units. The Raw Signal option means that the cutoff is applied to the raw measurements in the original data file. These measurements are back-calculated based on the previous normalization steps, so rounding errors may be introduced in this process. Partially normalized means that the cutoff is applied to the gene values resulting from the previou
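The per-gene formula above, including the cutoff rule, can be sketched in a few lines of Python. It is a hypothetical illustration that applies the cutoff to whatever values are passed in; it does not model GeneSpring's choice between raw and partially normalized units, and the gene names, values, and cutoff of 10.0 are invented.

```python
import statistics
from typing import Dict, List


def normalize_to_gene_median(measurements: Dict[str, List[float]],
                             cutoff: float) -> Dict[str, List[float]]:
    """Divide each gene's values by the median of that gene's values across the experiment.

    If the median is below `cutoff`, the cutoff is used as the divisor instead.
    Assumption: the cutoff is in the same units as the values passed in.
    """
    normalized: Dict[str, List[float]] = {}
    for gene, values in measurements.items():
        divisor = max(statistics.median(values), cutoff)
        if divisor <= 0:
            raise ValueError(f"non-positive divisor for {gene}; use a positive cutoff")
        normalized[gene] = [v / divisor for v in values]
    return normalized


data = {"geneA": [2.0, 4.0, 8.0], "geneB": [100.0, 200.0, 300.0]}
print(normalize_to_gene_median(data, cutoff=10.0))
```

Here geneA's median (4.0) falls below the cutoff, so its values are divided by 10.0, while geneB is divided by its own median of 200.0. The cutoff keeps genes with very weak signals from being inflated into large apparent changes.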
120. to view its contents. At the bottom of the screen, regardless of which tab is active, are three buttons: click OK to save your changes and exit; click Cancel to exit without saving; click Help to view available documentation for the Gene List Inspector. The Gene Lists Tab: This tab displays a table of all the genes included in the selected list. Double-click a gene or cell in this table to view a Gene Inspector window for the selected gene. See The Gene Inspector on page 4-10 for information on the Gene Inspector window. Click any column header in the displayed gene list to sort the table by the values in that column. From this tab you have the following options: Configure Columns: Allows you to select which columns to display on the Gene Lists tab. You can choose from any of the columns in your Master Table of Genes except the Sequence column. Save to File: Allows you to save the entire gene list as a plain text file. Print List: Prints the selected gene list. Copy to Clipboard: Copies the contents of the gene list to the clipboard. You can then paste the list into another application, such as a text editor. Find Regulatory Sequences: Opens the Find Potential Regulatory Sequences window with the current gene list preselected. This button is available only if the genome is fully sequenced. For more information on this window, see Regulatory Sequences on page 6-18.
121. two separate lists. Genes being compared are listed along the respective graph axes. The correlation between any two genes is shown by a colored square at their point of intersection: strong correlations in expression level are shown in a higher-intensity color, and weak correlations in a lower-intensity color. Associated values for gene lists are shown as lines extending perpendicularly from each axis, and the length of each line represents the magnitude of the associated value. You can view these associated values by zooming in on the ends of the lines.
[Screenshot: the Compare Genes to Genes view, colored by correlation, for the gene list ACGCGT in all ORFs]
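The grid of colored squares can be thought of as a table of pairwise correlations between the genes of list A and the genes of list B. Below is a minimal Python sketch of that table. It assumes Pearson correlation, which is one of the correlation measures GeneSpring offers. The measure the view actually uses depends on your default correlation preference. The `profiles` dictionary of expression values and both function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; report none
    return sxy / sqrt(sxx * syy)


def correlation_grid(list_a, list_b, profiles):
    """Correlation for every (gene from list A, gene from list B) pair.

    Each value corresponds to one colored square in the view; a stronger
    correlation would be drawn in a higher-intensity color.
    """
    return {(ga, gb): pearson(profiles[ga], profiles[gb])
            for ga in list_a for gb in list_b}
```

For example, a gene whose profile doubles alongside another gene's profile yields 1.0, and a mirror-image profile yields -1.0.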
122. viewed as a replicate or color-coded. These assignments are an extremely important preparation for any type of data analysis. For information about changing experiment interpretations, see Experiment Interpretations on page 3-39.

Step 4: Annotate your genome (optional)
Most researchers will want to import the maximum amount of biological information available about each gene before beginning analyses. After collecting the data, it is a good idea to make lists of genes based on appropriate keywords.
1. Select Annotations > GeneSpider.
2. Select a database from which to update your annotations.
3. Select the column in your master gene table that contains the accession number (usually Column 10, for the GenBank locus). Make sure there are accession numbers in the column you select.
4. Click Start. The GeneSpider may continue gathering information for many hours.
5. Click Save and close when the GeneSpider is finished.
For details on the GeneSpider, see Annotation Tools on page 6-27. At this point your data are ready to work with.

Basic Actions
Once you have loaded your data, GeneSpring opens a window containing information from your new genome. Initially, all the genes in your experiment are displayed. To see your new genome, select File > Open Genome or Array and choose your genome from the pop-up list.
123. when there are too many genes in the genome browser to easily identify individual genes.

Performing a Simple Search
1. Select Edit > Find Gene. The Find Gene in View window appears.
2. Type a synonym, systematic name, or common name of a particular gene in the Find Gene window text box.
3. Check any or all of the checkboxes for the fields to search.
4. Click Find or press the Enter key.
In some views, the genome browser zooms in on the found gene, and this gene is automatically selected. If more than one gene matches your search criteria, the total number of genes found is listed at the bottom of the Find Gene window, along with the number of matching genes that are visible in the current gene list. Click Find Next to show the next gene that matches your search criteria.

Performing an Advanced Search
1. Select Edit > Advanced Find Gene. The Advanced Find Genes window appears.
[Screenshot: the Advanced Find Gene window, with Search Fields checkboxes (Systematic Name, Common Name, Synonym, Notes, GenBank Accession No., EC, Description, Product, Phenotype, Function, Keywords, Type, PubMedID, RefSeq, DBid, UniGene, Custom Fields 1-3, GO biological process, GO molecular function, GO cellular component) and Map Location Options (Chromosome No., First Basepair)]
124. wildcard character.
2. Select the fields to search. To select a field, check the box next to it in the Search In panel. To deselect it, click the box again to remove the check. You can select as many or as few fields as you like. The available choices are:
• Sample Name
• Notes
• Sample Attributes
• Experiment Parameters
• Parameter Values
3. In the Options panel, select any additional search features. You can choose from the following:
• Case Sensitive: Search only for words using the specific capitalization you entered.
• Whole Words Only: Search only for whole words matching your keyword. For example, if you search for the string statistic, the search results will contain samples that contain the word statistic in the specified fields but ignore those containing statistic as part of a larger word, such as statistician.
• Use as a Wildcard: Search for samples containing the specified keyword plus any other characters. For example, you might want to look for samples that are named using a prefix with sequential numbers appended to the end. To do this, enter the prefix followed by an asterisk (e.g., ex*).
4. Click Search. Any samples matching your search term appear in the Filter Results list.

Experiment Parameters
Parameters are the variables you use to describe you
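The three search options combine as independent switches on a single text match. The following Python sketch shows one plausible way they interact. It is an illustration only. The function name is hypothetical, and treating `*` as "any run of characters" (regular-expression `.*`) is an assumption about the wildcard's exact behavior.

```python
import re


def matches(text, keyword, case_sensitive=False, whole_words=False,
            wildcard=False):
    """Return True if `keyword` is found in `text` under the given options."""
    if wildcard:
        # '*' stands for any run of characters; every other character is literal
        pattern = ".*".join(re.escape(part) for part in keyword.split("*"))
    else:
        pattern = re.escape(keyword)
    if whole_words:
        # \b prevents matching inside a larger word (statistic vs statistician)
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None
```

With Whole Words Only set, `matches("statistician notes", "statistic", whole_words=True)` is False, while the same call without that option is True. With the wildcard option set, `ex*` matches a sample named `ex042`.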
125. within the ORF by using negative numbers for the bases.
5. Enter the promoter sequence in the Sequence box.
6. Enter the number of single-point discrepancies allowed. This is the maximum number of mismatches allowed. For example, if you specify one single-point discrepancy, ACGCGAT satisfies a search for ACGCGTT.
7. Select whether the sequence is relative to the sequence upstream of other genes or relative to the whole genomic sequence. The first option is far more common. The Probability Cutoff text box indicates the level of significance (P-value) needed for an oligomer to be listed in the results. You can change this value.
8. Specify whether to perform the operation locally or on a remote GeNet server.
9. Click Start. The button becomes a Stop button, and the progress bar lengthens as your search progresses. For very large genomes or complex search parameters, this operation may take a few minutes.

Viewing Regulatory Sequence Search Results
The search results are displayed in the Results area of the Find Potential Regulatory Sequences window. Click Details for expanded results data. Click View Genes for Selected Row, or double-click any sequence, to view the Conjectured Regulatory Sequence window.
• Sequence: The nucleotide sequence of the oligomer.
• Observed: The number of genes in the list where the oligomer was found.
• P-value: The probability (P-value) that the n
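A single-point discrepancy count is simply the number of positions at which two equal-length sequences differ. Below is a minimal Python sketch of a search that tolerates such discrepancies. It uses hypothetical function names and assumes a plain forward-strand, case-sensitive comparison. It does not handle reverse complements, N characters, or IUPAC ambiguity codes, and GeneSpring's own matching may treat these differently.

```python
def mismatch_count(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_with_discrepancies(sequence, motif, max_discrepancies):
    """Start positions where `motif` occurs with at most the allowed mismatches."""
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1)
            if mismatch_count(sequence[i:i + k], motif) <= max_discrepancies]
```

This reproduces the example above. `mismatch_count("ACGCGAT", "ACGCGTT")` is 1, so `find_with_discrepancies("TTACGCGATTT", "ACGCGTT", 1)` returns `[2]`, the position of ACGCGAT. With zero discrepancies allowed, it returns an empty list.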
126. [Figure 3-13: The Create New Experiment window, showing the Filter Results list, the Selected Samples list (samples to include in the new experiment), and a selector to display samples from the local machine, GeNet, or both]
From this screen you can select samples to add to your experiment or choose a filtering method to narrow the samples. The left side of the screen contains a tab for each filtering method. The available methods are:
• Show All: Display all available samples without applying a filter.
• Filter on Experiment: Display samples associated with a particular experiment.
• Filter on Attributes: Display samples based on selected attributes.
• Filter on Keyword: Display samples containing a keyword.
• Filter on Parameter: Display samples based on the parameter values of experiments containing them.
Click the appropriate tab to view the options for that filtering method. For detailed information
127. you checked the box to group pathways into subfolders, the gene lists will also be grouped by subfolders.
5. Click OK.
6. In the screen that appears, accept the default or enter a new name for the folder of pathways.
7. Click OK. Saving pathways may take several minutes.

Adding a Gene to a Pathway
Once you have successfully imported your graphic into GeneSpring, you can place genes on top of the background image.
1. Open the appropriate pathway in the navigator.
2. While holding down the Ctrl key (Macintosh users: Option-click), draw a box where you would like the gene to appear on the pathway. The New Genes on Pathway window appears.
3. Enter the gene name, accession number, or keyword (such as a word in a gene's descriptor) and click OK. The gene name appears on the pathway. If the gene name or keyword is present for more than one gene, another window appears directing you to choose a gene ID from a list. Double-click the appropriate ID.
To remove a gene, right-click the element and select Delete Pathway Element.

Finding New Genes on a Pathway
GeneSpring uses proprietary algorithms to predict the genes that fit near a selected point on a pathway. When you select a point, GeneSpring makes two lists of genes from those currently displayed on your diagram. List A contains the two genes that appear closest to your selected point on the diagram. List B contains all
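GeneSpring's prediction algorithm is proprietary, so only the simplest stated step can be illustrated here: building List A from the two genes drawn closest to the selected point. The Python sketch below assumes that each gene's position on the diagram is known as (x, y) coordinates. It also assumes that "closest" means ordinary Euclidean distance. Both are assumptions, and the function name is hypothetical. List B is not reproduced.

```python
import math


def closest_genes(gene_positions, point, n=2):
    """Names of the `n` genes drawn nearest to `point` on the diagram.

    `gene_positions` maps gene name -> (x, y) diagram coordinates
    (hypothetical input). Distance is plain Euclidean distance.
    """
    return sorted(gene_positions,
                  key=lambda g: math.dist(gene_positions[g], point))[:n]
```

`math.dist` requires Python 3.8 or later.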
128. you have typed in, and the Filter Results table.

Filter on Annotation
This method allows you to filter genes based on text or values in a specified annotation.
[Figure 6-4: The Filter on Annotation tab]
To filter on annotation:
1. Enter a search term in the Search For text box.
2. Select the annotation fields to search in. You have the following options: Systematic Name, Common Name, Synonym, Notes, GenBank Accession Number, EC, Description, Product, Phenotype, Function, Keywords, PubMed ID, Custom Fields 1-3, Type, DBid, GO Biological Process, GO Molecular Function, GO Cellular Component, RefSeq, UniGene, and any custom annotation fields you may have added.
Some fields may not be visible due to window size. To view all fields, scroll down in the Search In panel. To check all boxes, click Check All. To uncheck all boxes, click Clear All.
3. Select any desired search options. The available choices are Case Sensitive, Use Whole Words Only, and Use as WildCard.
4. Click Search. Genes matching the specified search parameters appear in the
129. you imported the Signal to Control Ratio and the Control Channel. Enter the value below which you do not trust the control signal in the Cutoff text box. By default this value is 10.0.

Intensity-Dependent Normalization
Intensity-dependent normalization, often called non-linear or LOWESS normalization, is recommended for use in most two-color experiments. This step can be applied only to chips with more than 100 genes. LOWESS normalization uses region designators in the same way that other per-chip normalization methods do. For details, see Region Normalization on page 5-12.
Intensity-dependent normalization is a technique used to eliminate dye-related artifacts in two-color experiments that cause the Cy5/Cy3 ratio to be affected by the total intensity of the spot. This normalization process attempts to correct for artifacts caused by non-linear rates of dye incorporation, as well as inconsistencies in the relative fluorescence intensity between some red and green dyes. Such artifacts often result in a curve in the graph of raw versus control signal (see panel A in Figure 5-2). In the absence of bias, one would expect the ratio of raw to control signal not to depend on spot intensity, so the data points would be scattered symmetrically around the 45° line.
[Figure 5-2: The Effect of Intensity-Dependent Normalization (axis labels: Normalized, Adjusted Control, Control)]
GeneSpring's intensit
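GeneSpring's own procedure is not spelled out in this excerpt. The following is therefore a generic Python sketch of LOWESS-style intensity-dependent normalization, not GeneSpring's implementation. It fits the log ratio as a smooth function of log intensity (the common "MA plot" formulation) and subtracts that trend. Several simplifications are assumptions: a single pass with tricube weights, no robustness iterations, and a hypothetical smoothing fraction `frac`. Region designators and the 100-gene minimum are not modeled, and all signals must be positive.

```python
import math


def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear fit of y on x, evaluated at each x.

    Uses tricube weights over the nearest `frac` of the points and no
    robustness iterations (a simplification of full LOWESS).
    """
    n = len(x)
    k = max(3, math.ceil(frac * n))
    fitted = []
    for xi in x:
        dists = [abs(xj - xi) for xj in x]
        h = sorted(dists)[min(k, n) - 1] or 1e-12  # local window radius
        w = [max(0.0, 1 - (d / h) ** 3) ** 3 for d in dists]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def intensity_dependent_normalize(signal, control, frac=0.5):
    """Remove the intensity-dependent trend from signal/control ratios.

    Works in MA coordinates: M = log2 ratio, A = mean log2 intensity.
    Returns the corrected ratios (1.0 means no differential expression).
    """
    m = [math.log2(s / c) for s, c in zip(signal, control)]
    a = [0.5 * math.log2(s * c) for s, c in zip(signal, control)]
    trend = lowess_fit(a, m, frac)
    return [2 ** (mi - ti) for mi, ti in zip(m, trend)]
```

If every spot's log ratio is biased by an amount that grows smoothly with intensity, the fitted trend absorbs that bias, and the corrected ratios return to about 1.0.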
130. Physical Position Display Options 4-43
Scatter Plot View 4-45
Scatter Plot Display Options 4-45
3D Scatter Plot View 4-49
3D Scatter Plot Display Options 4-49
Tree View 4-52
Viewing a Tree 4-52
Selecting and Viewing Subtrees 4-52
Magnifying Trees 4-55
Viewing Nodes 4-55
Viewing Gene Names in Trees 4-56
Viewing Parameters in Trees 4-56
Tree Display Options 4-56
Ordered List View 4-58
Ordered List Display Options 4-58
Array Layout View 4-60
Array Layout Display Options 4-60
Pathway View 4-62
Pathway Display Options 4-62
Compare Genes to Genes 4-64
Graph by Genes View 4-65
Graph by Genes Display Options 4-65
View as Spreadsheet 4-67
Condition Scatter Plot 4-68
Condition Scatter Plot Display Options
131. [Figure 3-26: The gene list like CLN1 graphed using the log ratio formula]
Note that in Log of Ratio interpretation, the lower limit of the vertical axis is 0.01. Any expression values below 0.01 are plotted as 0.01. Note also that when you export your data, GeneSpring reinterprets the data as the ratio. Measurements below 0.01 are exported as 0.01.

Fold Change
Fold Change mode creates a more balanced visual representation of over- and underexpressed genes than Ratio mode, and emphasizes the increase and decrease of expression levels. For example, x1 refers to normal expression, x2 to an expression level twice normal, and 2 to an expression level half normal. When using the upper or lower bound fields to change the vertical axis range, enter either the ratio values in integers or the fold change value (i.e., x4 or 4). Any integers you enter are converted.
[Figure 3-27: New Fold Change Image (Yeast cell cycle time series, no 90 min; vertical axis: normalized intensity, fold scale; horizontal axis: time in minutes)]
Note that in Fold Change interpretation, the lowest measured value is 0.01. Any values below 0.01 are calculated as 0.01. The minimum display value is 100. Note also that when you export your data, GeneSpring reinterprets the data as the ratio. Measurements below 0.01 are
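The 0.01 floor explains why the fold-change display bottoms out: after flooring, the largest representable decrease is 1/0.01, a 100-fold decrease. That is presumably what the stated minimum display value of 100 refers to. The Python sketch below makes the arithmetic concrete. It is illustrative only. The function names are hypothetical, base 10 for the log axis is an arbitrary choice, and representing a decrease as ("down", n) is an assumed encoding, not GeneSpring's on-screen notation.

```python
import math

FLOOR = 0.01  # values below 0.01 are plotted/exported as 0.01


def log_axis_position(ratio):
    """Position of a ratio on a base-10 log axis, after applying the floor."""
    return math.log10(max(ratio, FLOOR))


def fold_change(ratio):
    """('up', n) for an n-fold increase, ('down', n) for an n-fold decrease."""
    r = max(ratio, FLOOR)
    return ("up", r) if r >= 1 else ("down", 1 / r)
```

For example, `fold_change(0.5)` is `("down", 2.0)`. `fold_change(0.001)` is floored to a 100-fold decrease, the same as `fold_change(0.01)`.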
132. formula notation B-2
G
GATC A-3
GenBank files 2-6
  adding genes to 2-7
gene inspector 4-10
  control 4-11
  description 4-11
  normalized 4-11
  notes 4-13
  raw 4-11
  save profile 4-13
  Student's t-test 4-12
  t-test p-value 4-11
  web connections 4-13
gene list editor
  filtering methods 6-3
gene list inspector 4-20
gene lists
  copying 9-5
  creating 6-2
  dragging 9-5
  editing 6-2
  exporting 9-5
  filtering methods 6-3
  find similar 6-6
gene similarity B-2
genes
  find similar 4-12
GeneSpider 6-27
  lists from annotations 6-12
  updating master gene table with 6-27
GeneSpring navigator 1-13
GeNet
  deleting from 9-12
  upload to 9-11
GeNet database 1-5
genomes
  creating 2-2, 2-9
  creating from experiment data 2-9
  definition 1-6
  opening: opening genomes 1-16
graph by genes view 4-65
graph view 4-37
H
help
  ScriptEditor 8-18
help menu
  about 1-5
  SiG on the web 1-4
  system monitor 1-5
  version notes 1-4
hidden elements 3-31
homology tool 6-24
housekeeping genes 5-11
icon legend 8-11
import data by pasting 3-17
import data command 3-3
importing parameters 3-32, 3-35
inspect
  condition 4-18
  experiment 4-16
  gene 4-10
inspectors
  classification 4-22
  external program 8-42
  gene list 4-20
  script 8-5
installing from CD 1-2
installing from the Web 1-2
interpretations 3-39
interpreted data, definition 1-7
IUPAC-IUB ambiguity codes 4-5, 4-6
J
JDBC driver 1-18
K
k-means
  minimum number B-8
k-means clu
133. [Figure 4-21: The Graph view (Yeast cell cycle time series, no 90 min; Default Interpretation; colored by time)]
The figure above shows the genes in the all genes list in Graph view. The gene in white has been selected, and its name appears in the legend after the name of the gene list.

Graph View Display Options
The following display options are available in Graph view:
• Vertical Axis: See The Vertical Axis on page 4-27.
• Features: See Features below.
• Lines to Graph: See Lines to Graph below.
• Coloring: See Color on page 4-30.
• Error Bars: See Error Bars on page 4-28.
• Legend: See Legend on page 4-28.

Lines to Graph
You have the option to draw grid lines to help distinguish distinct groups of data points. To modify the use of lines:
1. Select View > Display Options, or right-click anywhere in the genome browser and select Display Options.
2. Click the Lines to Graph tab.
3. To see a grid inside the plot area, you can have lines drawn at the major and minor tick intervals of each axis. Check the Major Intervals box, the Minor Intervals box, or both, and click Apply to view your data with grid lines.

Features
The Features panel of the display options window contains a column of check boxes
134. Filtering Methods 3-25
Experiment Parameters 3-29
Parameters Displayed in the Navigator 3-29
A Note on Multiple Parameters 3-30
Parameter Display Options 3-30
Hidden Elements 3-31
Continuous Element 3-31
Non-Continuous Element 3-31
Color Code 3-31
Changing Experiment Parameters 3-32
Sample Attributes 3-35
Changing Experiment Attributes 3-35
Editing Standard Attributes 3-37
Experiment Interpretations 3-39
Vertical Axis Modes 3-40
Parameter Display Modes 3-43
Cross-gene Error Models 3-44
Using the Cross-gene Error Model 3-44
Technical Details 3-46
Viewing Data
Using the Genome Browser 4-2
Zooming In 4-2
Panning 4-2
Modifying Display Options 4-2
Displaying a Gene List 4-3
Finding and Selecting Genes 4-4
135. • Legend: See Legend on page 4-28.

The Horizontal Axis Tab
• Sort by Gene List: Sorts genes in the gene list by their associated numbers, if they exist, or otherwise by their order in the Master Table of Genes.
• Set Gene List: Specify the gene list by which to sort genes. This button is only active if the Sort by Gene List radio button is selected.
• Sort by Condition (Normalized Data): Sort genes in the order of their normalized values within the selected condition.
• Sort by Condition (Raw Data): Sort genes in the order of their raw values within the selected condition.
• Sort by Condition (Control Data): Sort genes in the order of their control data within the selected condition.
• Set Condition: Specify the condition by which to sort genes. This button is only active if one of the Sort by Condition radio buttons is selected.

The Features Tab
The Features tab of the display options window contains a column of check boxes that allow you to toggle certain items in the genome browser on or off. To use them, select View > Display Options, or right-click anywhere in the genome browser and select Display Options. Click the Features panel and choose from any of the following:
• Plot Symbol: Using the Style and Size pull-down menus, specify the symbol with which to display each gene. If the Line option is selected, individual genes cannot be selected in the Genome Browser w
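The Sort by Gene List rule has a fallback: use the list's associated numbers if there are any, and otherwise use the gene order in the Master Table of Genes. The Python sketch below spells that rule out, along with sorting by one condition's values. It is a sketch under assumptions. The input dictionaries and function names are hypothetical, and ascending order is assumed because the manual does not state the direction.

```python
def sort_by_gene_list(genes, associated_values, master_order):
    """Order genes by the list's associated numbers, else by master-table order.

    `associated_values` maps gene -> number (empty if the list has none);
    `master_order` is the gene order in the Master Table of Genes.
    """
    if associated_values:
        return sorted(genes, key=lambda g: associated_values[g])
    rank = {gene: i for i, gene in enumerate(master_order)}
    return sorted(genes, key=lambda g: rank[g])


def sort_by_condition(genes, condition_values):
    """Order genes by their (normalized, raw, or control) values in one condition."""
    return sorted(genes, key=lambda g: condition_values[g])
```

The same `sort_by_condition` helper covers all three Sort by Condition options. You pass it the normalized, raw, or control values for the selected condition.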
136. 41 bp DNA PLN 25 NOV 1996
CCCACACACCACACCCACACCACACCCACACACCACACACACCACACCCA
AGTGTGTGGGTGTGGGTGTGTGGGTGTGGTGTGTGGGTGTGGTGTGTGTGTGGTGT
GTGGGTGTGGGTGTGTGGGTGTGGTGGGTGTGGTGTGTGTG
Name multiple chromosomes sequentially, i.e., CHR1, CHR2, and so on. If there is only one chromosome, name it CHR1.

Creating a Genome from Experiment Data
In GeneSpring, a genome includes all the genes on your chip. When you create a genome from experiment data, GeneSpring creates a genome on the fly based on the genes in your experiment data files. This means that, unlike a genome created in the New Genome Installation Wizard, this genome has no annotations and no means of obtaining annotations from public databases. The genome consists of a master table of genes and a genome definition file. The master table of genes contains all the information associated with genes in a given genome. If you create a new genome after accepting a file format recognized by GeneSpring, anything not standard to that recognized format is not included in the master table of genes. For example, if GeneSpring recognizes an Affymetrix file but that file has GenBank accession numbers, the numbers are not loaded. You can add these numbers later to the GenBank column of the annotations file. If your data files have a description column, GeneSpring includes it in the master gene table. Clontec
137. re-order steps 5-4
repeated measurements 5-18
reserve control channel 5-8
set measurements 5-6
start with pre-normalized values 5-6
to constant value 5-12
to median 5-15
to median or percentile 5-10
to positive control genes 5-11
transform from log to linear 5-7
two-color microarray 5-17
normalizations window 5-2
normalizations, default 3-21
normalized data, definition 1-7
O
ODBC A-2
one-color experiments 4-31
ontology, building 6-31
ordered list view 4-58
over-expressed color, changing 1-19
P
panning 4-2
Parameter Interpretations
  fold change 100 is 1 50 is 1 3-42
  log ratio 3-41
  ratio 3-41
  ratio of signal/control 3-41
Parameters
  non-numeric 3-31, 3-34
  numeric 3-31, 3-34
  order 3-34
parameters 3-29
  changing 3-32, 3-35
  color code 3-31
  continuous element 3-31
  creating new 3-33, 3-36
  definition 1-7
  deleting 3-33, 3-36
  display options 3-30
  hidden elements 3-31
  importing 3-32, 3-35
  non-continuous elements 3-31
  non-numeric 3-18
  numeric 3-18
  replacing 3-33, 3-36
  values 3-29
Pathway view 6-16
  adding new elements 6-17
pathway view 4-62
Pearson correlation 6-11
per-gene normalizations 5-13
percent explained variability 4-23
per-chip normalizations 5-10
per-spot normalizations 5-7
physical position view 4-41
picture, secondary 4-71
predictor 7-24
preferences 1-18
  browser 1-21
  color 1-18
  data directory 1-18
  data files 1-18
  database 1-18
  default correlation 1-23
  default font 1-24
  defau
138. ScriptEditor Concepts 8-7
The ScriptEditor Interface 8-7
The Icon Legend 8-11
The Properties Panel 8-12
Creating Scripts 8-13
Building Scripts 8-16
Saving Scripts 8-18
Warning Messages 8-18
Script Help 8-18
Script Building Blocks 8-20
Scripts to External Programs 8-32
Scripts and External Programs 8-34
External Programs 8-35
The New External Program Window 8-35
Examples 8-40
The External Program Inspector 8-42
Exporting GeneSpring Data
Saving Images 9-2
Saving Pictures and Printing 9-4
Exporting Gene Lists 9-5
Exporting MAGE-ML Data 9-8
Publishing Data to GeNet 9-11
Uploading Data Objects to GeNet 9-11
Deleting Data Objects from GeNet 9-12
Uploading Genomes to GeNet 9-12
Installing from a Database Custom D
139. [Figure 6-11: The Find Regulatory Sequences window (options include oligonucleotide length range, single-point discrepancies, Probability Cutoff 0.05, local nucleotide density correction, and computing locally or on a GeNet remote server)]
From this screen you can do the following:
• Find new sequences: This option searches for short sequences upstream of the genes in the current gene list or across the entire genome.
• Enter a specific sequence: This option allows you to enter a known sequence.
To find a new regulatory sequence:
1. Select Tools > Find Potential Regulator
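The first stage of finding new sequences can be pictured as enumerating every short oligomer in each gene's upstream region. For each oligomer, the search then counts how many genes contain it at least once. The Python sketch below shows only that counting step, under stated assumptions. The `upstream` dictionary and the function name are hypothetical. Oligomers containing anything other than A, C, G, or T are simply skipped. Mismatches, significance testing, and nucleotide density correction are not modeled.

```python
def genes_containing_oligomers(upstream, min_len, max_len):
    """Count, for each oligomer, how many genes' upstream regions contain it.

    `upstream` maps gene name -> upstream sequence (hypothetical input).
    Oligomers containing anything other than A, C, G, or T are skipped.
    """
    counts = {}
    for seq in upstream.values():
        seq = seq.upper()
        seen = set()  # count each oligomer at most once per gene
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                oligo = seq[i:i + k]
                if set(oligo) <= set("ACGT"):
                    seen.add(oligo)
        for oligo in seen:
            counts[oligo] = counts.get(oligo, 0) + 1
    return counts
```

Counting genes rather than raw occurrences means that a repeat within one promoter does not inflate the total.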
140. input 8-7
look up 8-24
make groups 8-25
merge/split groups 8-24
numbers 8-26
output 8-7
select groups 8-25
statistical analysis 8-31
script inputs
  delete 8-17
  move to end 8-17
  move to start 8-17
script output
  boolean 8-14
  numbers 8-14
  sequence information 8-14
script outputs
  delete 8-17
  move to end 8-17
  move to start 8-17
script primitives 8-7
ScriptEditor 8-7
  blocks 8-16
  browser 8-9
  building blocks 8-13, 8-20
  change information 8-12
  concepts 8-7
  help 8-18
  icon legend 8-11
  inputs 8-13
  knobs 8-14
  move to end 8-17
  move to start 8-17
  navigator 8-8
  notes 8-9
  outputs 8-13
  saving scripts 8-18
  support 8-18
  warning messages 8-18
scripts 8-2, 8-3, 8-7
  2-fold expression change 8-2
  best k-means 8-2
  blocks 8-16
  building blocks 8-13, 8-20
  change information 8-12
  clustering 2-fold change list 8-2
  definition 8-2
  filter on noise 8-2
  find similar genes 8-2
  help 8-18
  input 8-7
  inputs 8-13
  inspecting 8-5
  knobs 8-7, 8-14
  output 8-7
  outputs 8-13
  pairwise comparison 8-2
  parameters for data file restriction 8-4
  PEER S 8-2
  predefined 8-2
  remote server 8-5
  running 8-3
  saving 8-18
  select k-means 8-2
  send clustering results to GeNet 8-2
  series of k-means 8-3
  sockets 8-7
  support 8-18
  to external programs 8-32
  warning messages 8-18
search
  simple 4-4
secondary picture 4-71
select a gene(s), deselect a gene 6-15
selecting genes 4-4
self-organizing maps 7-8
separation ratio 7-8
141. [Figure 7-6: The Choose Classification Name window]
To save your results:
1. Enter a name in the Name field at the top of the screen. Names may not exceed 80 characters.
2. To save the results as a classification, select the Classification radio button. To save the results as a group of gene lists, select the Gene Lists radio button.
3. From the navigator, select a folder in which to save the new classification or gene lists. To create a new folder, navigate to the desired parent folder and enter a new folder name in the Folder field.
4. Enter any additional information in the Notes field, if desired.
5. Click Save.

Viewing k-means Clusters
If you use k-means clustering to produce a classification, you can view details about the classification in the Classification Inspector. For information about the Classification Inspector, see The Classification Inspector on page 4-22.
The easiest way to view a classification is with the Split Window feature. Right-click a classification or a gene list created with k-means clustering and select Split Window > Both. The genome browser splits into several smaller displays. You can also choose to split vertically or horizontally.

Self-Organizing Maps
The self-organizing map (SOM) is a clustering technique similar to k-means clustering. Howe
142. Silicon Genetics
GeneSpring User Manual, version 6.1
November 14, 2003
Copyright 2003 Silicon Genetics. All rights reserved.
GeneSpring, GeneSpider, GenEx, GeNet, MetaMine, ScriptEditor, and MicroSift are trademarks of Silicon Genetics. All other products, including but not limited to Affymetrix GeneChip, Affymetrix Global Scaling, GenBank, Microsoft Excel, Microsoft Notepad, Pico, SimpleText, and Adobe FrameMaker, are the trademarks of their respective holders.

Table of Contents
1 Welcome to GeneSpring
Getting Started 1-2
Learning To Use GeneSpring 1-4
GeneSpring Basics 1-6
Data Loading 1-8
Loading Your Data 1-8
Basic Actions 1-11
Commonly Used Functions 1-16
The Gene Inspector Window 1-16
Making Lists 1-16
Setting Preferences 1-18
Data Files 1-18
Database 1-18
Color 1-18
Gene Labels 1-21
Browser 1-21
Firewall 1-21
System 1-22
GeNet
143. Absent. If measurements are limited by a cutoff, the percentile is calculated from all measurements above the cutoff. This cutoff can be in either raw or partially normalized units. The Raw Signal option means that the cutoff is applied to the raw measurements in the original data file, and negative numbers are allowed. These measurements are back-calculated based on the previous normalization steps, so rounding errors may be introduced in this process. Partially normalized means that the cutoff is applied to the gene values resulting from the previous normalization steps, which may or may not be equivalent to the raw measurements.

To limit by a cutoff:
1. Check the Use only measurements with box.
2. Select whether to limit by raw signal or current normalized values from the pull-down menu.
3. Enter the cutoff figure in the text box. The default value is 10.0.

You can choose to apply additional background correction in this step. To apply background correction, check the appropriate box in the Background Correction section of the screen. You have the following options:
• Never apply extra background correction.
• Always apply extra background correction: Prior to taking the specified percentile, the bottom tenth percentile is used as a background correction and subtracted from all genes.
• If needed apply extra background correction: For samples in which the bottom tenth percentile is
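The cutoff and the "Always apply" background correction change which value each chip is divided by. The following Python sketch of per-chip percentile normalization illustrates their effect under several assumptions. The percentile uses linear interpolation, the cutoff is compared against raw values, and the bottom tenth percentile is computed from all measurements. GeneSpring may differ on each point. The function names are hypothetical, and the conditional "If needed" option is not modeled. After background subtraction, the reference percentile can become zero or negative for some data, and this sketch does not guard against that.

```python
def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def normalize_to_percentile(chip, p=50, cutoff=None, background=False):
    """Divide one chip's measurements by their p-th percentile.

    cutoff: if given, only measurements whose raw value exceeds it are used
            to compute the percentile.
    background: if True, first subtract the bottom tenth percentile from
                every measurement ("Always apply extra background correction").
    """
    raw = list(chip)
    values = raw
    if background:
        bg = percentile(raw, 10)
        values = [v - bg for v in raw]
    basis = [v for v, r in zip(values, raw) if cutoff is None or r > cutoff]
    reference = percentile(basis, p)
    return [v / reference for v in values]
```

For a chip measuring 10, 20, 30, 40, and 50, normalizing to the 50th percentile divides by 30. A cutoff of 15 excludes the 10, so the divisor becomes 35. With background correction, 14 is subtracted from every measurement first, and the divisor becomes 16.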
144. Although these lines can represent many types of data thresholds, they are generically called fold change lines. These fold lines are valuable because you can select points that lie above or below them by right-clicking in the appropriate position in the genome browser. In addition to fold lines, you can add lines to the origin of each axis as well as draw a line of best fit.
To modify the use of lines:
1. Select View > Display Options, or right-click anywhere in the genome browser and select Display Options.
2. Click the Lines to Graph tab.
3. To see a grid inside the plot area, you can have lines drawn at the major and minor tick intervals of each axis. Check the X, Y, and Z Axis Grid Lines checkboxes for the grid lines you want to see. The color of these grid lines is shown in the Grid Color box at the bottom of the window. To modify the grid color, click Change.

Changing Labels and Features
The scatter plot view also allows you to change the appearance of data points and data labels. To modify these features:
1. Select View > Display Options, or right-click anywhere in the genome browser and select Display Options.
2. Click the Features tab.
3. To modify the size and shape of the points, choose from among the options in the Style and Size pull-down menus.
4. There are five options for labeling the plot:
• Show Gene Names: Displays the name of each gene to the lower right of each point. These na
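On a two-axis scatter plot, selecting the points above or below a pair of fold lines amounts to comparing each point's y/x ratio with the fold threshold. The Python sketch below shows that comparison. It assumes positive values and fold lines at y = fold·x and y = x/fold. It does not claim to reproduce GeneSpring's selection behavior. The function name and input dictionary are hypothetical.

```python
def split_by_fold_lines(points, fold=2.0):
    """Classify scatter-plot points relative to the fold lines y = fold*x and y = x/fold.

    `points` maps gene name -> (x, y) with positive values (hypothetical input).
    Returns (above, below, between) lists of gene names.
    """
    above, below, between = [], [], []
    for name, (x, y) in points.items():
        if y > fold * x:
            above.append(name)
        elif y < x / fold:
            below.append(name)
        else:
            between.append(name)
    return above, below, between
```

With a 2-fold threshold, a gene at (10, 30) falls above the upper line and one at (10, 4) below the lower line. A gene at (10, 12) lies between the two lines.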
145. Anywhere screen appears with a progress bar. 2. Follow the on-screen instructions. For more information, see the ReadMe file included with the CD. In Windows, you can also install the software by using the Run command in the Start menu. Enter D:\gspring.exe, where D is the CD-ROM drive on your computer. Installing from the Web: If you are reading this manual and do not have a copy of GeneSpring, you can download a copy from the following URL: http://www.sigenetics.com/cgi/SiG.cgi/Products/GeneSpring/download.smf. Follow the on-screen directions, and Silicon Genetics will send you a username, password, and download link. Starting GeneSpring: Once you have installed GeneSpring, the GeneSpring icon appears on your desktop. Figure 1-1 The GeneSpring icon. To start GeneSpring, double-click the GeneSpring icon. Windows users can also start GeneSpring by selecting Start > Programs > GeneSpring, or by navigating to Program Files\Silicon Genetics\GeneSpring and double-clicking the GeneSpring icon. Macintosh users can also start GeneSpring from Applications > Silicon Genetics > GeneSpring. A splash screen appears showing your GeneSpring version number, the expected expiration date, and the JVM you are using. You will then see the GeneSpring main window. For further details, see GeneSpring Basics on page 1-6. Ob
146. Figure 9-5 The Bulk Upload window. Users who do not have administrator access to GeNet have the following options: Upload all of the data: Upload all data in the current genome to GeNet. This option creates duplicates on GeNet of any data already uploaded. Upload any data not already on GeNet: Upload all data in the current genome that is not already present on GeNet. Upload all data that was created in this GeneSpring session: Upload all data in the current genome that was created during the current GeneSpring session. Users who are logged into GeNet as an administrator from GeneSpring have the following additional options: Upload no data objects: Do not upload any data. This option appears only if the active genome already exists on GeNet. Upload the genome: Upload the genome and all associated data. This option appears only if the active genome does not already exist on GeNet. In addition, administrators can upload data directly into the root directory or into their own default directory. In the Destination D
147. Figure 6-4 Post Hoc results summary by gene. Summary by Groups: The Results Summary by Groups tab displays a matrix whose rows and columns are indexed by parameter values, so that each cell corresponds to a pair of groups. The numbers in the lower half of the matrix are the numbers of genes that differ significantly between the two groups. The numbers in the upper half are the numbers of genes that show no significant difference. Figure 6-5 Post Hoc results summary by group. 2-Way ANOVA: 2-way ANOVA tests genes for significant differences across groups defined by 2 parameters. This test is appropriate for a 2-way design, in which the groups to be compared are defined by 2 parameters. A 2-way design can be thought of as a matrix, with the rows indexed by the values of one parameter and the columns indexed by the values of a second parameter. Each cell then represents the number of replicates in that particular group. Ideally each cell sho
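The layout of the Results Summary by Groups matrix can be sketched in Python. The sketch assumes the post hoc results are available as, for each gene, the set of group pairs that the test found significantly different. The function name and input shape are hypothetical, and GeneSpring's own bookkeeping may differ.

```python
def summary_by_groups(groups, significant_pairs_per_gene):
    """Build a square matrix indexed by group.
    Lower triangle [i][j] (i > j): genes whose post hoc test separates groups i and j.
    Upper triangle [i][j] (i < j): genes showing no significant difference.
    `significant_pairs_per_gene` maps gene -> set of frozenset({group1, group2})."""
    n = len(groups)
    matrix = [[0] * n for _ in range(n)]
    for pairs in significant_pairs_per_gene.values():
        for i in range(n):
            for j in range(i + 1, n):
                if frozenset((groups[i], groups[j])) in pairs:
                    matrix[j][i] += 1  # lower half: significantly different
                else:
                    matrix[i][j] += 1  # upper half: no significant difference
    return matrix
```

Under these assumptions, each pair's two counts add up to the number of genes passed in.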
148. Figure 6-19 The Filter on Data File window. 1. Select an experiment or interpretation from the navigator and click Choose Experiment. 2. To select the column or columns to search on, check the Search box in the header of each desired column in your experiment. The column is highlighted in yellow. 3. Restrict column values by choosing a value from the Column Values Must Be pull-down menu and entering a restriction value in the field provided. The available choices are Less than, Greater than, Equal to, Not equal to, and Contain. For example, if you load an Affymetrix file, you can select the Abs_call column and search for all entries equal to M. This produces a list of only marginal data. 4. Specify the number of columns in which the desired value must appear. Select an option from the Value must appear in pull-down menu and enter a value in the field provided. The non-editable number to the right of this box shows the number of columns that have been selected. If you have multiple data formats, this number is the total number of columns selected on all tabs. Arbitrary File Restrictions: This option allows you to find genes based on the information in on
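The column restriction and the rule that a value must appear in at least N selected columns can be sketched as follows. The comparison names mirror the pull-down choices listed above. Treating Less than and Greater than as numeric comparisons and the other choices as exact text comparisons is an assumption, as are the function name and the gene-to-columns dictionary layout.

```python
# Comparison choices from the Column Values Must Be menu. The numeric vs. text
# handling is an assumption made for this sketch.
COMPARISONS = {
    "less than": lambda value, target: float(value) < float(target),
    "greater than": lambda value, target: float(value) > float(target),
    "equal to": lambda value, target: value == target,
    "not equal to": lambda value, target: value != target,
    "contain": lambda value, target: target in value,
}


def filter_on_data_file(rows, columns, comparison, target, at_least=1):
    """Return the genes whose value satisfies `comparison` in at least
    `at_least` of the selected `columns`. `rows` maps gene -> {column: value}."""
    test = COMPARISONS[comparison]
    passing = []
    for gene, record in rows.items():
        hits = sum(1 for col in columns if col in record and test(record[col], target))
        if hits >= at_least:
            passing.append(gene)
    return passing
```

For example, searching hypothetical per-sample call columns for the value M returns the genes called marginal in at least the requested number of samples.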
149. Columns Specification knob. Use curly brackets to indicate the file format. Within the curly brackets, give the column number or the column header of the column(s) to search. The order of the curly brackets must match the order of the tabs in GeneSpring's GUI: the first tab is the format of the majority of samples in the experiment, the second tab defines the second most common format for that experiment, and so on. If the formats describe the same number of samples, look at the GeneSpring GUI to determine the order of the formats. Example: {Column Number or Column Name 1}{Column Number or Column Name 2}, with entries such as 1, 3, or identifier. The Script Inspector: You can right-click any script within GeneSpring and select Inspect to examine that script. You can double-click any building block to view the building blocks that make up that block. You can also reach this screen by clicking View in the Run Script window and then clicking Inspect. You can also edit the notes and history of your script: click Edit to change the Author, Organization, Original Source, Other Software, Date, and Note fields. [Figure: the Script Inspector showing the Explained Variation script, which computes the proportion of variation in an experiment explained by a classification, based on a gene list.]
150. Figure 8-4 The Choose Input Type window. 2. Select the type of input to send to the external program. The available choices are: Gene List: A tab-delimited list of systematic names, one per line. Gene List With Numbers: A tab-delimited list of systematic names and associated numbers, one pair per line. Gene Name: A single systematic name. Experiment Data: Normalized experiment data, one line per gene and one column per experiment, with header lines for the experiment name and each parameter. Only genes in the currently selected gene list are sent. Experiment Data with Confidence: Normalized experiment data, one line per gene and two columns per experiment (one for normalized data and one for control values), with header lines for the experiment name and each parameter. Only genes in the currently selected gene list are sent. Classification: A tab-delimited list of systematic names and the name of the associated classification group, one pair per line. Only genes in both the currently selected gene list and the classification are sent.
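An external program receiving the Gene List With Numbers input could parse it as in this Python sketch. It assumes exactly the layout described above (a systematic name, a tab, then a number on each line) with no header line. The function name is hypothetical.

```python
def parse_gene_list_with_numbers(text):
    """Parse 'systematic_name<TAB>number' lines into a dict.
    Blank lines are skipped. The absence of a header line is an assumption."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, number = line.split("\t")[:2]
        result[name] = float(number)
    return result
```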
151. Gene List window appears. It includes the genes in that list as well as lists that are similar to your new gene list. Figure 6-5 The New Gene List window. 3. Enter a name for the new gene list, or accept the default, and click Save. The Find Similar Genes Window: This command allows you to set up complex correlations against the inspected gene. These correlations may involve more than one experiment or condition, or extra restrictions on experiments.
152. GeneSpring available for use in a script. Save data object to file: This external program saves a result from a script to your local hard drive. It does not save to GeneSpring or GeNet, because other building blocks already perform that function. GeneSpring checks for new, changed, or deleted external programs each time it starts. External programs you have used in scripts cannot be deleted. If you have shared a script with a colleague, make sure your copy of GeneSpring has the same external programs as your colleague's. If an external program is missing from the copy of GeneSpring used to run the script, a message announces that the script is corrupted. Corrupted scripts cannot be run in GeneSpring. For details on how to create external programs, refer to the Silicon Genetics FAQs and to External Programs on page 8-35. External Programs: The GeneSpring External Program interface allows you to run external analysis programs from within GeneSpring. These programs are useful when your research calls for a type of analysis that GeneSpring does not perform. The external program interface is also useful for parsing and pre-formatting data for use in another application. When you launch an external program from within GeneSpring, the data displayed in the genome browser is sent to the external program as standard input. When
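Here is a minimal external program skeleton in Python for the Gene List input. It reads systematic names from standard input and writes the distinct names, sorted, to standard output. Reading standard input follows the text above. Returning results on standard output and the function names are assumptions made for illustration.

```python
import sys


def unique_sorted_names(lines):
    """Return the distinct systematic names in `lines`, sorted."""
    return sorted({line.strip() for line in lines if line.strip()})


def main():
    # GeneSpring sends the displayed data on standard input (per the text);
    # writing the result to standard output is an assumption of this sketch.
    for name in unique_sorted_names(sys.stdin):
        sys.stdout.write(name + "\n")
```

To use this as an actual external program, add a call to `main()` at the end of the file and reference the script in the program's command line.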
153. Figure 6-6 The Find Similar Genes window. You can view the preview pane as a cumulative distribution or a graph, or link the preview to the main GeneSpring window. When you are working with large experiments, you may want to uncheck the Interactive Update box at the upper right of the preview pane to avoid slowing the analysis. 1. Double-click a gene to bring up the Gene Inspector. 2. Click Complex Correlations. The Find Similar Genes window appears. This window can also be accessed by selecting Tools > Fi
154. LocusLink: Retrieve information from LocusLink. Update annotations from UniGene: Retrieve information from UniGene. The Update Genome window appears. 2. If you selected Update annotations from Silicon Genetics, the window looks different because more options are available. If you selected any of the other options, proceed to the next step. To update from Silicon Genetics: a. Check the boxes next to the annotation sources from which to retrieve data. b. Specify the retrieval method. Select Concatenate annotations from different sources to retrieve annotations from all sources as a semicolon-delimited list. Exact duplicates are not retrieved. The order is fixed: GenBank, then LocusLink, then UniGene. Select Keep the highest priority annotation to retrieve only the annotation from the highest-priority source available for each gene. c. Proceed to step 3. 3. From the pull-down menu in the upper right portion of the screen, select the name of the column containing GenBank accession numbers. 4. To update information in places where data already exists, select the Overwrite Existing Annotations checkbox. If you leave this box unchecked, GeneSpring adds new information only to blank fields. When you update annotations, GeneSpring creates a backup file of the pre-update master gene table. Updating from Silicon Genetics or GenBank gives you the option to retrieve sequence data. 5
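The two retrieval methods can be sketched in Python. The concatenation follows the rules stated above: fixed order GenBank, LocusLink, UniGene, semicolon-delimited, exact duplicates skipped. Two details are assumptions: the separator is a bare ";", and "highest priority" follows the same source order. The function names are hypothetical.

```python
SOURCE_ORDER = ("GenBank", "LocusLink", "UniGene")  # fixed order from the text


def concatenate_annotations(by_source):
    """'Concatenate annotations from different sources': join the available
    annotations in the fixed source order, skipping exact duplicates."""
    kept = []
    for source in SOURCE_ORDER:
        value = by_source.get(source)
        if value and value not in kept:
            kept.append(value)
    return ";".join(kept)  # assumption: no space after the semicolon


def highest_priority_annotation(by_source):
    """'Keep the highest priority annotation'. Assumes priority follows the
    same GenBank > LocusLink > UniGene order, which the text does not state."""
    for source in SOURCE_ORDER:
        if by_source.get(source):
            return by_source[source]
    return ""
```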
155. Figure 9-2 The MAGE-ML Choose Output Directory window. 4. Select the directory in which the MAGE-ML export files will be saved. An EBI-compliant MAGE-ML experiment must contain at least the following three files: the XML file describing the experiment; the MAGE-ML .dtd file, which defines the MAGE-ML format; and a plain text file containing the raw experimental data. Your export may also include raw data files and/or sample or array images. Note: Make sure the systematic names of your genes are the same identifiers used by the chip manufacturer. This eases mapping from the vendor MAGE-ML file to the experiment MAGE-ML file. 5. Click OK. The MAGE-ML files are saved to the directory you specified. Publishing Data to GeNet: GeNet is a scalable workspace for GeneSpring users that streamlines microarray research at large or multicampus organizations. It provides facilities for robust data analysis, collaborative workflow management, automated research procedures, and secure data administration. GeNet scales to meet the demands of both high-throughput sample volumes and growing numbers of users. You can publish any data object from GeneSpring to GeNet. Uploading Data Objects to GeNet: 1. Select the upload method. There are two methods for uploading
156. Open Database Connectivity: Open Database Connectivity (ODBC) is an Application Programming Interface (API) that lets a programmer abstract a program from a database. When writing code to interact with a database, you usually must add code that talks to a particular database in a proprietary language. If you want your program to talk to Access, Fox, and Oracle databases, you must code your program in three different database languages, which can be difficult and time-consuming. This is where ODBC enters the picture. When programming against ODBC, you only need to speak the ODBC language: a combination of ODBC API function calls and SQL. The ODBC Manager determines how to handle the type of database you are targeting. Whatever database type you are using, all of your calls go to the ODBC API. All you must do is install an ODBC driver specific to that type of database. Structured Query Language: Structured Query Language (SQL) is a standard language for defining and accessing relational databases. All of the major database servers used in client-server applications work with SQL. It is a query language designed to extract, organize, and update information in relational databases. Each database vendor has its own dialect. These dialects are similar to one another, but different enough that programmers must pay close attention to which RDBMS is being used. The most im
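To make the SQL description concrete, the sketch below defines a table, inserts and updates rows, and runs a query that extracts and organizes the results. It uses Python's built-in sqlite3 module, not an ODBC driver (sqlite3 talks to SQLite directly). The table, columns, and gene values are made up for illustration. Allowing for dialect differences, the same kind of SQL could be sent through an ODBC connection to another database.

```python
import sqlite3


def genes_above(threshold):
    """Create a small hypothetical expression table in memory and return the
    genes whose normalized value exceeds `threshold`, highest first."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for an ODBC data source
    try:
        # Data definition: create a table (hypothetical schema).
        conn.execute("CREATE TABLE expression (gene TEXT PRIMARY KEY, normalized REAL)")
        # Insert rows using parameter placeholders.
        conn.executemany(
            "INSERT INTO expression (gene, normalized) VALUES (?, ?)",
            [("YAL003W", 2.4), ("YLR367W", 0.7), ("YLR370C", 1.9)],
        )
        # Update an existing row.
        conn.execute("UPDATE expression SET normalized = ? WHERE gene = ?", (1.1, "YLR367W"))
        # Query: extract (WHERE) and organize (ORDER BY) the rows.
        rows = conn.execute(
            "SELECT gene FROM expression WHERE normalized > ? ORDER BY normalized DESC",
            (threshold,),
        ).fetchall()
        return [gene for (gene,) in rows]
    finally:
        conn.close()
```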
157. Performing a Simple Search 4-4; Performing an Advanced Search 4-4; Searching GeNet from GeneSpring 4-5; Selecting Genes 4-8; Inspectors 4-10; The Gene Inspector 4-10; The Sample Inspector 4-13; The Experiment Inspector 4-16; The Condition Inspector 4-18; The Gene List Inspector 4-20; The Classification Inspector 4-22; Display Options 4-25; Linked Windows 4-25; Split Windows 4-25; Bookmarks 4-26; The Vertical Axis 4-27; Error Bars 4-28; Legend 4-28; Color 4-30; Blocks View 4-36; Blocks View Display Options 4-36; Graph View 4-37; Graph View Display Options 4-37; Bar Graph View 4-39; Bar Graph View Display Options 4-39; Physical Position View
158. Spring displays a warning message if missing values have been imported. If there is a larger number of missing values (say, more than min(a, b)), GeneSpring displays a warning and exits. Non-Parametric Two-Way ANOVA (Friedman's Test): This tests only for the effect of factor A while controlling for factor B. To get a p-value for the other factor, reverse the factors (parameters). This test does not test for interaction. Case I: No replicates (one sample per cell). Rank the data within each of the b blocks separately. For each of the a levels of factor A, compute the rank sums R_i, i = 1, ..., a. Then compute
$$X = \frac{12}{a\,b\,(a+1)} \sum_{i=1}^{a} R_i^2 - 3b(a+1)$$
This statistic has its own distribution. However, it can be approximated by computing
$$F = \frac{(b-1)\,X}{b(a-1) - X}$$
and comparing this to F with a-1 and (a-1)(b-1) degrees of freedom. Case II: Equal replicates in each cell. If there are n replicates within each cell, rank all an values within each block together, compute the rank sums R_i as before, and compute
$$X = \frac{12}{a\,b\,n^2\,(an+1)} \sum_{i=1}^{a} R_i^2 - 3b(an+1)$$
Compare this to the chi-square critical value with a-1 degrees of freedom. The Filtering Menu: The Filtering menu allows you to apply a series of restrictions, or filters, to a gene list. These restrictions can apply to an entire experiment or interpretation, or to a single condition or sample. The filters include factors such as quality control, control strength, expression level constraints
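A minimal Python sketch of the Friedman statistics described above is given below. It assumes the data are arranged as blocks (levels of factor B), each holding the cells for the a levels of factor A, with every cell holding n replicate values. Tied values get average ranks, and no tie correction is applied, because the formulas include none. The function names are illustrative, not GeneSpring's. Converting X or F to a p-value would need a chi-square or F distribution, which is not included here.

```python
def average_ranks(values):
    """Ranks 1..len(values); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """blocks[j][i] is the list of n replicate values for level i of factor A
    in block j. With n == 1 this is Case I; otherwise Case II."""
    b = len(blocks)
    a = len(blocks[0])
    n = len(blocks[0][0])
    rank_sums = [0.0] * a
    for block in blocks:
        # Rank all a*n values of the block together.
        cells = [(i, value) for i, cell in enumerate(block) for value in cell]
        ranks = average_ranks([value for _, value in cells])
        for (i, _), rank in zip(cells, ranks):
            rank_sums[i] += rank
    total = sum(r * r for r in rank_sums)
    return 12 / (a * b * n**2 * (a * n + 1)) * total - 3 * b * (a * n + 1)


def f_approximation(x, a, b):
    """Case I approximation; compare with F on a-1 and (a-1)(b-1) degrees of freedom."""
    denominator = b * (a - 1) - x
    return float("inf") if denominator == 0 else (b - 1) * x / denominator
```

With a single block, the Case II statistic reduces to the Kruskal-Wallis H statistic for that block, which is a useful sanity check.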
159. The Annotations menu in GeneSpring allows you to update your genome, make gene lists based on annotations, and build gene ontology tables. Annotations can also be searched using the Find Gene feature in the Edit menu. See Performing a Simple Search on page 4-4 for details. Updating your Master Gene Table with GeneSpider: After you have loaded a new genome, you can use GeneSpider to make sure it contains the latest information from the genome databases on the World Wide Web. To use GeneSpider, you must have GenBank accession numbers in your master gene table. GenBank accession numbers are usually added to the GenBank column of the master gene table. If you have multiple GenBank accession numbers for a single gene, separate them with semicolons. For details on adding information to your master gene table, see The Master Gene Table File on page 2-10. Updating Annotations with GeneSpider: 1. Select Annotations > GeneSpider (pre-4.1 users: select Tools > GeneSpider). Choose one of four options: Update annotations from Silicon Genetics: Retrieves gene information from the Silicon Genetics Mirror Database. The mirror database caches information from GenBank, LocusLink, and UniGene to ease the load on the NCBI server and let you update faster. The mirror database is updated about once every two months. Update annotations from GenBank: Retrieve information on genes from GenBank. Update annotations from
160. To create a new experiment, click Yes. You are prompted to enter a name and save the experiment. Figure 3-7 The Choose Experiment Name window. 11. In the Choose Experiment Name window, do the following: Enter a name for the experiment in the Name field. Choose a descriptive name that you will remember later. To save the experiment in an existing folder, navigate to that folder in the directory browser in the lower left portion of the screen and leave the Folder field blank. To save in a new subfolder, navigate to the desired parent folder and enter a name for the new folder in the Folder field. If desired, enter notes with more descriptive information about the experiment. When you are done, click Save. The New Experiment Checklist appears. New Experiment Checklist: You are almost finished creating your experiment. Before you begin analysis, you should set up its normalizations, experimental parameters, and error model, and choose your default experiment interpretation. You can reach these windows using the buttons below. Alternatively, you can find them in the Experiments menu.
161. VAR1, in which case you should rename VAR1 to something else in your application. Place the following text in the batch file:

    filename infile "%sysget(infile)";
    filename outfile "%sysget(outfile)";
    proc import datafile=infile DBMS=TAB out=experiment replace;
        datarow=3;
        getnames=no;
    run;
    proc fastclus data=experiment maxclusters=5 maxiter=50 out=clusters(keep=var1 cluster);
        id var1;
    run;
    proc export data=clusters outfile=outfile DBMS=TAB replace;
    run;

Once you have saved the batch file, select File > New External Program from the GeneSpring menu and do the following: 1. Enter SAS FASTCLUS in the Name field. 2. Leave the Folder field blank. The external program is saved in the External Programs folder by default. 3. Select the External Program radio button. 4. Enter the following in the Command Line field: runsas.bat fastclus expt.txt clus.txt 5. On the Inputs tab, click Add Input and select the Experiment Data radio button. 6. On the Outputs tab, click Add Output and select the Experiment Data with Confidence radio button. 7. Click Save. Your external program should now appear in the GeneSpring navigator in the External Programs folder. The External Program Inspector: The External Program Inspector allows you to view details of an existing external program. To open the External Program Inspector, right-click an e
162. a new vector a', where the i-th element of a' is the rank of a_i in a. Now make a vector A from a' in the same way as A was made from a in the Pearson Correlation. Similarly, make a vector B from b'. Result: A · B / (|A| |B|). Spearman Confidence: Compute a value r of the Spearman correlation as described above. Result: 1 - (the probability of getting a value of r or higher by chance). Two-sided Spearman Confidence: Compute a value r of the Spearman correlation as described above. Result: 1 - (the probability of getting a value of r or higher, or -r or lower, by chance). Making Lists by Applying Filters: To make a list from filter data, open any filter window, set the desired options, and click Save. The New Gene List window appears. [Figure: the New Gene List window]
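The Spearman steps above amount to replacing each vector by its ranks and then taking the Pearson correlation of the centered rank vectors. A minimal Python sketch follows. It assumes there are no tied values, since the text does not say how GeneSpring ranks ties. It also assumes neither vector is constant, which would cause a division by zero. The function names are illustrative.

```python
import math


def ranks(values):
    """Rank positions 1..n (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson(a, b):
    """Correlation of the mean-centered vectors A and B: A . B / (|A| |B|)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    A = [x - mean_a for x in a]
    B = [y - mean_b for y in b]
    dot = sum(x * y for x, y in zip(A, B))
    return dot / (math.sqrt(sum(x * x for x in A)) * math.sqrt(sum(y * y for y in B)))


def spearman(a, b):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(ranks(a), ranks(b))
```

Because only ranks matter, any monotonically increasing relationship (for example, y = x squared for positive x) gives a Spearman correlation of 1.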
163. ach screen. To create a new experiment: 1. From the main GeneSpring window, select Experiments > Create New Experiment. The Select Samples screen appears. For a detailed description of this screen, see The Sample Manager on page 3-23. 2. Select the samples to include in your experiment. To add a sample, select it in the Filter Results List and click Add. The sample appears in the Samples for New Experiment list. To select multiple samples, hold down the Ctrl key while clicking the desired samples. To add all the samples in the list to your experiment, click Add All. To view detailed information on a sample, click Inspect to open the Sample Inspector. For more information, see The Sample Inspector on page 4-13. 3. If you need to edit parameters or normalizations for this experiment, click Next. For detailed information on the Edit Parameters screen, see Experiment Parameters on page 3-29. For detailed information on normalizations, see Normalizing Data. When you have finished with parameters and normalizations, click Finish. The Choose Experiment Name screen appears. To accept the default parameters and normalizations instead, click Finish. The Choose Experiment Name screen appears. 4. In the Choose Experiment Name window, do the following: Enter a name for the experiment in the Name field. Choose a descriptive name that you will remember later. To save the experiment in an existing
164. add a file, click Add File and select the desired file from the Browse menu that appears. You can also drag and drop a file directly from the desktop into the Associated Files list. Extract File: To save (extract) a file in the list to another location, select the file in the list and click Extract File. Choose a location from the Browse menu and click Save. This does not remove the file from your list; it simply places a copy of the file in the new location. Delete File: To remove an associated file, select it in the list and click Delete. View File: Select a file name in the list and click View File to view the contents of the file in an external program. If the file type is known, GeneSpring automatically selects the appropriate program to display the file. The Condition Inspector: A condition is a unique combination of parameter values as applied to your samples. Each condition may be a single sample or a group of replicate samples combined based on the parameter values defined for each sample. The easiest way to think of a condition is as the set of parameters under which the sample(s) were observed. If you have no replicates, condition and sample can be considered synonymous. 1. Open the Experiment folder in the navigator by clicking its icon. 2. Click the + sign next to the experiment icon. 3. Click the + sign next to the interpretation icon. 4. Right-click over a condition
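The idea that a condition is a unique combination of parameter values, shared by its replicate samples, can be sketched as a grouping step in Python. The sample names, parameter names, and dictionary layout here are hypothetical and used only for illustration.

```python
from collections import defaultdict


def group_into_conditions(sample_parameters):
    """Group samples with identical parameter values into one condition.
    `sample_parameters` maps sample name -> {parameter name: value}.
    Returns a dict: condition (tuple of (parameter, value) pairs) -> sorted samples."""
    conditions = defaultdict(list)
    for sample, params in sample_parameters.items():
        key = tuple(sorted(params.items()))  # parameter order does not matter
        conditions[key].append(sample)
    return {key: sorted(samples) for key, samples in conditions.items()}
```

With no replicates, every sample lands in its own condition, matching the statement that condition and sample are then synonymous.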
165. ading experiment information. 3. Telling GeneSpring how to analyze and display the information by assigning normalizations, parameter values, and modes of display. 4. Annotating (updating) your genome. Loading Your Data: Step 1: Load gene information from your arrays (optional). 1. Start GeneSpring and select File > New Genome Installation Wizard. 2. Type the organism name or the brand name of your array and click Next. 3. Enter the information requested on each screen and click Next until you have completed the wizard. For details, see The New Genome Installation Wizard on page 2-2. If you skip this step, GeneSpring can load gene information directly from your data files. However, to retrieve annotations for your genome using GeneSpider (Step 4), you must enter the GenBank accession number of each gene in column 10 of the master gene table. Silicon Genetics can provide annotated genomes for many of the most commonly used arrays. Call 1-866-SIG-SOFT or email support@sigenetics.com for details. Step 2: Load an Experiment. 1. Select File > Import Data. 2. Choose a file. 3. If GeneSpring recognizes the format of your data file, it asks you to name your genome. If the data format is unknown, you must set up columns using the column editor. To set up columns, click each of the cells in the Function row and choose a data type from the pull-down menu. When you are done, click Next.
166. Figure 4-6 Gene Inspector window for gene RPS3. Gene Identification Section: Information on the selected gene from the master gene table is displayed in the Gene Identification section, in the upper left corner of the Gene Inspector window. The Data Table: The table in the upper right corner is the Data Table. It contains the following information: Description: The condition under which the measurement was taken. Normalized: The normalized data value. For details on normalization, see Normalizing Data. Control: The control strength for the gene. For information about control strengths, see Per-Gene Normalizations on page 5-13. Raw: The raw value of the data, just as it came off the chip or out of the scanner. t-test p-value: This measures how likely it is to observe replicate values this far from one if the gene's true normalized expression were one; a small p-value suggests that the expression differs from one. The t-test p-value is applicable only to replicated data. Flags: Flags indicate whether or not you
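The t-test p-value in the Data Table tests replicate values against one. A one-sample t statistic of that kind can be sketched with Python's statistics module. The sketch works on the normalized values directly; whether GeneSpring uses a log scale, and exactly how it handles replicates, is not stated here, so treat both as assumptions. The function name is hypothetical. A two-sided p-value would then come from Student's t distribution with n - 1 degrees of freedom (for example via SciPy's `scipy.stats.t.sf`), which is not shown.

```python
import math
import statistics


def t_statistic_vs_one(replicates):
    """One-sample t statistic for the hypothesis that the mean of the replicate
    normalized values is 1. Requires at least two replicates."""
    n = len(replicates)
    if n < 2:
        raise ValueError("a t-test needs replicated data (n >= 2)")
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample standard deviation (n - 1 denominator)
    return (mean - 1.0) / (sd / math.sqrt(n))
```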
167. Figure 2-1 The New Genome Installation Wizard. Enter a name for your new genome. Be sure the name is descriptive: GeneSpring creates the genome using the capitalization and spelling you enter at this time. Click Next. The Genome Data Directory window appears. Select a directory in which to save your new genome, or create a new one. By default, GeneSpring displays a new directory name in this field, using the name you entered on the previous screen. To accept the default directory, click Next. To change the default, enter a new directory location in the text box. If you enter a directory name that does not exist, GeneSpring creates it for you. If you leave the text box blank, GeneSpring saves your genome to the default location. To select a different directory, click Browse and navigate to the desired directory. Figure 2-2 The Save In dialog. Note: Wh
168. alized to raw data. For details on the latter options, see Using the Genome Browser on page 4-2. For information about error bars, see Cross-gene Error Models on page 3-44. For information about creating a resizable picture, see Saving Pictures and Printing on page 9-4. For information on bookmarks, see Bookmarks on page 4-26. Gene Inspection Tools: The box in the bottom left corner of the Gene Inspector window contains tools for finding genes whose expression profiles are similar to that of the gene currently displayed. Find Similar: Searches for genes whose expression profiles are similar to that of the gene being inspected. A gene's expression profile must reach the required minimum correlation to be considered similar. The higher the minimum correlation (maximum 1), the closer the expression profiles must be. Enter this number in the Minimum correlation box above the Find Similar button. For information on using the Find Similar function, see The Find Similar Command on page 6-6. Complex Correlation: Makes a gene list comparing the gene being inspected to genes with similar expression profiles in multiple experiments, with more complex parameters than the Find Similar tool allows. For information on using the Complex Correlation function, see The Find Similar Genes Window on page 6-7. Save As Expression Profile: Allows you to save your gene expres
...ally updates the window regardless of the current display settings. Larger files may take longer to paste, depending on your system. When the paste is complete, a new Choose Experiment Name box appears with the current name of the experiment already in the Name text box. When you return to the main window, your new experiment is displayed automatically. From here you can alter the normalizations with the Experiment > Experiment Normalizations command, or the interpretation with the Experiment > Experiment Interpretation command.

Copying an Experiment or a List Out of GeneSpring

When you copy an experiment, only data for the currently selected gene list is copied. To copy data for all the genes in the current experiment, right-click the All genes list and select Display List before you begin to copy. When you paste, the gene list is sorted into the order presented in the Ordered List view.
1. Choose an experiment or a gene list from the navigator.
2. In the main GeneSpring window, select Edit > Copy > Copy Experiment. Your data is saved to the clipboard.
3. Paste your experiment or gene list into Microsoft Excel, a text editor such as Microsoft Notepad, or a word processor such as Microsoft Word.

Default Normalizations

GeneSpring normalizes your new files based on the technology used to create the original data files. For more information on normalizations, see Normalizing D
...alse discovery rate of about 5% of the genes identified.

[Figure 1-2: 1-Way ANOVA options. Fields: Parameter to Test (time minutes), Select Groups Manually, Test Type (Parametric test, don't assume variances equal), P-value Cutoff (0.05), Multiple Testing Correction (Benjamini and Hochberg False Discovery Rate), Post Hoc Tests (None)]

To perform one-way ANOVA:
1. From the Parameter to Test pull-down list, select the parameter on which to base your comparison. To select a group of parameters, see Selecting Groups Manually on page 6-35.
2. Select the type of test to perform. There are four testing options:
• Parametric test, assume variances equal: Filter based on the results of a Student's two-sample t-test for two groups, or a one-way analysis of variance (ANOVA) for multiple groups.
• Parametric test, don't assume variances equal: Filter based on the results of an ANOVA, or of Welch's approximate t-test for two groups. This is the most appropriate test for standard experiments in which the global error model is not turned on or should not be used in the analysis.
• Parametric test, use all available error estimates: Filter based on the variances estimated by the cross-gene error model. If the cross-gene error model is not turned on, this test is equivalent to the Parametric test.
• Non-Parametric test: Filter based on the rank of each sample rather than the expression level. Non-parametric comparisons use the
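The Benjamini and Hochberg False Discovery Rate correction named in the options above can be illustrated with a minimal Python sketch of the standard step-up procedure. This is not GeneSpring's own code; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests rejected by the Benjamini-Hochberg step-up procedure.

    p_values: one p-value per gene; alpha: target false discovery rate.
    """
    m = len(p_values)
    # Rank the tests from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Every test ranked at or below k_max is declared significant
    return sorted(order[:k_max])

# Hypothetical p-values for four genes
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5]))  # prints [0, 1, 2]
```

The procedure is "step-up": genes 0 and 1 miss their own rank thresholds (0.0125 and 0.025), but because gene 2 passes at rank 3 (threshold 0.0375), all three are reported.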
...alue, the more significant the comparison is. The comparisons between the lists and the subtrees do not look for exact matches but for statistically significant overlaps, which may include subsets and supersets.

When there is enough space on the screen, a label (if one exists) is displayed along the top horizontal bar of the subtree. Otherwise, when there is space, a symbol is displayed. An & symbol after a list name indicates that the subtree is statistically similar to more than one list; when there is enough room, all of these lists are displayed as labels along the top of the subtree.

To take a screen shot that includes the label, hover your cursor over the node and take the screen shot when the label appears. In most Windows applications, only the label is visible in the screen shot, not the cursor. For more information about screen shots, see Saving Pictures and Printing on page 9-4.

Viewing Gene Names in Trees

You can magnify the tree until the names are visible along the edge of the genes.
1. Place your cursor anywhere over the group of genes to view the gene name. When the cursor pauses over a gene, a label appears. It disappears when the cursor is moved.
2. Click once, and that gene becomes the selected gene. The name of the selected gene appears in the upper-right corner of the genome browser.

Viewing Parameters in Trees

For most experiments, each measurement was taken under certain conditions
...amic acid decarboxylase MAPPED FILE SEQUENCE >CHR1 Chromosome data CCACACCACACCCACACACCCACAC </SEQUENCE> </GENOME>

Format | Input | Output | Num | Description | Example
Temporary Genome | Yes | Yes | 9 | Same as above, but the genome is not saved to disk. No data for it can be saved, and all records of the genome disappear when the window is closed. | Same as above

9 Exporting GeneSpring Data

Saving Images

You can save a GeneSpring image and import it into a graphics or other program, where you can polish and format it for publication. GeneSpring saves images of pathways, Venn diagrams, the genome browser, and the colorbar as .pct and .png files, which can be imported into Microsoft PowerPoint, Word, Publisher, Excel, CorelDRAW, and Adobe Illustrator, among other programs.

Saving a Genome Browser Image
1. Display the image to save in the genome browser.
2. Select File > Save Image and choose Browser. The Export Options window appears.
3. Choose an output format for the image. The available formats are PICT and PNG. PICT is a vector-based graphic format; PNG is a bitmap-based format. There are size limits to both formats. PICT files cannot be larger tha
...arameter values defined for each sample. The easiest way to think of a condition is as the set of parameters under which the sample(s) were observed. If you have no replicates, condition and sample can be considered synonymous. In Figure 3-19, the conditions are Embryonic, Postnatal, and Adult.

Interpretation: A description of how GeneSpring displays the data for you to view. It includes a definition of the applicable parameters and of how to treat the normalized numbers. This is the way a set of conditions is grouped. In Figure 3-19, the interpretation is the Default Interpretation.

Experiment: A set of samples, generally designed to answer specific types of questions. The data are usually, but not always, manipulated in a normalized form. In Figure 3-19, the experiment is the Rat Study.

A Note on Multiple Parameters

The more experiment parameters you have, the more options you have for visually querying your data. If you have samples of tissues with different diseases, such as breast cancer, kidney cancer, liver cancer, brain cancer, hepatitis A, hepatitis B, osteoporosis, arthritis, syphilis, and no disease, you might want to use several parameters for this experiment. Using multiple parameters, even if they all refer to the same information, allows you to group the data in many different ways, which may give you different insights into your data set. As another example, these samples could be given the parameters Cancer, Pathogen, and Genetic Disorder. Param
...argument group

Numbers
Open Scripts > Basic Scripts > Numbers

Name | Input | Knobs | Description
Compare 1 Number | One number | Comparison | Compares a number to another number and outputs a Boolean.
Compare 2 Numbers | Two numbers | Comparison | Compares two numbers and outputs a Boolean.
Number | None | Number | Outputs the number specified by a knob.
Number Add | Two numbers | None | Adds two numbers together and outputs the result.
Number Divide | Two numbers | None | Divides one number by another and outputs the result.
Number Multiply | Two numbers | None | Multiplies two numbers and outputs the result.
Number Subtract | Two numbers | None | Subtracts the second number from the first number and outputs the result.
Number Log | Number | None | Outputs the log of the number specified in the input.
Sum of Numbers in Group | A group of numbers | None | Outputs the sum of the numbers in the input group.

Count Groups
Open Scripts > Basic Scripts > Numbers Script Building Blocks > Count Groups

Name | Input | Knobs | Description
Count Conditions in Group | Array of Conditions | None | Determines the number of objects in the specified array and outputs the result.
Count Experiments in Group | Array of Experiments | None | Determines the number of objects in the specified array and outputs the result.
Count Gene Lists in | Array of Gen
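The arithmetic performed by several of these blocks can be mirrored in a few lines of Python. This is an illustrative sketch, not the ScriptEditor's implementation: the function names and the comparison symbols used as knob values are hypothetical, and because the base used by Number Log is not stated here, the sketch assumes the natural logarithm.

```python
import math
import operator

# Hypothetical comparison knob settings mapped to Python operators
COMPARISONS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}

def compare_2_numbers(a, b, comparison):
    """Like Compare 2 Numbers: compare a with b and return a Boolean."""
    return COMPARISONS[comparison](a, b)

def number_subtract(a, b):
    """Like Number Subtract: the second number is subtracted from the first."""
    return a - b

def number_divide(a, b):
    """Like Number Divide: the first number divided by the second."""
    return a / b

def number_log(x):
    """Like Number Log; natural log assumed, since the base is not documented here."""
    return math.log(x)

def sum_of_numbers_in_group(group):
    """Like Sum of Numbers in Group: add up every number in the input group."""
    return sum(group)

def count_in_group(group):
    """Like the Count ... in Group blocks: number of objects in the input array."""
    return len(group)

print(compare_2_numbers(3, 5, "<"), number_subtract(10, 4), count_in_group(["a", "b"]))
```

Note the argument order in `number_subtract` and `number_divide`: as in the table, the second input is taken from (or divides) the first.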
...as a classification, select the Classification radio button. To save the results as a group of gene lists, select the Gene Lists radio button.
3. From the navigator, select a folder in which to save the new classification or gene lists. To create a new folder, navigate to the desired parent folder and enter a new folder name in the Folder field.
4. Enter any additional information in the Notes field, if desired.
5. Click Save.

Viewing SOMs

SOM results are most easily viewed using the Split Window feature. Each graph contains the genes associated with a SOM node. Node numbers are shown in the upper-right corner of each plot.

[Figure: SOM results in the Split Window. Each panel plots Normalized Intensity (log scale) against time (minutes) for one SOM node; panels shown include 1,1 (6 genes), 2,1 (3 genes), 3,1 (7 genes), 1,2, 2,2 (2 genes), and 3,2 (1 gene). Graphed by Yeast cell cycle time series, colored by Calculated, split by 6x5 SOM for Yea...]
...ase pair number of the first base in the gene and the base pair number of the first nucleotide in the motif. It includes the distance of the promoter; that is, the distance number is the difference between the promoter sequence and the ORF.

Sequence: Contains the sequence being examined, displayed in bold. To its left are the ten bases preceding this instance of the motif, and to its right are the ten bases that follow it in the nucleotide sequence.

The Homology Tool

The Homology Tool automates the process of building homology tables for certain organisms. Currently, this is a limited list of organisms. Using this tool, homologies can be made between any pair of organisms that are included in both HomoloGene and UniGene. Within-genome homologies are based solely on UniGene Cluster ID or LocusLink Locus ID.

The Master Table of Genes for each genome must be in GeneSpring 5.0 format. If the selected genomes are in an earlier format, you must convert them before running the Homology Tool. For more information on the GeneSpring 5.0 Master Table of Genes format, see The Master Gene Table File on page 2-10.

Homologies can be created for the following organisms:
• Cow (Bos taurus)
• Nematode (Caenorhabditis elegans)
• Sea squirt (Ciona intestinalis)
• Zebrafish (Danio rerio)
• Fruit fly (Drosophila melanogaster)
• Human (Homo sapiens)
• Mouse (Mus musculus)
• Rat (Ra
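The Distance and Sequence columns described at the start of this passage can be sketched in Python. This is a hypothetical helper, not GeneSpring code: it assumes 0-based string indices for the motif position, clips the ten-base flanks at the sequence ends, and takes the distance as gene start minus motif start, a sign convention the text does not confirm.

```python
def motif_context(sequence, motif_start, motif_length, flank=10):
    """Split a sequence around one motif occurrence.

    motif_start is a 0-based index; flanks are clipped at the sequence ends.
    Returns (preceding bases, motif, following bases).
    """
    left = sequence[max(0, motif_start - flank):motif_start]
    motif = sequence[motif_start:motif_start + motif_length]
    right = sequence[motif_start + motif_length:motif_start + motif_length + flank]
    return left, motif, right

def motif_distance(gene_first_bp, motif_first_bp):
    """Distance as the difference between the gene's first base and the
    motif's first nucleotide (sign convention assumed)."""
    return gene_first_bp - motif_first_bp

# Hypothetical upstream sequence with one motif occurrence at index 12
seq = "A" * 12 + "TGACTCA" + "C" * 12
print(motif_context(seq, 12, 7))
```

In the results display, the middle element would be the part shown in bold.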
...ases where there are performance issues. This element is contained by <PhysicalDatabase>.
Contents: plain text
Attributes: n/a
Usage: <Prefetch>20</Prefetch>
Notes: Optional; rarely used.

<TechnologyType>
Sample special field: the technology to be set for each sample uploaded. This identifies the chip or technology used for the sample.
Contents: n/a
Attributes: name
Usage: <TechnologyType name="Affymetrix">
Notes: Optional.

<Header>
This element specifies the header fields to be set for each sample uploaded from the current database.
Contents: <Author>, <ResearchGroup>, <Organization>
Attributes: n/a
Usage: <Header></Header>
Notes: Required for <PhysicalDatabase>.

<Author>
Specifies the sample author to be designated in the header field for each sample being uploaded.
Contents: plain text
Attributes: n/a
Usage: <Author>Juanita Nguyen</Author>
Notes: Required for <Header>.

<ResearchGroup>
Specifies the research group to be designated in the header field for each sample being uploaded.
Contents: plain text
Attributes: n/a
Usage: <ResearchGroup>Discovery Central</ResearchGroup>
Notes: Required for <Header>.

<Organization>
Specifies the organization to be designated in the header field for each sample being uploaded.
Contents: plain text
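If you generate configuration files programmatically, Python's standard `xml.etree.ElementTree` module can assemble the elements described above. This is a sketch under stated assumptions: it emits only the `<Prefetch>` and `<Header>` parts of a `<PhysicalDatabase>` element (a real configuration file needs other elements not shown here), the function name is hypothetical, and "Example Organization" is a placeholder value.

```python
import xml.etree.ElementTree as ET

def build_header_fragment(author, research_group, organization, prefetch=None):
    """Build a <PhysicalDatabase> fragment holding the required <Header> element."""
    db = ET.Element("PhysicalDatabase")
    if prefetch is not None:
        # Optional and rarely used
        ET.SubElement(db, "Prefetch").text = str(prefetch)
    header = ET.SubElement(db, "Header")  # required for <PhysicalDatabase>
    ET.SubElement(header, "Author").text = author
    ET.SubElement(header, "ResearchGroup").text = research_group
    ET.SubElement(header, "Organization").text = organization
    return ET.tostring(db, encoding="unicode")

xml_text = build_header_fragment("Juanita Nguyen", "Discovery Central",
                                 "Example Organization", prefetch=20)
print(xml_text)
```

Building the tree with ElementTree rather than string concatenation ensures that characters such as `&` in an organization name are escaped correctly.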
...ata

One-Color Experiments

GeneSpring counts only samples of the same data type which have both the data transformation and the normalize-to-median steps applied. One-color normalizations automatically display all measurements.
• Data transformation: Set measurements less than 0.01 to 0.01.
• Per Chip: Normalize to 50th percentile.
• Options: Use all flags; never apply background correction.
• Per Gene: Normalize to median (cutoff 10 in raw data if 3 samples).

Two-Color Experiments

Two-color experiments are automatically normalized to a signal ratio. Two-color normalizations automatically display all measurements.
• Per Spot: Intensity-dependent (Lowess) if more than 100 genes per region; divide by control channel if fewer than 100 genes per region. Cutoff 10 in raw data; 20% of data used for smoothing.
• Per Chip: Intensity-dependent (Lowess) if more than 1000 genes per chip; divide by control channel if fewer than 1000 genes per chip.
• Options: Use background correction if necessary (anything but absent). Cutoff 10 in raw data.

Pre-Normalized Data

This applies only to samples uploaded to GeNet before version 3.0, samples created from experiments using the Merge/Split functionality in an earlier version of GeneSpring, or samples imported using the Paste Experiment function.
• Start with pre-normalized data.
• Per Spot: Reserve control channel.

Replicates

If you have multiple measurements for the same gene in the same
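The one-color defaults listed above can be approximated in plain Python. This is a simplified sketch, not GeneSpring's implementation: it assumes that "normalize to 50th percentile" and "normalize to median" mean dividing by those values, and it ignores flags and the raw-data cutoff of 10. The function name is hypothetical.

```python
from statistics import median

def one_color_default_normalization(matrix, floor=0.01):
    """Approximate the default one-color steps on a genes x samples matrix.

    matrix[g][s] is the raw value of gene g in sample s.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    # Data transformation: set measurements below the floor to the floor
    data = [[max(v, floor) for v in row] for row in matrix]
    # Per chip: divide each sample (column) by its 50th percentile
    for s in range(n_samples):
        chip_median = median(data[g][s] for g in range(n_genes))
        for g in range(n_genes):
            data[g][s] /= chip_median
    # Per gene: divide each gene (row) by its median across samples
    for g in range(n_genes):
        gene_median = median(data[g])
        data[g] = [v / gene_median for v in data[g]]
    return data

# Hypothetical data: three genes measured on two chips
print(one_color_default_normalization([[2.0, 4.0], [4.0, 8.0], [8.0, 16.0]]))
```

In this example the second chip is uniformly twice as bright as the first, so after both steps every normalized value is 1.0: the per-chip step removes the brightness difference, and the per-gene step centers each gene on its own median.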
...ata Sample Attributes screen appears. On this screen, you must enter required attribute information before proceeding. You can also add recommended and optional attribute information, and you can add new attributes at this time. For more information on sample attributes, see Sample Attributes on page 3-35.

[Figure 3-5: The Sample Attributes window]

For attributes with standard values, a pull-down menu appears. Choose Other to turn the cell into a text field you can type in. Click Next when you are done.
10. At this point, your new samples have been saved. You can either create an experiment using the new samples or stop here.

[Figure 3-6: The Create Experiment dialog]

To stop here, click No. The imported data are saved, but a new experiment file is not created; the data are saved as samples. The Sample Inspector displays each of the new samples. You can create experiments from these data later by selecting Experiments > Create New Experiment
...atabase

Adding an Experiment from a Database

This section describes the process of making your database visible to GeneSpring. The database administrator should have done this already. If so, you can skip ahead to Connecting your Database to GeneSpring on page A-6.

Creating a New ODBC Source
1. Select Start > Control Panel.
2. Open Administrative Tools.
3. Open Data Sources (ODBC). A new window, the ODBC Data Source Administrator, appears.
4. Go to the System DSN tab.
5. Click Add to open a new Create New Data Source window.
6. Select the correct type of database from the scrolling list. A new panel appears.
7. Give the experiment a name. Remember that experiment names in GeneSpring are case-sensitive.
8. Click Select to browse for the correct database. Usually you will connect to a new computer server to access the database. The new entry appears in the list of databases.

Testing Your ODBC Connection
1. Open Excel.
2. Select Data > Get External Data.
3. Select New Database Query.
4. Look for your database in the presented list.

Connecting your Database to GeneSpring

The following section describes the use of the database configuration file. You must customize this file for your system before running GeneSpring. You can have any number of different configuration files. The purpose of this
...atabases and GeneSpring  A-2
Databases  A-2
Open Database Connectivity  A-2
Structured Query Language  A-2
SQL Call Level Interfaces  A-3
The Genetic Analysis Technology Consortium  A-3
Databases and GeneSpring  A-3
Adding an Experiment from a Database  A-5
Connecting your Database to GeneSpring  A-6
Configuration File Reference  A-6
Tag Definitions  A-9
Entering your Database into GeneSpring  A-23
Prepared Databases  A-23
More Complicated Databases  A-23
Equations for Correlations and other Similarity Measures
Measures of Similarity  B-2
Common Correlations  B-3
Standard Correlation  B-3
Pearson Correlation  B-3
Spearman Correlation  B-4
Spearman Confidence  B-4
Two-Sided Spearman Confidence  B-5
Distance  B-5
Smooth Correlation  B-6
Change Correlation  B-6
Upregulated Correlation  B-7
Number of Samples Required to
...ated by GeneSpring during the genome creation process. This information may be helpful if you need to create or edit these files manually.

The Master Gene Table File
1. To set up gene annotations, create a table with the following structure, using the tab-delimited text format:

[Table: Master Gene Table columns A through Q. Column headings include Systematic Name, Common Name, Map, EC Number, Description, Product, Phenotype, Function, GenBank Accession, Synonym, Sequence, PubMed ID, Custom 1, Custom 2, Custom 3, Type, and Keywords.]

Note: There are five additional columns which are not displayed above due to space constraints. These are DBid (18, R), GO Biological Process (19, S), GO Molecular Function (20, T), GO Cellular Component (21, U), and RefSeq (22, V).

For example, if you had Gene Identifiers (as the genes would be identified in a raw data file) and GenBank Accession numbers for all the probes on a chip, the table viewed in Excel might look something like this:

A1i  L14754
A1j  X79882
A1k  D78579
A1l  M31630
A1m  J04111
etc.  etc.

Opened as a tab-delimited text file, the table might look like this:

[Figure: Genes_on_my_chip.txt opened in Notepad]
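A tab-delimited table like the one above can be produced with Python's standard `csv` module. This is a minimal sketch under stated assumptions: it writes a header row, it places the gene identifiers in the Systematic Name column, and it uses only the two columns from the example; check the full column layout before building a real Master Gene Table. The function name is hypothetical.

```python
import csv
import io

def write_master_gene_table(rows, columns, out):
    """Write gene annotations to `out` as tab-delimited text.

    rows: dicts keyed by column name; absent values are written as blanks.
    columns: column headings in the order they should appear.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)  # header row (assumed)
    for row in rows:
        writer.writerow([row.get(column, "") for column in columns])

# The first two entries of the example above
buffer = io.StringIO()
write_master_gene_table(
    [{"Systematic Name": "A1i", "GenBank Accession": "L14754"},
     {"Systematic Name": "A1j", "GenBank Accession": "X79882"}],
    ["Systematic Name", "GenBank Accession"],
    buffer,
)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")`, which keeps the line endings under the writer's control.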
...ated more closely than that. A higher number tends to lump more genes into a group, making the groups less specific.

Clustering Methods

Gene Tree

The classification of organisms into phylogenetic trees is a central concept in biology. Organisms sharing properties tend to be clustered together, and the location of a branch containing both organisms can be considered a measure of how similar the organisms are. You can classify genes in a similar manner, clustering those whose expression patterns are similar into nearby places in a tree. Such mock phylogenetic trees are often referred to as gene trees. GeneSpring can both create and display such trees.

GeneSpring can also create trees of experiments, displaying the genes along one axis and the samples along the other axis. This is useful for many applications. For example, you can determine whether any environmental stressors cause effects on expression levels similar to those seen in mutant organisms.

If you have already created or downloaded trees, open the Gene Trees folder in the navigator and select any tree for viewing.

Gene Tree Clustering Options

The following options are available for gene tree clustering:

Similarity Measure: Available options are Standard Correlation, Smooth Correlation, Change Correlation, Upregulated Correlation, Pearson Correlation, Spearman Correlation, Spearman Confidence, Two-sided S
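The idea of clustering genes with similar expression patterns into nearby places in a tree can be illustrated with a small agglomerative clustering sketch. This is not GeneSpring's algorithm: it uses Pearson correlation (one of the measures listed above) and average linkage, which is an assumption, and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def gene_tree(profiles):
    """Average-linkage agglomerative clustering of gene profiles.

    profiles: dict of gene name -> list of expression values.
    Returns the tree as nested 2-tuples of gene names.
    """
    # Each cluster is (subtree, member gene names)
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average correlation over every pair of members
                sims = [pearson(profiles[a], profiles[b])
                        for a in clusters[i][1] for b in clusters[j][1]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        # Replace the two most similar clusters with their merger
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical profiles: g1/g2 rise over time, g3/g4 fall
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5],
            "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2.5]}
print(gene_tree(profiles))
```

The rising genes end up on one branch and the falling genes on the other. This naive version compares every pair of clusters at each step, so it is only practical for small gene sets, and a flat profile (zero variance) would cause a division by zero.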
Element | Contents | Attributes
... | ...ation | deleteAfterwards, mimeType
<GetRawData> | <DatabaseQuery>, <Format> | n/a
<GenomeMappingSpec> | n/a | targetName, sourceName, baseDirectory
<DatabaseQuery> | SQL command | useGenome, Name, db
<DataDirectory> | plain text | n/a
<FileNameMask> | plain text | n/a
<IDFromFileName> | <RegexpMatch>, <DatabaseQuery> | n/a
<RegexpMatch> | plain text | n/a
<JavaQuery> | n/a | class, extraArgs

Element | Contents | Attributes
<Format> | <GeneColumn>, <Headlines>, <SignalColumn>, <NormalizedColumn>, <ReferenceColumn>, <SignalBackgroundColumn>, <ReferenceBackgroundColumn>, <ExperimentWorkedColumn>, <ExperimentWorkedDesignation>, <ExperimentAbsentDesignation>, <ExperimentMarginalDesignation>, <RegionColumn>, <TreatNoSignalAsInvalid>, <LowerBoundOnSignalColumn>, <UpperBoundOnSignalColumn>, <StandardDeviationSignalColumn>, <ColumnHeaderLine> | type
<GeneColumn> | plain text | n/a
<Headlines> | plain text | n/a
<SignalColumn> | plain text | n/a
<NormalizedColumn> | plain text | n/a
<ReferenceColumn> | plain text | n/a
<SignalBackgroundColumn> | plain text | n/a
<ReferenceBackgroundColumn> | plain text | n/a
<ExperimentWorkedColumn> | plain text
[Figure 3-22: The Standard Attributes Editor. Its remaining rows include ...ation (molar, molal, ppm; Number; Optional), Data processing (normalization; Optional), and Developmental Stage (pre-embryonic, embryonic, postnatal; Optional).]

To add a standard attribute:
1. Select Edit > Standard Attributes. The Standard Attributes window appears.
2. Click Add. The Add Attribute window appears.

[Figure 3-23: The Add Attribute window]

3. Enter a name for the new attribute.
4. To add a suggested unit of measurement for the attribute, such as minutes or ppm, click in a row of the Suggested Units table and enter a unit type. You can add as many suggested units as you like. These units will be selectable by users when they assign this attribute to a sample.
5. To add a suggested value for the attribute, such as Control Group or Martian, click in a row of the Suggested Values table and enter a value. You can add as many suggested values as you like. These values will be selectable by users when they assign the attribute to a sample.
6. Select the data type. Te
...axis display format can be altered by modifying the experiment interpretation or by using the Display Options window. If the vertical axis format is modified in the Display Options window, the new format overrides the display settings within each interpretation.

To modify the vertical axis format using the Display Options window:
1. Select View > Display Options.
2. Click the Vertical Axis tab.
3. Uncheck the Lock Vertical Axis Format to the Interpretation checkbox.
4. Select a value to graph: Normalized, Control, Raw, Average of Raw and Control, or Max of Raw and Control.
5. Select a graph mode: Linear, Logarithmic, or Fold Change.
6. To adjust the vertical axis so that all measurements are visible, check the Scale Vertical Axis to Show all Values box. The upper and lower bounds are adjusted automatically. Alternatively, you can manually set the upper and lower bounds to values of your choosing.
7. Click Apply.

In addition to the vertical axis format, you can also modify the tick spacing on the vertical axis. Unlike the vertical axis format, the tick spacing is not specified in the interpretation.

To manually adjust the tick spacing:
1. Select View > Display Options.
2. Click the Vertical Axis tab.
3. Uncheck the Automatic Tick Spacing on Vertical Axis box.
4. Enter the distance between major ticks in the Major Tick Interval field.
5. Enter the number of divisions be
...ay appear across the bottom of the screen. Warning messages are always preceded by the word Warning and are displayed in bright red text. Error messages appear in dark red text. You can save a script when a warning or error message is active, but it may not perform as expected. You cannot run a script with an active error message.

Script Help

You can get help writing scripts, as well as help with all other Silicon Genetics products, by contacting Silicon Genetics Technical Support at 650-367-9600 or support@sigenetics.com.

To view current information on the Silicon Genetics web site, select Help > Frequently Asked Questions. You are directed to the following URL: http://www.sigenetics.com/GeneSpring/faq/index.html

Script Building Blocks

The ScriptEditor comes with a predefined set of building blocks you can join together in various ways to build scripts. There are several categories of building blocks:
• Boolean
• Boolean Select
• Gene List Manipulations
• GeNet Downloading Default Directory
• GeNet Downloading Specified Directory
• Look Up
• Merge/Split Groups
• Make Groups
• Select Groups
• Numbers
• Count Groups
• Clustering
• Correlations
• Filtering
• Regulatory Sequences
• Statistical Analysis ANOVA
• Standardize Inputs

You can combine various building blocks to create a script. For very lo
...bases at NCBI, which both use the same record format. A sample record is shown below, with the information retrieved by the GeneSpider highlighted in blue and the fields it fills in GeneSpring's master table of genes indicated on the following line. The record is organized with keywords and subkeywords. The features table is organized by feature keys. A complete description of the format for the latest release of GenBank is available at <ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt>.

The GeneSpider fills in annotations from GenBank as summarized in the following table. Feature keys are found in the margin of the features table (the section between the FEATURES line and the BASE COUNT line). Qualifiers indicate information about a feature; they begin with a slash, followed by the qualifier name and then an equals sign (e.g., /gene=).

Annotation | Feature Key | Qualifier
Common | CDS or gene | /gene
Map | source | /map or /chromosome
EC number | CDS or gene | /EC_number
Description | CDS or gene | /note (also from the DEFINITION line)
Product | CDS or gene | /product
Phenotype | CDS or gene | /phenotype
Function | CDS or gene | /function
DBid | CDS or gene | /db_xref

Annotation | Keyword
Keywords | KEYWORDS
Description | DEFINITION (also from the CDS or gene /note qualifier)
Sequence | following the ORIGIN line

LocusLink

Update Annotations from LocusLink reads the HTML source from queries to the LocusLink database a
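The qualifier-to-annotation mapping in the table above can be sketched in Python. This is an illustration, not the GeneSpider's implementation: the names `QUALIFIER_RE`, `ANNOTATION_FOR_QUALIFIER`, and `annotations_from_feature` are hypothetical, the sketch reads only single-line qualifiers (long values that wrap onto continuation lines are not joined), and it does not check which feature key (source, CDS, or gene) a qualifier belongs to.

```python
import re

# A qualifier line in the FEATURES table: indentation, "/name", optional "=value"
QUALIFIER_RE = re.compile(r'^\s+/(\w+)(?:=(?:"([^"]*)"|(\S+)))?\s*$')

# Qualifier -> master-table annotation, following the table above
ANNOTATION_FOR_QUALIFIER = {
    "gene": "Common",
    "map": "Map",
    "chromosome": "Map",
    "EC_number": "EC number",
    "note": "Description",
    "product": "Product",
    "phenotype": "Phenotype",
    "function": "Function",
    "db_xref": "DBid",
}

def annotations_from_feature(feature_lines):
    """Collect annotation values from one feature's single-line qualifiers."""
    annotations = {}
    for line in feature_lines:
        match = QUALIFIER_RE.match(line)
        if not match:
            continue  # feature-key lines such as "CDS  1..300" are skipped
        name = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        column = ANNOTATION_FOR_QUALIFIER.get(name)
        if column is not None and value is not None:
            annotations.setdefault(column, []).append(value)
    return annotations

# Hypothetical feature entry
feature = [
    "     CDS             1..300",
    '                     /gene="abc1"',
    '                     /EC_number="4.1.1.15"',
    '                     /product="hypothetical decarboxylase"',
    "                     /codon_start=1",
    '                     /db_xref="GI:0000000"',
]
print(annotations_from_feature(feature))
```

Qualifiers with no entry in the mapping, such as `/codon_start`, are ignored.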
...bout microarray designs, manufacturing information, and experiment setup and execution information, as well as gene expression data and analysis results.

Certain publications and laboratories require results to be published to ArrayExpress or Gene Expression Omnibus in MAGE-ML format. ArrayExpress is a new public repository for microarray-based gene expression data. Incyte provided funding for its creation, and it is now funded by EMBL and EBI. ArrayExpress accepts only data submitted via MIAMExpress (a web-based submission interface) or via FTP in MAGE-ML format.

MAGE-ML is extremely broad in its definitions. As a result, EBI ArrayExpress has defined its own version of MAGE-ML, and ArrayExpress will only accept submissions that are both MIAME-compliant and conform to its MAGE-ML standard. GeneSpring currently supports only the export of MAGE-ML data, and it does not fully support export of data in a format that is applicable for ArrayExpress submission.

To export an experiment in MAGE-ML format:
1. Right-click an experiment in the GeneSpring navigator and select Export as MAGE-ML. The MAGE-ML Export window appears.

[Figure: Export Experiment "Yeast cell cycle time series (no 90 min)" in MAGE-ML format. Fields include Experiment Name and Experiment Design Type (treated vs. untreated, time course, dose response, knock-out response), with Standard and Custo...]
...bute

Replace Text: To replace many entries at once, select the entries to change and click Replace Text. Enter the appropriate text in the dialog box that appears. To replace all instances of an entry, choose Replace Text and then uncheck the Replace in selected cells only box before clicking OK.

Fill Down: To replace entries using the top selected cell, click the cell you want to use as the replacement and then, holding down the Shift key, click the cells underneath whose values you would like replaced with the original cell's value. Then click Fill Down.

Fill Sequence Down: Fills down as described above, but automatically continues a simple numeric or alphabetic sequence.

Editing Standard Attributes

GeneSpring comes with a set of standard attributes. These attributes are available for use in any experiment in GeneSpring. You can add as many additional attributes as you like, or edit existing attributes, using the standard attribute editor.

[Standard Attributes window: List of Standard Attributes, with columns Name, Units, Data Type, and Required. Rows shown include Age (years, months, weeks, days; Number; Optional), Array Design (Text; Optional), Author (Optional), Common Reference (Yes/No; Optional), and Concentr...]
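The difference between Fill Down and Fill Sequence Down can be pictured with a small Python sketch. The exact rules GeneSpring uses to continue a sequence are not documented here, so this is an assumption-laden illustration: it increments a trailing integer ("Rat 1", "Rat 2", ...) or advances a single trailing letter ("Group A", "Group B", ...), and otherwise copies the top value as plain Fill Down would. The function name is hypothetical.

```python
import re

def fill_sequence_down(first_value, count):
    """Continue a simple numeric or alphabetic sequence from the top cell.

    Values with no trailing number or letter are copied as in plain Fill Down.
    """
    numeric = re.match(r"^(.*?)(\d+)$", first_value)
    if numeric:
        prefix, start = numeric.group(1), int(numeric.group(2))
        return [f"{prefix}{start + i}" for i in range(count)]
    alphabetic = re.match(r"^(.*?)([A-Za-z])$", first_value)
    if alphabetic:
        prefix, letter = alphabetic.group(1), alphabetic.group(2)
        last = "Z" if letter.isupper() else "z"
        if ord(letter) + count - 1 > ord(last):
            raise ValueError("sequence would run past the end of the alphabet")
        return [prefix + chr(ord(letter) + i) for i in range(count)]
    return [first_value] * count

print(fill_sequence_down("Rat 1", 3))
print(fill_sequence_down("Group A", 3))
```

One limitation of this sketch is that leading zeros are not preserved: "Rat 09" continues as "Rat 10" only after first being rewritten as "Rat 9".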
...by using the keyboard commands (for Windows, these are Ctrl+C and Ctrl+V). Cutting and pasting is advised to ensure that URLs are entered correctly. GeneSpring attempts to locate each URL you insert before it allows you to proceed to the next panel. This may be a problem if you are not connected to the internet when you are creating this genome. In this case, you will have to skip this screen and add the web links to the genomedef file later. To add hyperlinks from GeneSpring, see Step on page 4.

GeneSpring cannot automatically locate the default web browser on NT or Macintosh systems. You must set the path manually. To set the path to the browser:
• Select Edit > Preferences.
• Click the Browser tab.
• In the Browser path box, either type the complete file name and path of the .exe file for your default browser, or click the Browse button to locate the proper executable, which is most likely located in the system directory. In a Windows NT environment, your path may look something like this: C:\Program Files\Plus!\Microsoft Internet\IEXPLORE.EXE
• Click OK to close the Preferences window.

When you are done with this screen, click Next to proceed to the Miscellaneous Settings screen.
10. From this screen, you can force all of the systematic gene names to appear in upper- or lowercase letters by selecting the appropriate checkbox. You do not have to select either of these options. Click Next. The Finished screen appears.
11. Click t
...cally create trust values. In two-color experiments, the trust value is usually the control channel (typically Cy5), unless you do a per-chip normalization, in which case it is the control channel × the median of the control channel × the median of the signal channel.

For Affymetrix and other one-color experiments, the trust value is constructed based on the normalizations you have chosen. If you accept the default normalizations for Affymetrix data (use distribution of all genes, using the 50th percentile, and normalize to the median for each gene), then trust is the median value of the chip × the median value of the gene. If you choose to use distribution of all genes, using the 50th percentile, and normalize to sample(s), trust is calculated as the median value of the chip × the average of the gene's measurements in the control samples.

Changing Colorbar Settings

To set the trust interpretation:
1. Right-click the colorbar.
2. Click Set Coloring. The Display Options screen appears with the Coloring tab preselected.
3. Click Set Colorbar Range. This button is active only when coloring by expression. The Colorbar Range dialog appears.

[Figure 4-17: The Colorbar Range dialog, with the options Set the Range of Expression Labels and Set the Trust Cutoffs]

To enable custom settings, uncheck the Use Experiment Default Val
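The two one-color trust formulas above reduce to simple arithmetic, shown here as a Python sketch. It is an illustration under assumptions the text does not settle: that "median value of the chip" is taken over all raw measurements on that chip, and that the gene's values are its raw measurements across samples. The function names are hypothetical.

```python
from statistics import median

def one_color_trust(chip_values, gene_values):
    """Default one-color normalizations:
    median value of the chip x median value of the gene."""
    return median(chip_values) * median(gene_values)

def one_color_trust_normalized_to_samples(chip_values, gene_control_values):
    """Normalizing to sample(s):
    median value of the chip x average of the gene's control-sample measurements."""
    return median(chip_values) * (sum(gene_control_values) / len(gene_control_values))

# Hypothetical chip with median 3, and one gene's measurements
print(one_color_trust([1, 2, 3, 4, 100], [10, 20, 30]))
print(one_color_trust_normalized_to_samples([1, 2, 3, 4, 100], [10, 20]))
```

Using the median rather than the mean for the chip keeps a few very bright spots (such as the 100 above) from inflating the trust value.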
193. ch is Control.
• ID interpreted as Gene Name
• F635 Median or F532 Median interpreted as Signal
• B635 Median or B532 Median interpreted as Signal Background
• F635 Median or F532 Median interpreted as Control Channel
• B635 Median or B532 Median interpreted as Control Channel Background
• Name interpreted as Description
• Flags interpreted as Flags
BioDiscovery Imagene 4
• Gene ID interpreted as Gene Name
• Signal Median interpreted as Signal
• Background Median interpreted as Signal Background
• Signal Median interpreted as Control Channel
• Background Median interpreted as Control Channel Background
• Flag interpreted as Flags
Incyte GEM Tools 2.4
• CloneID interpreted as Gene Name
• P2 BalancedSignal or P2 Balanced interpreted as Signal
• P1Signal or P1 interpreted as Control Channel
• Gene Name interpreted as Description
• AccessionNum or Accession interpreted as GenBankID
Internet Download
• CloneID interpreted as Gene Name
• Varies (format of PS Cy5 determined in header) interpreted as Signal
• Varies (format of PS Cy3 determined in header) interpreted as Control Channel
• Gene name interpreted as Description
• PS Absent/Present, prefixed by the sample name, interpreted as Flags
• P/A interpreted as Flag Designators
• GEM ID interpreted as Custom1
• Gene ID interpreted as Custom2
Packard Biochip ScanArray QuantArra
194. check the error messages at the bottom to make sure everything is connected properly.
8. Name and save your script. See Saving Scripts on page 8-18.
If you would like to create a new script immediately, select File > New. You can make several small scripts and join them together.

Saving a Building Block as a Script
You can save a block built from multiple other blocks as a script. To do this, right-click the block and select Save Block as Script. This is useful for updating obsolete script building blocks, or if you want to use a block from a script someone else created.

Arranging Inputs and Outputs in the Browser
When you right-click a socket icon in the browser, a pop-up menu appears. Which options appear depends on the status of the selected socket.
Move to Start: Moves the socket all the way to the left of the inputs or outputs field, which helps you untangle the web of connecting lines.
Move to End: Moves the socket all the way to the right of the inputs or outputs field, which helps you untangle the web of connecting lines.
Delete: Removes the building block. You will not get a warning message before the building block is removed from your potential script.
A number appears in parentheses after each socket in the input field. The numbers represent the order in which the inputs are requested when the script is run.

Dynamically Naming Script Outputs
You can define output names dynamica
195. cify whether to run this process on your local machine or a GeNet Remote Server.
11. Click Start.

Interpreting the Results of a Prediction
The Prediction Results window appears after you have made a prediction or validated a training set. For convenience, not all of the prediction statistics are visible until you click the Show Details button at the bottom of the window.
True Value: the true class of each sample, available when the parameter value for the test set is already known. Compare this with the value in the Prediction column to validate your training set.
Prediction: the predicted class.
P-value ratio: the ratio of the probabilities that the prediction was made by chance for the two classes. If you have more than two classes, the ratio is the lowest P-value divided by the next-lowest P-value.
Class counts: the individual class counts for each sample.
P-value: the probability that the individual class counts were found by chance.
The Class Predictor is designed for experiments with at least 20 or so samples in each class. It is possible to use the Predictor with very small sample sizes if you disable the P-value cutoff function. For sample sizes of less than 5, specify 1 or 2 as the number of neighbors, and specify 1 in the P-value cutoff field.

Find Similar Samples
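The P-value ratio and the decision cutoff can be illustrated with a short Python sketch. It assumes that a smaller ratio means stronger evidence for the best class and that no prediction is made when the ratio exceeds the cutoff, which is consistent with the advice above that a cutoff of 1 disables the function. The function names and class labels are hypothetical, and the way the per-class P-values are obtained is not modelled here.

```python
def p_value_ratio(p_values):
    """Lowest P-value divided by the next-lowest P-value.

    p_values: dict mapping class name -> P-value that its class count arose by chance.
    With exactly two classes this is the smaller P-value over the larger one.
    """
    if len(p_values) < 2:
        raise ValueError("need P-values for at least two classes")
    lowest, next_lowest = sorted(p_values.values())[:2]
    if next_lowest == 0.0:
        return 1.0  # both are zero: no evidence favouring either class (assumption)
    return lowest / next_lowest


def predict_class(p_values, cutoff):
    """Return the class with the lowest P-value, or None if the ratio exceeds the cutoff."""
    if p_value_ratio(p_values) > cutoff:
        return None  # not enough evidence in favour of a single class
    return min(p_values, key=p_values.get)
```

With `cutoff=0.2`, P-values of 0.001 and 0.05 give a ratio of 0.02 and a prediction, while 0.04 and 0.05 give a ratio of 0.8 and no prediction. With `cutoff=1.0` every sample is classified, because the lowest P-value can never exceed the next-lowest.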
196. cluster. In GeneSpring the cutoff is specified based on a correlation function, so the cutoff is the minimum allowed value.
In QT clustering, the diameter of a cluster refers to the largest distance between any two genes in the same cluster. QT clustering builds a cluster by starting with a single gene; the diameter at that point is 0. It then adds the gene that is closest to the starting gene, so the diameter of the cluster becomes the distance between the two genes. It continues adding genes one at a time, always choosing the gene that will result in the smallest cluster diameter. Eventually it reaches a point where no gene can be added without the diameter growing beyond the allowed cutoff, and the cluster is complete.
The cluster obtained depends on which gene is chosen as the starting point. Therefore, the algorithm independently builds a cluster starting from each gene in the user-selected gene list. The cluster with the most genes is kept and becomes part of the final classification; all others are discarded. At this point there is a single cluster. The genes in this cluster are removed from the list and the process begins again: a new cluster is built from every gene in the reduced gene list, and the largest one is kept. This process is repeated until the number of genes in the largest cluster is smaller than a user-defined cutoff.

QT Clustering Options
The following options are available for QT Clustering:
Minimum Cluster Size: The smallest allowable size
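The QT procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not GeneSpring's implementation: the distance between two genes is taken as 1 minus their Pearson correlation, so a minimum-correlation cutoff becomes a maximum cluster diameter of 1 minus that cutoff. The names `qt_cluster` and `pearson` and the toy profiles are hypothetical.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile is treated as uncorrelated (assumption)
    return sxy / (sxx * syy) ** 0.5


def qt_cluster(profiles, min_corr, min_size):
    """QT clustering; profiles maps gene name -> list of expression values."""
    dist = {}
    for a, b in combinations(profiles, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[a, b] = dist[b, a] = d
    max_diameter = 1.0 - min_corr  # correlation cutoff expressed as a distance
    remaining = list(profiles)
    clusters = []
    while remaining:
        best = []
        for seed in remaining:  # build one candidate cluster from every gene
            cluster, diameter = [seed], 0.0
            # distance from each candidate gene to the farthest current member
            far = {g: dist[seed, g] for g in remaining if g != seed}
            while far:
                g = min(far, key=far.get)  # gene giving the smallest new diameter
                new_diameter = max(diameter, far[g])
                if new_diameter > max_diameter:
                    break  # no gene can be added within the cutoff
                cluster.append(g)
                diameter = new_diameter
                del far[g]
                for h in far:
                    far[h] = max(far[h], dist[g, h])
            if len(cluster) > len(best):
                best = cluster  # keep the largest candidate only
        if len(best) < min_size:
            break  # the largest cluster is below the minimum cluster size
        clusters.append(best)
        remaining = [g for g in remaining if g not in best]
    return clusters
```

For the toy profiles g1 = [1, 2, 3, 4], g2 = [2, 4, 6, 8], g3 = [1, 2, 3, 5], g4 = [4, 3, 2, 1] and g5 = [8, 6, 4, 2], `qt_cluster(profiles, 0.9, 2)` returns `[['g1', 'g2', 'g3'], ['g4', 'g5']]`. Trying every remaining gene as a seed is what makes the result independent of the starting gene (apart from ties), but it also makes this brute-force version slow for large gene lists.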
197. cs on the slider are not spaced linearly or logarithmically. In these filters, the numbers are spaced so that an equal number of genes falls between each pair of adjacent tics. This is because a linear or logarithmic distribution would cause 99% of the genes to fall within three pixels of each other, making the slider impossible to use.
Note: When you enter a number in the Minimum or Maximum box, the slider is moved to that exact number. However, when you move the slider, the number shown in the Minimum or Maximum box is rounded to three digits after the decimal point. At the ends of the slider, this rounding may sometimes exclude the very largest or smallest value.

Data Types for Restrictions
You can change the type of data on which to base the restriction by choosing from a pull-down list in the applicable window. Depending on which feature you are currently using, you may have access to only some of the options in the following list.
Normalized Data: Gene expression values after all normalizations have been applied. These are the default values displayed in various views, and are shown in the Normalized column in the Gene Inspector. See The Gene Inspector on page 4-10 for details.
Raw Data: Experimental data prior to application of any normalizations. This value is used as the numerator to calculate normalized values.
Note: If your computer's default language is not English, make sure a consistent convention for decimal markers is followed. Co
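One way to obtain tics with an equal number of genes between them is to place them at evenly spaced ranks of the sorted values, that is, at quantiles. The Python sketch below does this; it illustrates the general idea only and is an assumption, not a description of how GeneSpring draws its slider. The function name `equal_count_ticks` is hypothetical.

```python
def equal_count_ticks(values, n_intervals):
    """Place slider tics so that each interval spans the same number of genes.

    Tics are taken at evenly spaced ranks of the sorted values (quantiles),
    not at evenly spaced positions on a linear or logarithmic scale.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(k * last / n_intervals)] for k in range(n_intervals + 1)]
```

For the strongly skewed values 1, 2, 4, ..., 256, `equal_count_ticks([2 ** i for i in range(9)], 4)` gives `[1, 4, 16, 64, 256]`: each interval spans two rank steps even though the intervals differ greatly in width. The rounding caveat in the Note is also easy to reproduce: `round(0.12345, 3)` is 0.123, which is smaller than a true maximum of 0.12345.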
198. ct it in the Sample Attributes list and click Remove.
To edit an attribute, select it in the Sample Attributes list and click Edit Attribute. The Sample Attribute screen appears. Proceed as you would if you were adding a new attribute.

The Similar Samples Tab
This tab lists the correlation between the current sample and the samples in the currently selected experiment. This list appears only when an experiment containing the sample is selected. You can view any of the samples in this list in another Sample Inspector window by selecting it and clicking View Sample.

The Associated Files Tab
This tab lists any files that may be associated with the sample, including data files, array images, sample images, etc. If a sample image is included among the associated files, it is displayed in the Sample Image panel to the right of the list.
You can re-order the files in this list by clicking the column header of the property to sort by. For example, to sort by file type rather than file name, click the File Type column header.
From this screen you can do the following:
Add File: To add a file, click Add File and select the desired file from the Browse menu that appears. You can also drag and drop a file directly from the desktop into the Associated Files list.
Extract File: To save (extract) a file in the list to another location, select the desired file from the list and click Extract File. Choose a loca
199. ction AB: df = (a - 1)(b - 1)
Error: df = N - ab
Divide each SS by the appropriate df to obtain the mean SS. F-ratios and P-values are then computed as before.

Case III: No Replicates (one per cell)
We can still perform 2-way ANOVA in this case, but it is not possible to test for an interaction effect. The calculations are essentially the same as before. The total number of data points is N = ab.

$$SS_{total} = \sum_i \sum_j x_{ij}^2 - C$$
$$SS_A = \frac{\sum_i A_i^2}{b} - C, \quad \text{degrees of freedom } a - 1$$
$$SS_B = \frac{\sum_j B_j^2}{a} - C, \quad \text{degrees of freedom } b - 1$$
$$SS_{error} = SS_{total} - SS_A - SS_B, \quad \text{degrees of freedom } (a - 1)(b - 1)$$

Mean sums of squares = sum of squares / degrees of freedom.
F-ratios:
$$F_A = \frac{MSS_A}{MSS_{error}}, \qquad F_B = \frac{MSS_B}{MSS_{error}}$$
Compute P-values as before.

Case IV: Disproportional Replication
If a single cell is one value short of the number required for proportional replication, estimate the missing value using the following formula, where A and B are as before and N is the total number of data, including the missing value:
aA bB Y Y Ng x i j k jk N 1 a b
If several cells are missing values, or more than one value is missing, apply this formula iteratively if necessary. After the missing values have been estimated, perform the ANOVA calculations as above, but do not increase the degrees of freedom. That is, the error df should still be based on the original number of data points.
Gene
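The Case III formulas can be checked with a small standard-library Python function. It assumes that A_i and B_j are the row and column totals and that C is the correction term (grand total squared, divided by N), as in the usual textbook layout; P-values are left out, but they could be obtained from the F distribution (for example with `scipy.stats.f.sf(F, df1, df2)` if SciPy is available). The function name is hypothetical.

```python
def two_way_anova_no_replication(table):
    """Two-way ANOVA with one observation per cell (no interaction term).

    table[i][j] is the single value for level i of factor A and level j of factor B.
    """
    a, b = len(table), len(table[0])
    n = a * b
    grand_total = sum(sum(row) for row in table)
    c = grand_total ** 2 / n  # correction term C (assumed definition)
    ss_total = sum(x * x for row in table for x in row) - c
    row_totals = [sum(row) for row in table]  # A_i
    col_totals = [sum(row[j] for row in table) for j in range(b)]  # B_j
    ss_a = sum(t * t for t in row_totals) / b - c
    ss_b = sum(t * t for t in col_totals) / a - c
    ss_error = ss_total - ss_a - ss_b
    df_a, df_b, df_error = a - 1, b - 1, (a - 1) * (b - 1)
    ms_error = ss_error / df_error  # mean sum of squares for error
    return {
        "SS_A": ss_a,
        "SS_B": ss_b,
        "SS_error": ss_error,
        "df": (df_a, df_b, df_error),
        "F_A": (ss_a / df_a) / ms_error,
        "F_B": (ss_b / df_b) / ms_error,
    }
```

For the 3 x 3 table [[1, 2, 3], [2, 4, 3], [3, 5, 7]] this gives SS_A = 14, SS_B of about 8.67 and SS_error of about 3.33 with (2, 2, 4) degrees of freedom, so F_A = 8.4 and F_B = 5.2.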
200. ctor window. This is easier after zooming in on the gene. A shortcut to the Gene Inspector is Ctrl+I (⌘+I for Mac users).
Undo: You can undo your last action by selecting Edit > Undo or pressing Ctrl+Z (⌘+Z for Mac users).

Your First Gene Lists
To make lists from appropriate keywords:
1. Select Annotations > Make Gene Lists from Properties.
2. Choose the property you want to use for generating lists.
3. Click OK.
To make a list based on biological function:
1. Select Annotations > Build Simplified Ontology.
2. Name your new list.
3. Click OK.
To make lists from a group of selected genes:
1. Right-click over a highlighted group of genes.
2. Select Make List from Selected Genes from the pop-up menu.
Your new lists appear in the Gene Lists folder.

Tips for Macintosh Users
Except where otherwise noted, instructions in this manual describe GeneSpring usage on a PC. If you are a Macintosh user, you may find the following keystroke and mouse conversion information helpful.
Right-click: Hold Control and click. This most often activates a pop-up menu.
Ctrl / ⌘: Substitute the ⌘ key wherever the manual mentions Ctrl. For example, if the manual says press Ctrl+I to reach the Gene Inspector, press ⌘ (Apple)+I instead.
Drawing genes on a pathway: Hold down the Option key and drag your cursor diagonally to draw a gene on a pathway. See Pathways
201. d. Accepted values are true and false. This attribute applies only if the location attribute is set to file. This attribute is optional; if not specified, its value defaults to false.
The mimeType attribute specifies the MIME type of the file or files being retrieved. Any valid MIME type is an acceptable value. This attribute is optional.
Contents: <DatabaseQuery>, <JavaQuery>
Attributes: type, location, deleteAfterwards, mimeType
Usage: <GetFile type="Sample Image" location="database" deleteAfterwards="true" mimeType="image/gif"> </GetFile>
Notes: Optional

<GetRawData>
This option specifies how to retrieve the actual sample data. The data may come from either a database or a raw file or files. The raw file itself may have been downloaded from a database or extracted from a Java class. If the data is located in a database, use <DatabaseQuery> to retrieve it. If it is in a file or directory of files, it is interpreted as a tab-delimited file, and you must specify the file format using the Format tag.
Contents: <DatabaseQuery>, Format
Attributes: n/a
Usage: <GetRawData> </GetRawData>
Notes: Required

<DatabaseQuery>
This element allows you to enter a SQL query that produces a list of sample identifiers, attributes, or other data based on the provided genome name. If useGenomeName is
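The attribute rules for <GetFile> (deleteAfterwards accepts only true or false, defaults to false, and applies only when location is file; mimeType is optional) can be expressed as a small validation helper. The Python sketch below uses the standard `xml.etree.ElementTree` module; `read_get_file` is hypothetical, treats the attribute values as case-sensitive (an assumption), and says nothing about how GeneSpring itself parses these files.

```python
import xml.etree.ElementTree as ET


def read_get_file(xml_text):
    """Read a <GetFile> element and apply the documented attribute rules."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "GetFile":
        raise ValueError("expected a <GetFile> element")
    raw = elem.get("deleteAfterwards", "false")  # optional, defaults to false
    if raw not in ("true", "false"):
        raise ValueError("deleteAfterwards must be true or false")
    location = elem.get("location")
    return {
        "type": elem.get("type"),
        "location": location,
        # only meaningful when the data is retrieved from a file
        "deleteAfterwards": raw == "true" and location == "file",
        "mimeType": elem.get("mimeType"),  # optional; None when absent
    }
```

Applied to the Usage example above, the helper reports deleteAfterwards as False, because that example sets location to database.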
202. d other Similarity Measures on page B-1.
Merge similar branches: Merge branches with similar results.
For information on the Separation Ratio and Minimum Distance settings, see Advanced Tree Options on page 7-8.

Saving Condition Tree Results
When the operation is complete, the Name New Condition Tree window appears.
Figure 7-4 The Name New Condition Tree window (example: a 16-condition tree for Yeast cell cycle time series no 90 min, Default Interpretation, built with the Standard Correlation similarity measure)
To save the new condition tree:
1. Enter a name in the Name field at the top of the screen. Names may not exceed 80 characters.
2. From the navigator, select a folder in which to save the new condition tree. To create a new folder, navigate to the desired parent folder and enter a new folder name in the Folder field.
3. Enter any additional information in the Notes field, if desired.
4. Click Save.

Advanced Tree Options
The separation rat
203. d percentile is taken.

Per Gene Normalizations
Divide by Specific Samples
In this normalization, each gene is divided by the intensity of that gene in a specific control sample, or by the average intensity in several control samples. The formula for this is:
$$\frac{\text{signal strength of gene A in sample X}}{\text{signal strength of gene A in the control sample(s)}}$$
or
$$\frac{\text{signal strength of gene A in sample X}}{\text{average signal strength of gene A in several control samples}}$$
Figure 5-3 The Per Gene: Normalize to specific samples window ("This will divide each gene in the selected sample(s) by the Median of that gene's measurements in the control sample(s)"; lists for Samples to Normalize and Control Samples; Restrict Measurements used in the Calculation)
Specify whether to divide by the mean or median by selecting from the p
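The divide-by-specific-samples formula can be sketched per gene in Python. The sketch assumes one gene's values are held in a dictionary keyed by sample name and lets you choose the mean or the median of the control samples, matching the choice offered in the window; the function name is hypothetical.

```python
from statistics import mean, median


def normalize_to_control_samples(gene_values, control_samples, statistic="median"):
    """Per-gene normalization to specific samples.

    gene_values: dict sample name -> signal for a single gene.
    control_samples: names of the control sample(s).
    statistic: "median" or "mean" of the gene's values in the control samples.
    """
    controls = [gene_values[s] for s in control_samples]
    summary = {"median": median, "mean": mean}[statistic]
    denominator = summary(controls)
    if denominator == 0:
        raise ZeroDivisionError("the control value for this gene is zero")
    return {sample: value / denominator for sample, value in gene_values.items()}
```

For a gene measured at 2, 4 and 8 in samples t0, t1 and t2, normalizing to t0 alone gives 1, 2 and 4; using all three samples as controls with the median divides by 4 instead.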
204. The Class Predictor
The Class Predictor is designed to predict the value or class of an individual parameter in an uncharacterized sample or set of samples. It does this in two steps. First, the Class Predictor algorithm examines all genes in the training set individually and ranks them on their power to discriminate each class from all the others. Next, it uses the most predictive genes to classify the test set (the set where the parameter value of interest is unknown).
For example, you could attempt to diagnose the leukemia type of a leukemia patient with the Class Predictor by using expression data from patients whose leukemia type was known.
You can also use the Class Predictor simply to find genes whose behavior is related to a given parameter by examining th
205. d by the next-lowest P-value.

To Use the Class Predictor
1. Select Tools > Predict Parameter Values. The Predict Parameter Values window appears.
2. Open the Experiments folder in the navigator and click your training set (the set of samples for which the parameters are already known). Click Training Set.
3. Click your test set (the set where the parameter value of interest is unknown) and click Test Set.
4. Open the Gene Lists folder in the navigator and click a gene list to be used in the selection process. Click Select Genes From.
5. Select a parameter in the Parameter to predict box.
6. Specify a Number of predictor genes to be used in the prediction.
7. Specify a Number of neighbors. Generally, this number should be no more than half the size of a single class and no less than 10.
8. Specify a Decision cutoff for P-value ratio. The P-value cutoff is a threshold such that if there is not sufficient evidence in favor of a particular class, no prediction is made. The P-value cutoff is a ratio of the probabilities that the prediction was made by chance for the two classes. If you have more than two classes, the ratio is the lowest P-value divided by the next-lowest P-value.
9. Select Predict Test Set to make a prediction, or Crossvalidate Training Set to evaluate how well the prediction rule can be used to predict the parameter values of the training set.
10. Spe
206. Figure 1-4 The GeneSpring Navigator

The Gene Lists Folder
During analysis you will create and work with interesting collections of genes known as gene lists. These gene lists are stored in the Gene Lists folder. By default, GeneSpring makes and displays an all genes list containing all genes in the genome.

The Experiments Folder
The Experiments folder contains experiment information. Experiments are divided into interpretations. Experiment interpretations tell GeneSpring how to treat and display your experiment variables, called experiment parameters.
Conditions are groupings of one or more samples. Each sample may be a condition, as in the All Samples interpretation, or a condition may include multiple samples. For example, because the experiment above is organized according to the parameter values Embryonic, Postnatal, and Adult, these can be called the conditions of the experiment. Within these conditions, the parameter day is being treated as a replicate and has been averaged for each condition (Embryonic, Postnatal, and Adult) across all samples. Hence a condition can include data from more
207. der in the GeneSpring data directory. The genome reappears the next time you start GeneSpring. To permanently remove a genome, delete its folder from the GeneSpring directory.

3 Working With Experiments

Importing Experiment Data
GeneSpring can load data from nearly any expression analysis technology, provided the data are formatted as tab-delimited text. The following section describes methods of loading data whose format is automatically recognized by GeneSpring, as well as loading data from a custom source.
GeneSpring automatically recognizes the formats of the following products:
• Clontech AtlasImage 2.0
• Affymetrix Metrixs
• Affymetrix Pivot
• Affymetrix MAS 5.0
• Axon GenePix Pro 2
• Axon GenePix Pro 3
• BioDiscovery Imagene 4
• Incyte Internet
• Incyte GEM Tools 2.4
• Packard Biochip ScanArray QuantArray
• Agilent Feature Extraction
• Amersham CodeLink
• dChip
If GeneSpring is unfamiliar with your file format, you can define a custom format to specify the type of data in each column. These specifications can be added to the list of known file types so that you can load subsequent experiments in batches.
Make sure you use the raw tab-delimited files just as they come out of the scanner, because GeneSpring uses the information in the column headers. If you have cut out header information, use your original tab-delimited data file
208. direction. To pan, do one of the following:
• Use the arrow keys to move in the desired direction.
• Use the Page Up or Page Down keys to travel one screen's distance up or down.

Modifying Display Options
Each of the displays in the View menu provides multiple options for modifying the way your data are represented. To see what display options are available in the current view, select View > Display Options. For more information on display options, see Display Options on page 4-25.

Displaying a Gene List
To display a gene list, select the list with an ordinary mouse click. Alternatively:
1. Right-click the gene list to view in the Gene Lists folder in the navigator. A submenu appears.
2. Select Display List.

Displaying a Gene List as a Secondary List
1. Display a gene list as outlined above, then right-click the gene list to view as your secondary gene list. A submenu appears.
2. Select Display As a Second List.
To remove the secondary gene list, go to the View menu and select Remove Secondary Gene List.

Show All Genes
At any time, in any display mode, you can click the Show All Genes button to revert to a display of all genomic elements.

Finding and Selecting Genes
The Find Gene and Advanced Find Gene functions allow you to quickly find one or more genes. This is especially useful
209. due to differences in the total amount of the dyes added to the sample and reference samples from one chip to another. However, the intensity-dependent normalization option, which is actually a per-gene normalization, succeeds in centering all of the values on each array around 1. In addition, it provides protection from dye incorporation artifacts that lead to unwanted relationships between signal intensity and normalized expression values (see Intensity Dependent Normalization on page 5-8).
It is generally recommended to apply a per-spot normalization using the Divide by control channel option, followed by the Per Chip Normalization step. Both of these normalization options can be accessed by selecting Experiments > Experiment Normalizations.

Region Normalization
This normalization option allows you to normalize sections of a sample rather than normalizing over the entire sample. This is especially important if you used multiple arrays for each experimental point, or if there is some reason you must normalize sections of an array separately from one another.
Region normalization is not a separate mathematical formula the way the previous normalizations discussed in this chapter are. Using this normalization means that if you normalize to negative controls, to positive controls, or normalize each sample to itself, you do not actually normalize over
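The recommended two-color sequence, a per-spot Divide by control channel step followed by per-chip normalization, is sketched below for a single array. The per-chip step is assumed here to divide by the chip's median (50th percentile) ratio; GeneSpring's per-chip options may differ, and the function name is hypothetical.

```python
from statistics import median


def per_spot_then_per_chip(signal, control):
    """Two-color normalization sketch for one array.

    Step 1 (per spot): divide each spot's signal by its control channel value.
    Step 2 (per chip): divide every ratio by the chip's median ratio
    (assumed 50th-percentile normalization), which centers the chip on 1.
    """
    ratios = [s / c for s, c in zip(signal, control)]
    chip_median = median(ratios)
    return [r / chip_median for r in ratios]
```

For signals [2, 4, 6, 8] against a constant control of 1, the ratios are [2, 4, 6, 8] and the chip median is 5, so the normalized values are [0.4, 0.8, 1.2, 1.6], centered on 1.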
210. e Outputs area of the browser.
There are three types of information produced by scripts that cannot be saved in GeneSpring, because there is no way to manipulate this information within GeneSpring:
Boolean: A simple results box appears for this output, typically either true or false. See Boolean on page 8-20 for details.
Numbers: A simple results box appears containing a list of numbers for this output. See Numbers on page 8-26 for details.
Sequence Information: This information is displayed in a table you can copy and paste from, similar to the Potential Regulatory Sequences table in GeneSpring.

Knobs
Knobs allow the user running the script to enter a value when the script is run. For example, you could filter with a knob that sets the minimum normalized expression level.
Information for inputs, outputs, and knobs is entered in GeneSpring at the time you run the script. Constants required for building blocks are entered on the right side of the ScriptEditor screen, in the ScriptEditor Block area.
To create a knob:
1. Select an appropriate building block from the navigator.
2. Right-click in the Knobs area of the ScriptEditor Block and select Make new knob from the pop-up menu.
3. Select the appropriate variable type. The available types are Integer, Positive Integer, Number, Positive Number, Yes or No, Measurement Type, Percentage, Correlation, Name Comparison, Chromosom
211. e with available data are shown in the class table. Any of these lists of genes can be examined by selecting a table cell with a Gene count and clicking Make Gene List of Selected Cell.

Display Options
Linked Windows
Allows you to select one gene or gene list in two windows simultaneously. Simply select a gene or gene list in one window, and the same gene or gene list is automatically selected in the other window. To create a linked window, go to the File menu and select New Linked Window.

Split Windows
Another interesting way to view classifications is with the Split Windows function. The Split Windows feature allows you to see multiple sets simultaneously in the main GeneSpring screen.
212. Figure 2-1 The Master Gene Table as tab-delimited text
2. Once the table is formatted, open GeneSpring and select File > New Genome Installation Wizard.
Figure 2-2 The New Genome Installation Wizard (panels: Welcome, Genome Data Directory, Overall Genome Properties, GenBank Data Files, Master Gene Table, Genome Sequence File, Additional Genetic Elements, Links to Web DataBases, Miscellaneous Settings, Finished)
3. Answer the series of questions, such as the genome name (e.g., Human custom), whether the genome is circular or a series of linearized chromosomes, etc. When you reach the Master Gene Table screen, click Browse and select the tab-delimited text file you created in step 1. Continue the genome installation process. Once you are finished, the new genome can be selected using File > New Genome or Array.
Click any gene in the Genome Browser to invoke a Gene Inspector window, from which resident gene annotation can be viewed. If the annotations you imported include GenBank acc
213. e Inspector window is the name of the gene and an area for notes. The table in the upper right corner displays the normalized, control, and raw values, as well as the t-test p-value and flag for each measurement. In the center of the window is a browser showing a graph of the gene across all conditions. At the bottom of the window, from left to right, are correlation functions, lists containing your gene, and links to databases.

Accessing the Gene Inspector Window
There are three ways to access this window:
• Double-click a gene (this may be easier when you zoom in).
• Select Edit > Find Gene and enter the name of your gene.
• Press Ctrl+I when one or more genes are selected.
Figure: The Gene Inspector window for YDR048C (experiment Yeast cell cycle time series no 90 min)
214. e Group Lists
  Knobs: None
  Description: Determine the number of objects in the specified array and output the result.
Count Sequences in Group
  Input: Array of sequences
  Knobs: None
  Description: Determine the number of objects in the specified array and output the result.

Clustering
Open Scripts > Basic Scripts > QC & Analysis > Clustering
Build Condition Tree
  Input: At least one gene list and one experiment interpretation
  Knobs: Correlation type, Separation ratio, Minimum distance
  Description: Outputs a condition tree.
Build Gene Tree
  Input: One gene list, one experiment interpretation
  Knobs: Similarity measure, Merge similar branches, Discard bad, Separation ratio, Minimum distance, Do automatic annotation, Use standard
  Description: Outputs a gene tree. NOTE: See Using the Remote Server on page 8-5 for important details.
Explained Variation
  Input: At least one classification, one experiment interpretation, and one gene list
  Knobs: None
  Description: Computes the proportion of variation in an experiment interpretation explained by a classification and a gene list. The output is a number between zero and one, inclusive (for example, 0.14567 means 14.567% of the variability is explained).
Find Predictor Genes
  Input: One experiment interpretation and one gene list
  Knobs: Parameter name, number of genes
  Description: Outputs a list of genes that are good at predicting a given parameter in an experiment.
215. e Number, Gene Annotations.
4. If desired, enter a default value for the knob in the Default Value field in the Notes area.

A Sample Script
Scripts are very simple when stripped of their fancy verbiage and icons. For example, you might want to examine your data as follows:
Figure 1-1 A simple flow chart of a script (Starting data: Experiment → Split experiment into conditions → Filter on confidence, once per condition → Take the genes in either list → Analysis results: Gene List)
In GeneSpring's ScriptEditor, it would look like this:
Figure: The sample script in the ScriptEditor window
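The flow chart of the sample script translates almost line for line into code. The Python sketch below is only an analogy for what the script's building blocks do: it assumes that Filter on Confidence keeps genes whose P-value is at or below a cutoff, and that the experiment is supplied as per-condition dictionaries of P-values. All names and the example data are hypothetical.

```python
def sample_script(experiment, p_cutoff=0.05):
    """Split by condition, filter each condition on confidence, take the union.

    experiment: dict condition -> dict gene -> confidence P-value (assumption).
    A gene passes the filter in a condition when its P-value <= p_cutoff.
    """
    gene_lists = []
    for condition, p_values in experiment.items():  # split into conditions
        passing = {gene for gene, p in p_values.items() if p <= p_cutoff}
        gene_lists.append(passing)  # one filtered gene list per condition
    return set().union(*gene_lists)  # genes in either list
```

With two conditions, `{"cond1": {"g1": 0.01, "g2": 0.20}, "cond2": {"g2": 0.03, "g3": 0.50}}`, the result is `{"g1", "g2"}`: g1 passes in the first condition, g2 in the second, and g3 in neither.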
216. e ORF by using negative numbers for the bases.
5. Enter the length of the oligonucleotides to search for.
6. Enter the number of single-point discrepancies allowed. This is the maximum number of mismatches allowed. For example, if you specify one single-point discrepancy, ACGCGAT satisfies a search for ACGCGTT.
7. Enter the range of base gaps in the exact middle. This refers to the size of an allowable hole in the middle of the sequence, allowing you to look for sequences such as ACGnnnCGT, which is biologically relevant due to loops and non-binding areas. The gap must be in the exact middle, with the longer side of odd-length sequences appearing before the Ns. The gap does not count towards the specified sequence length; hence ACGnnnCGT would be returned as an oligonucleotide of length 6.
8. Select whether the sequence is relative to the sequence upstream of other genes or relative to the whole genomic sequence. The first option is far more common.
The Probability Cutoff text box indicates the level of significance (P-value) needed for an oligomer to be listed in the results. You can change this value.
9. Specify whether to perform the operation locally or on a remote GeNet server.
10. Click Start. The button becomes a Stop button. The progress bar lengthens as your search progresses. For very large genomes or complex search parameters, this operation may take a few minutes.
To enter a specif
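The single-point discrepancy and middle-gap rules in steps 6 and 7 can be made concrete with a small matcher. The Python sketch below checks one candidate site against an oligo; it is a brute-force illustration of the rules as described, not GeneSpring's search algorithm, and the function names are hypothetical.

```python
def with_middle_gap(oligo, gap):
    """Insert `gap` N's in the exact middle; for odd lengths the longer side comes first."""
    left = (len(oligo) + 1) // 2
    return oligo[:left] + "N" * gap + oligo[left:]


def matches(site, oligo, max_mismatches=0, min_gap=0, max_gap=0):
    """True if `site` matches `oligo` with at most `max_mismatches` single-point
    discrepancies and a middle gap of min_gap..max_gap bases.

    Gap bases act as wildcards and do not count towards the oligo length.
    """
    for gap in range(min_gap, max_gap + 1):
        pattern = with_middle_gap(oligo, gap)
        if len(site) != len(pattern):
            continue
        mismatches = sum(1 for s, p in zip(site, pattern) if p != "N" and s != p)
        if mismatches <= max_mismatches:
            return True
    return False
```

`matches("ACGCGAT", "ACGCGTT", max_mismatches=1)` is True, reproducing the example in step 6, and `with_middle_gap("ACGCGT", 3)` builds ACGNNNCGT, a length-6 oligo with a three-base gap.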
217. a selected gene from the New Gene List table.
Remove All: Remove all genes from the New Gene List table.
Inspect: View the selected gene in the Gene Inspector. For more information on the Gene Inspector, see The Gene Inspector on page 4-10.
Show Annotations: Show or hide annotations from a gene list.

Filtering Methods

Filter on Gene List
To filter on a gene list, use the GeneSpring navigator to locate the desired gene list and select it. All genes in that list appear in the Filter Results table.

[Screenshot: the Filter on Gene List tab, with the Gene Lists folder open in the navigator.]
Figure 6-3: The Filter on Gene List tab

Type a List
This method allows you to enter a list of genes manually. To enter genes, click in the gene list box, type a gene's Common Name, Systematic Name, Synonym, or GenBank Accession Number, and press Enter. You can also copy and paste one or more genes into this list. If the gene is found, it appears in the Filter Results table. If the gene is not found, it is colored red. To clear your entries, click Reset. This removes all entered genes from both the list of genes
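Type a List accepts any of four identifier types and reports genes it cannot find. The sketch below shows one way such a lookup could work against a master gene table. It assumes that matching is case-insensitive. The `Gene` record, the function names and the table contents are hypothetical and do not describe GeneSpring's internal data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gene:
    systematic: str
    common: str = ""
    synonyms: Tuple[str, ...] = ()
    genbank: str = ""


def lookup(identifier: str, table: List[Gene]) -> Optional[Gene]:
    """Match a typed entry against systematic name, common name, accession, or synonyms."""
    key = identifier.strip().upper()
    for gene in table:
        names = [gene.systematic, gene.common, gene.genbank, *gene.synonyms]
        if any(name and name.upper() == key for name in names):
            return gene
    return None


def type_a_list(entries: List[str], table: List[Gene]) -> Tuple[List[str], List[str]]:
    """Return (systematic names found, entries not found); the GUI colors the latter red."""
    found, not_found = [], []
    for entry in entries:
        gene = lookup(entry, table)
        if gene is None:
            not_found.append(entry)
        else:
            found.append(gene.systematic)
    return found, not_found


table = [Gene("YMR199W", "CLN1")]
print(type_a_list(["cln1", "XYZ9"], table))  # (['YMR199W'], ['XYZ9'])
```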
218. at http://www.silicongenetics.com/cgi/SiG.cgi/support/stdatt.smf. These include sets of attributes tailored for various applications, including MIAME standards.

Changing Experiment Attributes
You can add, edit, or delete sample attributes through the Edit Attributes window.

[Screenshot: the Edit Attributes window, listing each sample file with its attribute values.]
Figure 3-21: The Experiment Attributes window

Import Attribute
You can import an attribute from another experiment or from the list of sample attributes available for any of the samples in the current experiment. You can also convert a parameter into a sample attribute.

To import an attribute:
1. Click Import Attribute. The Import Attributes window appears.
2. Select the attribute or attributes to import.
To select a sample attribute to import as an attribute, click its name in the list at the top of the screen. To select all attributes in this list, click Select All. To clear your selections, click Clear All.
To import attributes from another experiment, find the desired experiment in the navigator and select it. The attributes associated with that experiment appear in the Attributes from Selected Experiment list. Select the desired attributes from the list.
3.
219. To modify the function as well as the appearance of the axes:
1. Select View > Display Options, or right-click anywhere in the genome browser and select Display Options. The 3D Scatter Plot Display Options window appears.
2. Click the X Axis, Y Axis, or Z Axis tab.
3. In the Display Options navigator, select the gene list, experiment, interpretation, or condition to use on the selected axis.
4. Click the X/Y/Z Axis Value pull-down menu. The list of options includes only those that are appropriate for the type of data object you selected.
5. Choose a graph mode for the specified axis. The three options are linear, logarithmic, and fold change. Note that the fold change option is available only if you are looking at normalized data from an interpretation or a condition.
6. To adjust the axes so that all measurements are visible, check the Scale Axis to Show all Values box. The upper and lower bounds are adjusted automatically. Alternatively, you can set the upper and lower bounds manually to values of your choosing.
7. To choose tick spacings automatically, check the Automatic Tick Spacing on Axis box. To set the tick spacings manually, leave this box unchecked and enter the major tick interval as well as the number of minor ticks. For more information about setting tick spacings, see The Vertical Axis on page 4-27.

Adding Lines
You have the option to draw lines that help distinguish distinct groups of data points
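Manual tick spacing takes two inputs: a major tick interval and a number of minor ticks. The sketch below shows one plausible reading of how those inputs map to tick positions on a linear axis. It assumes that minor ticks are spaced evenly between consecutive major ticks and that logarithmic and fold change modes are out of scope. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def axis_ticks(lower: float, upper: float, major: float,
               minor_count: int) -> Tuple[List[float], List[float]]:
    """Major ticks every `major` units within [lower, upper], plus `minor_count`
    evenly spaced minor ticks after each major tick (linear axis only)."""
    majors: List[float] = []
    minors: List[float] = []
    step = major / (minor_count + 1)
    tick = math.ceil(lower / major) * major  # first major tick at or above the lower bound
    eps = 1e-9 * major                       # tolerance for floating-point drift
    while tick <= upper + eps:
        majors.append(round(tick, 10))
        for i in range(1, minor_count + 1):
            minor = tick + i * step
            if minor <= upper + eps:
                minors.append(round(minor, 10))
        tick += major
    return majors, minors


print(axis_ticks(0, 2, 1.0, 1))  # ([0.0, 1.0, 2.0], [0.5, 1.5])
```

Minor ticks that fall between the lower bound and the first major tick are not generated here. A real axis renderer might also draw them.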
220. Computation 1-23
Miscellaneous 1-23
2 Creating Genomes
The New Genome Installation Wizard 2-2
Creating a Genome from Experiment Data 2-9
Data Format 2-10
Layout Parameters 2-12
The layout file 2-12
Examples of layout files for Arrays 2-13
Renaming and Deleting Genomes 2-14
Renaming a Genome 2-14
Deleting a Genome 2-14
3 Working With Experiments
Importing Experiment Data 3-2
Memory Use for Experiment Loading 3-2
Loading an Experiment 3-3
Using the Column Editor 3-9
Default Column Assignments of Known Products 3-12
Creating New Experiments 3-16
Copying and Pasting Experiments 3-17
Preparing to Paste 3-17
Common Mistakes in Pasting 3-19
Pasting an Experiment into GeneSpring 3-20
Copying an Experiment or a List Out of GeneSpring 3-20
Default Normalizations 3-21
The Sample Manager 3 2
221. For more information about this window, see Creating and Editing Gene Lists on page 6-2.

Creating Expression Profiles
The Creating Expression Profiles function allows you to draw a pseudo-gene that represents a hypothetical expression pattern. This function is useful if you have some idea of the gene expression pattern you are looking for: you can simply draw the pattern and look for genes that behave similarly. You must be in Graph view to create an expression profile. Double-click the expression profile to open the Gene Inspector for that gene.

To create an expression profile:
1. Select Tools > Draw Expression Profile. A new gene appears on the screen at the normalized median of your data (usually 1.0).
2. To change the shape of this gene, click the gene and drag while holding down the Control key. On Macintosh systems, Option-click.

To save an expression profile:
1. Double-click the expression profile to open the Gene Inspector.
2. Click Save Expression Profile.
3. Name the new profile and click Save. Your new expression profile appears in the Expression Profiles folder in the navigator.

To make lists from expression profiles:
1. Double-click the expression profile to open the Gene Inspector.
2. Click Find Similar. A New Gene List window appears with a list of similar genes and lists.
3. Name the list and click Save. Your new list appears in the genome browser and in your Gene Lists folder.
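Find Similar compares the drawn pseudo-gene with every real gene. The sketch below illustrates that idea, assuming the profile is one value per condition and similarity is measured with Pearson correlation against a minimum cutoff. This section does not state which similarity measure GeneSpring uses, so the measure, the 0.95 cutoff and the function names are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat it as dissimilar
    return sxy / math.sqrt(sxx * syy)


def find_similar(profile: Sequence[float], genes: Dict[str, Sequence[float]],
                 min_corr: float = 0.95) -> List[Tuple[str, float]]:
    """Genes whose correlation with the drawn profile meets the cutoff, best first."""
    scored = [(name, pearson(profile, values)) for name, values in genes.items()]
    hits = [item for item in scored if item[1] >= min_corr]
    return sorted(hits, key=lambda item: item[1], reverse=True)


drawn = [1.0, 2.0, 4.0, 2.0, 1.0]  # hypothetical pseudo-gene, one value per condition
genes = {
    "up_down": [0.5, 1.0, 2.0, 1.0, 0.5],
    "flat": [1.0, 1.0, 1.0, 1.0, 1.0],
    "inverse": [2.0, 1.0, 0.5, 1.0, 2.0],
}
print([name for name, _ in find_similar(drawn, genes)])  # ['up_down']
```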
222. The list of predictor genes is assembled using Fisher's exact test. In this method, all the measurements for a given gene are ordered according to their normalized expression levels. For each class (parameter value), the predictor places a mark in the list where the relative abundance of the class on one side of the mark is highest compared with the other side. The genes that these marks segregate most accurately are considered the most predictive. A list of the most predictive genes is made for each class, and an equal number of genes (those with the lowest P-values from Fisher's exact test) is taken from each list.

To make a prediction, the class predictor uses the k-nearest-neighbor method. It selects the k samples nearest to the unclassified sample, as measured by Euclidean distance. For each class, it then computes a P-value: the likelihood of finding the observed number of members of this class within the neighborhood by chance, given the proportion of the classes in the training set. The class with the lowest P-value is assigned to the unclassified sample.

You can specify a P-value cutoff, or threshold, so that no prediction is made if there is not sufficient evidence in favor of a particular class. The P-value cutoff is a ratio of the probabilities that the prediction was made by chance for the two classes. If you have more than two classes, the ratio is the lowest P-value divide
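The prediction step above has two parts: find the k nearest training samples by Euclidean distance, then score each class by how surprising its count in that neighborhood is. The sketch below assumes that the chance probability is the upper tail of a hypergeometric distribution, that is, sampling k neighbors without replacement from the training set. It leaves out the Fisher's-exact-test gene selection and the P-value ratio cutoff, whose multi-class definition is cut off in this excerpt. Function names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def hypergeom_sf(observed: int, population: int, successes: int, draws: int) -> float:
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws)."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    return sum(math.comb(successes, i) * math.comb(population - successes, draws - i)
               for i in range(observed, upper + 1)) / total


def knn_predict(train: List[Tuple[Sequence[float], str]], query: Sequence[float],
                k: int) -> Tuple[str, Dict[str, float]]:
    # The k training samples closest to the query (Euclidean distance).
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    in_neighborhood = Counter(label for _, label in nearest)
    class_sizes = Counter(label for _, label in train)
    # Chance of seeing at least this many class members among k neighbors.
    pvalues = {label: hypergeom_sf(in_neighborhood.get(label, 0), len(train), size, k)
               for label, size in class_sizes.items()}
    best = min(pvalues, key=pvalues.get)  # lowest P-value wins
    return best, pvalues


train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((10, 10), "B"), ((10, 11), "B"), ((11, 10), "B")]
label, pvalues = knn_predict(train, (0.5, 0.5), k=3)
print(label, pvalues["A"])  # A 0.05
```

In the example, all three neighbors belong to class A. Drawing 3 of 3 A samples by chance from a training set of 6 samples (3 of them A) has probability 1/20.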
223. The list you created appears in your Gene Lists folder.

Making Lists with the Venn Diagram
A Venn Diagram allows you to quickly visualize genes common to more than one gene list. You can also find genes present only in a particular list. The gray area behind the circles represents the Venn Diagram universe (the selected gene list). Genes in the selected list that are common to the gene lists represented by the Venn Diagram circles appear as numbers in those circles. For information about creating and filling Venn Diagrams, see Color by Venn Diagram on page 4-32.

To make a list with a Venn Diagram:
1. Right-click the area of the Venn Diagram in which you want to make a list.
2. Select an option from the pop-up menu. A New Gene List window appears.

If you click in an area where two circles overlap, you have the following options:
Make list of these genes: lists genes in the immediate geometric area.
Make list of genes in both lists: lists genes common to the two circles (the intersection).
Make list of genes in either list: lists all genes in the two circles (the union).

If you click in an area where three circles overlap, you have the following options:
Make list of genes in all lists: lists genes common to the three circles (the intersection).
Make list of genes in any list: lists all genes in the three circles (the union).

If you click a non-overlapping gray area, you can make a list of genes in
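The Venn Diagram menu options correspond to ordinary set operations on the gene lists, restricted to the universe (the selected gene list). The sketch below spells them out for three circles A, B and C. It reads "the immediate geometric area" where only A and B overlap as the genes in A and B but not C. That reading is an interpretation, and the function name and data are hypothetical.

```python
from typing import Dict, Set


def venn_lists(universe: Set[str], a: Set[str], b: Set[str], c: Set[str]) -> Dict[str, Set[str]]:
    # Only genes in the selected list (the universe) take part.
    a, b, c = a & universe, b & universe, c & universe
    return {
        "these genes (A and B region only)": (a & b) - c,
        "both lists (A, B intersection)": a & b,
        "either list (A, B union)": a | b,
        "all lists (intersection)": a & b & c,
        "any list (union)": a | b | c,
    }


universe = {"g1", "g2", "g3", "g4", "g5", "g6"}
lists = venn_lists(universe, {"g1", "g2", "g3"}, {"g2", "g3", "g4"}, {"g3", "g5"})
print(sorted(lists["both lists (A, B intersection)"]))  # ['g2', 'g3']
```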
224. one or more columns from a selected file. You can perform only one search at a time. The selected file must have at least two columns: a column for gene identifiers and a column of some other type of data.

The Match Gene Identifier To pull-down menu lets you specify which type of term you are using to identify each gene. For instance, if you choose Systematic Name, Common Name, or Synonym, GeneSpring looks for the specified identifier in any of those three columns in any of your master gene table files. The filter then returns a list of the genes that have matching identifiers in the selected field and that pass the filter in the Search Criteria fields.

The same search criteria are applied to every selected column. Therefore, all selected columns must contain the same type of information. To search columns containing different information, you must apply multiple restrictions to your data, one for each type of information.

[Screenshot: the Filter on an Arbitrary File window, with a Search checkbox row for selecting which column(s) to filter on.]
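Filter on an Arbitrary File does two things: it matches the file's identifier column against known gene names, then applies one search criterion to every selected column. The sketch below illustrates that flow for a tab-delimited file and a numeric "between low and high" criterion. It assumes that a gene must satisfy the criterion in every selected column. This excerpt does not say whether "every" or "any" is required, so the rule is an assumption, as are the function name and the sample data.

```python
import csv
import io
from typing import Dict, List, Set


def filter_arbitrary_file(text: str, id_column: str, search_columns: List[str],
                          low: float, high: float, known_ids: Dict[str, str]) -> Set[str]:
    """known_ids maps any accepted identifier (upper-cased systematic name,
    common name, or synonym) to the gene's systematic name."""
    passed: Set[str] = set()
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        gene = known_ids.get(row[id_column].strip().upper())
        if gene is None:
            continue  # identifier not found in the master gene table
        try:
            values = [float(row[col]) for col in search_columns]
        except ValueError:
            continue  # non-numeric cell: cannot apply a numeric criterion
        # The same criterion is applied to every selected column.
        if all(low <= v <= high for v in values):
            passed.add(gene)
    return passed


text = "ID\tdata1\tdata2\nCLN1\t5\t7\nYAL001C\t50\t3\nunknown\t5\t5\n"
known = {"CLN1": "YMR199W", "YMR199W": "YMR199W", "YAL001C": "YAL001C"}
print(filter_arbitrary_file(text, "ID", ["data1", "data2"], 1, 10, known))  # {'YMR199W'}
```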
225. Note that only axes that represent experiments, interpretations, or conditions are available for coloring. To color genes by an experiment that is not represented by either axis, click Other Experiment, select an experiment in the navigator, and click Set Experiment.

Tree View
The Tree view allows you to view the results of hierarchical clustering in the form of a mock phylogenetic tree, or dendrogram. In such a tree, genes having similar expression patterns are clustered together.

[Screenshot: the GeneSpring main window in Tree view, displaying a gene tree for the Demonstration Experiment.]
Figure 4-28: Tree View

The genome browser above is displaying a gene tree. The genes are the columns of colored rectangles to the right of the tree structure, which is displayed in green. Similarly colored genes tend
226. the appearance of data points and data labels. To modify these features:
1. Click Display Options, or right-click anywhere in the condition scatter plot window and select Display Options.
2. Click the Features tab.
3. To modify the size and shape of the points, choose from the options in the Style and Size pull-down menus.
4. There are five options for labeling the plot:
Show Condition Names: Displays the name of each gene to the lower right of each point. These names become unreadable if more than 100 genes are visible at the current gene list and magnification.
Show X Axis Label: Displays the parameter that is graphed on the X axis.
Show Y Axis Label: Displays the parameter that is graphed on the Y axis.
Show Z Axis Label: Displays the parameter that is graphed on the Z axis.
Show Unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser.

Coloring
Coloring in the scatter plot view is more complicated than in other views because the color of each gene can be derived from the data on any axis. In other views, the color of the gene is usually linked to the data plotted on the vertical axis. In addition, the scatter plot allows you to color genes based on a fourth experiment or condition that is not plotted on any axis.

To modify the way data points are colored:
1. Select Vie
227. the colors applied to general display options.

[Screenshot: the Colors tab of the Preferences window, with standard colors, group colors, and color scheme presets.]
Figure 1-5: The Colors section of the Preferences window

Upregulated Color: The color used to display genes greater than or equal to the High Expression value selected for the current color bar.
Normal Color: The color used to represent genes having a normalized expression value of one. This is the only setting for which you can specify no color.
Downregulated Color: The color used to display genes less than or equal to the Low Expression value selected for the color bar.
Structure Color: The Structure Color is used for the ConditionLine
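The three expression colors act as anchors. Values at or above High Expression take the Upregulated Color, values at or below Low Expression take the Downregulated Color, and a normalized value of one takes the Normal Color. The sketch below assumes that values in between are blended linearly on a log scale. The red, yellow and blue anchors follow the default colors described under Color by Expression. The blending rule, the 0.5 and 2.0 thresholds and the function names are assumptions, not GeneSpring's documented algorithm.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]
UP: RGB = (255, 0, 0)        # red: default Upregulated Color
NORMAL: RGB = (255, 255, 0)  # yellow: default Normal Color
DOWN: RGB = (0, 0, 255)      # blue: default Downregulated Color


def blend(c1: RGB, c2: RGB, t: float) -> RGB:
    """Linear blend from c1 (t=0) to c2 (t=1)."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def expression_color(value: float, low: float = 0.5, high: float = 2.0) -> RGB:
    """Requires low < 1 < high; the thresholds stand in for the color bar settings."""
    if value >= high:
        return UP
    if value <= low:
        return DOWN
    # Assumed: position between normal (1.0) and the threshold on a log scale.
    if value >= 1.0:
        return blend(NORMAL, UP, math.log(value) / math.log(high))
    return blend(NORMAL, DOWN, math.log(value) / math.log(low))


print(expression_color(1.0), expression_color(3.0), expression_color(0.25))
# (255, 255, 0) (255, 0, 0) (0, 0, 255)
```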
228. each physical SQL database to which you will connect. The name attribute specifies the database name for any <DatabaseQuery> tags that occur within the <Database> element. If you are loading data from flat files, you may not use the <PhysicalDatabase> element.
Contents: <UserName>, <Password>, <URL>, <Prefetch>
Attributes: name
Usage: <PhysicalDatabase name="dbname"></PhysicalDatabase>
Notes: Required to retrieve files from a SQL database.

<UserName>
This element specifies the user name for logging into the sample database.
Contents: plain text
Attributes: n/a
Usage: <UserName>Ndege MacKenzie</UserName>
Notes: Required for <PhysicalDatabase>.

<Password>
This element specifies the password for logging into the sample database.
Contents: plain text
Attributes: n/a
Usage: <Password>dbPassword</Password>
Notes: Required for <PhysicalDatabase>.

<URL>
Specifies the physical address of the database or directory from which you will retrieve sample data.
Contents: plain text
Attributes: n/a
Usage: <URL>jdbc:odbc:database</URL>
Notes: Required for <PhysicalDatabase>.

<Prefetch>
This is an optional, rarely used element that allows you to specify how many rows to retrieve from the database during the prefetch process. This may be useful in certain c
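The element reference above lists which children a <PhysicalDatabase> needs. The sketch below builds such an element with Python's standard `xml.etree.ElementTree` module and checks for the required children. The helper names are hypothetical, and the sketch does not cover how this element nests inside the rest of the configuration file.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# <UserName>, <Password>, and <URL> are marked "Required for <PhysicalDatabase>".
REQUIRED_CHILDREN = ("UserName", "Password", "URL")


def physical_database(name: str, user: str, password: str, url: str,
                      prefetch: Optional[int] = None) -> ET.Element:
    db = ET.Element("PhysicalDatabase", name=name)
    ET.SubElement(db, "UserName").text = user
    ET.SubElement(db, "Password").text = password
    ET.SubElement(db, "URL").text = url
    if prefetch is not None:  # optional, rarely used
        ET.SubElement(db, "Prefetch").text = str(prefetch)
    return db


def missing_required(db: ET.Element) -> List[str]:
    """Names of required child elements that are absent or empty."""
    missing = []
    for tag in REQUIRED_CHILDREN:
        child = db.find(tag)
        if child is None or not (child.text or "").strip():
            missing.append(tag)
    return missing


db = physical_database("dbname", "Ndege MacKenzie", "dbPassword", "jdbc:odbc:database")
print(ET.tostring(db, encoding="unicode"))
```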
229. each sample, but rather perform the normalization over each region. Hence the formulas for these normalization options become:

Normalizing to Negative Controls for a Region:
$$\text{control strength of gene } A \text{ in region } Y \text{ of sample } X = \text{median signal of the negative controls in region } Y \text{ of sample } X$$

Normalizing to Positive Controls for a Region:
$$\text{control strength of gene } A \text{ in region } Y \text{ of sample } X = \text{median signal of the positive controls in region } Y \text{ of sample } X$$

Normalizing Each Region to Itself:
$$\text{normalized signal for gene } A \text{ in region } Y \text{ of sample } X = \frac{\text{signal for gene } A \text{ in region } Y \text{ of sample } X}{\text{median signal of the genes in region } Y \text{ of sample } X}$$

Dealing with Repeated Measurements: Single Data File
Occasionally the raw experimental data in the data file for your sample has more than one line devoted to a particular gene. This may be because you did the sample twice, or because you did the sample once but took the measurements twice. If the same gene name is reported multiple times on different lines in your data file, GeneSpring automatically treats the measurements as repeats and averages the signal strengths together. GeneSpring reports the average and keeps track of the minimum and maximum values for each gene, but it cannot access the particular values falling between the minimum and maximum. The formula for averaging a repeated gene is:
$$\bar{S}_A = \frac{S_{A_1} + S_{A_2} + \cdots + S_{A_N}}{N}$$
where $S_{A_i}$ is the signal strength of the $i$-th measurement of gene A and $N$ is the number of measurements. This process is repeated fo
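The per-region formulas and the repeat-averaging rule can be sketched together. In the sketch below, a region is a mapping from gene to signal. Each gene's signal is divided by the region's median, taken either over named control genes or over all genes in the region. Repeated lines for one gene collapse to (mean, minimum, maximum). Representing a region as a dictionary and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple


def normalize_region(signals: Dict[str, float],
                     controls: Optional[List[str]] = None) -> Dict[str, float]:
    """Divide each signal in one region by the median of the named controls
    (positive or negative) or, if none are given, of all genes in the region."""
    reference = [signals[g] for g in controls] if controls else list(signals.values())
    strength = median(reference)
    return {gene: value / strength for gene, value in signals.items()}


def average_repeats(rows: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float, float]]:
    """Collapse repeated gene lines into (mean, min, max); the individual values are dropped."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for gene, value in rows:
        groups[gene].append(value)
    return {g: (sum(v) / len(v), min(v), max(v)) for g, v in groups.items()}


region = {"a": 2.0, "b": 4.0, "c": 8.0}
print(normalize_region(region))          # {'a': 0.5, 'b': 1.0, 'c': 2.0}
print(normalize_region(region, ["a"]))   # {'a': 1.0, 'b': 2.0, 'c': 4.0}
print(average_repeats([("g", 1.0), ("g", 3.0), ("h", 5.0)]))
```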
230. select it from a directory. If you are not using a GenBank or EMBL file, leave the No radio button selected. Click Next to proceed to the next screen.
6. The Master Gene Table screen appears. This screen appears only if you are not using a GenBank or EMBL file as your data source. On this screen, enter the name of your master gene table: type its complete path and filename in the text box, or click Browse to select it from a directory. You cannot proceed until you have entered the filename of a valid master gene table on this screen.
Note: Your master gene table must be in name list, name function, SGD, or mapped format. For more information on these data formats, see Data Format on page 2-10.
Once you have entered the correct information, click Next.
7. The Genome Sequence File screen appears. This screen appears only if you indicated on the Overall Genome Properties screen that your genome has been sequenced and you are not using a GenBank or EMBL file. On this screen, you specify where GeneSpring should look for the sequence data. Place your cursor in the Enter Genome Sequence File Name box and type the complete file name and path, or click Browse to select the file from a directory. You cannot proceed to the next screen until you have entered a file name. Once you have entered the correct information, click Next.
8. The Additional Genetic Elements screen a
231. the Merge Files window appears; proceed to step 7. If not, proceed to step 8.
6. In the Select Corresponding Files window, you can specify which signal file corresponds to which control file. This step applies only to Imagene files.

[Screenshot: the Select Corresponding Files window ("Select pairs of files which contain corresponding signal and reference values"), with signal files on the left, reference files on the right, and the Add Pair, Remove Pair, Guess The Rest, and Clear Guesses buttons.]
Figure 3-3: The Select Corresponding Values window

To do this:
a. Select a filename in the left column.
b. Select the corresponding filename in the right column.
c. Click Add Pair. The pair you specified appears in the Signal/Reference list.
d. To have GeneSpring automatically select the rest of your files, click Guess The Rest.
e. If GeneSpring guesses incorrectly, click Clear Guesses.
f. To remove a pair, select it in the Signal/Reference list and click Remove Pair.
When you are done, click Next and see step 8.
7. If you need multiple chips to cover one sample, such as the Affymetrix Mu_u74 or Hu chip sets, the Merge Files window allows you to define all the files needed for each sample. This screen does not appear if it is not necessary.

[Screenshot: the Import Data: Merge Files window ("If multiple files correspond to a single sample").]
232. section of the genome browser.

Scatter Plot View
The Scatter Plot view is useful for examining the expression levels of genes in two distinct conditions, samples, or normalization schemes. For instance, you can use the scatter plot to identify genes that are differentially expressed in one sample versus another. A scatter plot can also be used to compare two values associated with genes in two gene lists. Such associated values might include the relative contributions of principal components, as determined from principal components analysis, or two similarity scores from the Find Similar function in the Gene Inspector.
Note: Genes with no data cannot be displayed in this view.

[Screenshot: the GeneSpring main window in Scatter Plot view, plotting two samples of the Demonstration Experiment against each other on logarithmic axes.]
Figure 4-26: The Sca
233. gene if only one is selected, or the number of selected genes if multiple genes are selected.
Secondary Gene List Name: If a secondary gene list is being displayed, this shows the name of the secondary gene list and the number of genes in it.
Condition or Gene List Sorted By: In the Graph by Genes view, this shows the name of the gene list, or the name of the condition, experiment, and interpretation, used to sort the genes on the X axis.

Color

Color by Expression
This option colors genes according to their normalized expression values and trustworthiness. To color your genes by expression, select Colorbar > Color by Expression, or select the Expression option on the Coloring tab of the Display Options dialog.

Expression: The vertical axis of the colorbar represents expression levels on a continuous scale. Using the default colors, red indicates overexpression, yellow indicates average expression, and blue indicates underexpression. Genes are colored by their expression level in the selected condition, as indicated by the condition line. If you have specified the parameter on the horizontal axis to be continuous, expression levels between conditions are interpolated.

Trust: The horizontal axis of the colorbar indicates the degree to which you can trust your data. Dark or unsaturated colors represent low trust, and bright, saturated colors represent high trust. GeneSpring uses the following guidelines to automati
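When the horizontal-axis parameter is continuous (for example, time), the condition line can sit between two measured conditions, and GeneSpring interpolates an expression level there. The text does not say which interpolation is used. The sketch below assumes straight-line interpolation between the two neighboring conditions and clamps to the end values outside the measured range. The function name and data are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate_expression(params: Sequence[float], values: Sequence[float], t: float) -> float:
    """Expression at parameter value t; `params` must be sorted ascending."""
    if t <= params[0]:
        return values[0]   # assumed: clamp before the first condition
    if t >= params[-1]:
        return values[-1]  # assumed: clamp after the last condition
    i = bisect_right(params, t)
    t0, t1 = params[i - 1], params[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


times = [0.0, 10.0, 20.0]  # hypothetical time points in minutes
levels = [1.0, 3.0, 2.0]   # one gene's normalized expression at those points
print(interpolate_expression(times, levels, 5.0), interpolate_expression(times, levels, 15.0))
# 2.0 2.5
```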
234. normalized ratio expression, and translated as necessary to the interpretation mode by use of the delta method. Results of the variance components analysis are used to estimate standard deviations and standard errors according to the grouping of samples into conditions, as specified by the experiment interpretation. Two different types of interpretation affect the assumed context of the calculation:

Single-sample interpretation: If all conditions contain only one sample (for instance, the All Samples interpretation), precision calculations are based solely on the estimated within-sample measurement variation. The error bars, standard deviations, and standard errors represent the variability of all possible measurements on this specific sample.

Multi-sample interpretation: If at least one condition contains multiple samples, precision calculations for all samples are based on the combined within-sample and between-sample variation. Error bars, standard deviations, and standard errors represent the variation of measurements of samples representing the population of all possible samples in the condition.

In a multi-sample interpretation, if no replicate samples are available for a specific condition, no error calculations are made and no error bars are shown, since there is no information available on the variability of that condition.

References
Milliken, G. A., and Johns
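Two rules in this passage are easy to state in code. The first is the delta-method translation of a ratio-scale standard deviation onto the log scale: to first order, sd(ln R) ≈ sd(R) / R. The second is that a condition without replicate samples gets no error estimate. The sketch below shows only these two pieces, with hypothetical function names. It does not reproduce GeneSpring's combined within-sample and between-sample variance components model.

```python
import math
from statistics import stdev
from typing import Optional, Sequence, Tuple


def log_scale_sd(ratio: float, sd_ratio: float) -> float:
    """First-order delta method: sd(ln R) is approximately sd(R) / R."""
    return sd_ratio / ratio


def condition_error(values: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Between-sample SD and standard error for one condition, or None
    when there are no replicate samples (no error bars are shown)."""
    if len(values) < 2:
        return None
    sd = stdev(values)
    return sd, sd / math.sqrt(len(values))


print(log_scale_sd(2.0, 0.2))        # 0.1
print(condition_error([1.0]))        # None
print(condition_error([1.0, 2.0, 3.0]))
```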
235. the desired filter, you can save it for future use by clicking Save Filter or Save As Script. For more information on saving filters, see Saving Filters on page 6-67.

Creating Boolean Filters

[Screenshot: the Restrictions table, with a Control Signal restriction and a Filter on Fold Change restriction joined by AND.]
Figure 6-23: The Restrictions table

Think of each row in the Restrictions table as a single gene list. There are no priorities between statements, so without parentheses to group statements, the order is assumed to be left to right. Use the pull-down parentheses menus to group restrictions together. The AND/OR pull-down menu tells GeneSpring how to combine the grouped restrictions. The NOT pull-down menu tells GeneSpring not to use the genes from the selected filtering step.

Saving Filters
You can save your filter either as a saved filter or as a script.

Saved Filters
Saved filters include all of the inputs to the filter and any associated information, including each restriction and its settings. They can be accessed in any genome. When a saved filter is opened in a genome other than the one in which it was created, the genome data objects (gene lists, experiments, and so on) appear blank or undefined. You must define these fields within the new genome before running the filter. Filters that require data objects to be defined are displayed in red in the Restrictions table.

A saved f
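The Restrictions table rules are: each row is a gene list, rows are combined strictly left to right with no operator precedence, parentheses group rows, and NOT excludes a row's genes. The sketch below models those rules with Python sets. It represents a parenthesized group as a nested list and treats NOT as "every gene in the universe except these". That representation and the function name are hypothetical.

```python
from typing import List, Set, Tuple, Union

# One row: (NOT selected?, genes or a nested group of rows, connector to the next row).
Row = Tuple[bool, Union[Set[str], list], str]


def evaluate(rows: List[Row], universe: Set[str]) -> Set[str]:
    """Combine restriction rows strictly left to right; a nested list is a parenthesized group."""
    result: Set[str] = set()
    pending = ""
    for i, (negate, operand, connector) in enumerate(rows):
        genes = evaluate(operand, universe) if isinstance(operand, list) else set(operand)
        if negate:
            genes = universe - genes  # NOT: use every gene except this step's genes
        if i == 0:
            result = genes
        elif pending == "AND":
            result &= genes
        else:  # "OR"
            result |= genes
        pending = connector  # the connector joins this row to the next one
    return result


U = {"a", "b", "c", "d"}
A, B, C = {"a", "b"}, {"b", "c"}, {"d"}
print(sorted(evaluate([(False, A, "OR"), (False, B, "AND"), (False, C, "")], U)))  # []
print(sorted(evaluate([(False, A, "OR"), (False, [(False, B, "AND"), (False, C, "")], "")], U)))  # ['a', 'b']
```

The two calls show why the parentheses menus matter. Read left to right, "A OR B AND C" means (A OR B) AND C, which is empty here. Grouping B AND C first gives a different result.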
236. median deviation from 1.0. The goal in this step is to remove outliers when replicates are being used, and to disregard genes whose high or low expression level is the result of biological activity. In the absence of replicates, the working assumption is that the vast majority of the genes do not change over the conditions in the experiment. Deviation from one therefore represents error in a gene whose expression level changes little over the course of the experiment. An iteratively reweighted linear regression of variation (squared deviation) versus squared control strength is then fitted to estimate the parameters.

The two-level variance components model is estimated by the method of moments. To eliminate negative estimates of variance components, within-sample variation is taken as a lower bound on total between-sample variation. Different sources of information in the analysis are weighted by their appropriate statistical degrees of freedom. Precision estimates based on replicate genes or samples are assigned degrees of freedom equal to the number of replicates minus one. User-supplied precision values, if available, are assigned 1 degree of freedom. Cross-gene error models, if used, are assigned the same number of degrees of freedom as the direct variability estimates for that gene. Between-sample analyses are done according to the interpretation mode (ratio, log, or fold). Within-sample variability is calculated in terms of normaliz
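The degrees-of-freedom rules above say how much each variance source counts. Replicates count as n - 1, a user-supplied value counts as 1, and a cross-gene model estimate matches the direct estimates. The sketch below combines the sources as a degrees-of-freedom-weighted average of variances, which is a common pooling choice. Whether GeneSpring pools exactly this way is an assumption. The fallback of 1 degree of freedom for a cross-gene estimate with no direct estimates, and the function name, are also assumptions.

```python
from statistics import variance
from typing import List, Optional, Sequence, Tuple


def pooled_variance(replicates: Sequence[float],
                    user_variance: Optional[float] = None,
                    cross_gene_variance: Optional[float] = None) -> float:
    estimates: List[Tuple[float, int]] = []  # (variance estimate, degrees of freedom)
    if len(replicates) >= 2:
        estimates.append((variance(replicates), len(replicates) - 1))  # replicates: n - 1 df
    if user_variance is not None:
        estimates.append((user_variance, 1))  # user-supplied precision: 1 df
    if cross_gene_variance is not None:
        # Same df as the direct estimates; the fallback of 1 is an assumption.
        direct_df = sum(df for _, df in estimates) or 1
        estimates.append((cross_gene_variance, direct_df))
    if not estimates:
        raise ValueError("no precision information available for this gene")
    total_df = sum(df for _, df in estimates)
    return sum(v * df for v, df in estimates) / total_df


print(pooled_variance([1.0, 2.0, 3.0]))                           # 1.0
print(pooled_variance([1.0, 2.0, 3.0], cross_gene_variance=4.0))  # 2.5
```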
237. being colored in many different ways, because the display presents the expression levels of the genes through the entire experiment. These are the only views in which you may choose to color the genes by a secondary experiment. This means that the color of each graphed gene line corresponds to the expression level of that gene in a different experiment, at the point in the second experiment marked by the secondary scroll bar.
1. From the navigator, open the Experiments folder by clicking its icon.
2. Position your cursor over the experiment (not the one currently displayed) that you want to use for coloring.
3. Right-click and select Set Secondary Experiment from the pop-up menu.
The coloring scheme of the genome browser is shown in the colorbar on the right. There are two versions of the animation controls in the Experiment Specification Area.

Changing the Default Colors
You can change the colors used to display the genes. This does not affect the interpretation of your data, but it can help you make genes more visible on screen or make it easier to print screen shots.
1. Select Edit > Preferences and click the Color tab, or click the Change Colors button on the Display Options > Color tab.
2. Select the type of information whose color you want to change and click Change.
3. Adjust the sliders until the color you want is displayed in the preview window at the top of the Structure Color window.
4. Click OK.
For more details about the oth
238. between two samples or conditions.

[Screenshot: the Filter on Fold Change window, with Choose Condition 1, Choose Condition(s) 2, Choose Data Type, Choose Comparison, and Fold Difference settings.]
Figure 6-13: The Filter on Fold Change window

1. Select the first sample or condition and click Choose Condition 1.
2. Select additional samples or conditions. To specify a single sample or condition, select it from the navigator and click Choose Conditio
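The fold change filter keeps genes whose values differ by at least the Fold Difference between Condition 1 and the Condition 2 choices. The sketch below treats the fold difference as symmetric, so a two-fold rise and a two-fold drop both count as 2, and requires a hit in a minimum number of Condition 2 choices. GeneSpring's Choose Comparison menu suggests it can also test one direction only, so the symmetric rule, the minimum-count parameter and the function names are assumptions.

```python
from typing import Dict, Sequence, Set


def fold_difference(a: float, b: float) -> float:
    """Symmetric fold difference: 2.0 for a two-fold change up or down."""
    return max(a / b, b / a)


def filter_on_fold_change(cond1: Dict[str, float], cond2s: Sequence[Dict[str, float]],
                          min_fold: float = 2.0, min_conditions: int = 1) -> Set[str]:
    passed: Set[str] = set()
    for gene, base in cond1.items():
        hits = sum(1 for cond2 in cond2s
                   if gene in cond2 and base > 0 and cond2[gene] > 0
                   and fold_difference(base, cond2[gene]) >= min_fold)
        if hits >= min_conditions:
            passed.add(gene)
    return passed


time0 = {"g1": 1.0, "g2": 1.0, "g3": 4.0}     # hypothetical normalized values
time100 = {"g1": 2.5, "g2": 1.5, "g3": 1.0}
print(sorted(filter_on_fold_change(time0, [time100])))  # ['g1', 'g3']
```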
239. When you click Browse, the words "Dummy Name leave alone" appear in the File name field of the Save in window. This is expected behavior. Do not change this text. If you accidentally click a filename in this window, the name of that file replaces the dummy text, and clicking Save generates an error message. If you get this error message, click Yes. This does not replace the file you clicked; it simply enters the correct directory name in the Specify Directory box. When you click Next, the Overall Genome Properties window appears.
4. Specify genome properties.
If your organism has been sequenced and you have a file containing the full sequence, select Yes in the first box. If not, leave No selected.
If your organism has a circular genome, such as a bacterium, plasmid, or virus, select Yes in the second box. This tells GeneSpring to display your genome as a circle in the physical position display. If your organism does not have a circular genome, leave No selected.
When you have made your selections, click Next. The GenBank Data File screen appears.
5. Specify whether you are using a GenBank file as your data source and, if so, the name of the file. If you are using an EMBL file, select Yes as you would for a GenBank file.
If you are using a GenBank or EMBL file, select Yes. You are prompted to enter the filename. You cannot proceed until you have entered a filename. Type the complete path and filename, or click the Browse button to sel
240. Gene List Inspector
You can view the contents of a gene list and its creation method using the Gene List Inspector window. This window is especially useful for learning about lists identified using the Similar List function.

[Screenshot: the Gene List Inspector for the list ACGCGT in all ORFs (556 genes).]
Figure 4-13: The List Inspector window

The history of the selected gene list is displayed in the upper left corner of the window. You can edit the information in these fields. The upper right corner of the window contains a browser graphing your list. Right-click the graph for a menu of options. See Using the Genome Browser on page 4-2 for information on browser options. Below this section are three tabs containing a variety of options. Click a tab
241. GeneSpring URLs were specified in the Gene Inspector window like this: GeneHypertextLinks f linkname http www example com amp gene amp org Hs, where the leading symbol specifies the source of the gene annotation to query with: to query with the GenBank locus (column 10 of the master gene table); S to query with the common name (column 2); to query with the systematic name, less anything after a dash. If none of these symbols appears before the link name, the button queries the designated database with the full systematic name. These names are added to the end of the specified URL. To place the chosen gene identifier at a specific spot within the URL, mark that spot with a semicolon. GenBank or EMBL Files: If you use a single GenBank file to describe a genome, you do not need a master gene table, so you do not have to enter any of the information discussed in Data Format on page 2-10. You also do not need a separate file to contain the sequence data; the files for sequence data are described in Sequence Data on page 2-7. You can download the GenBank file directly from GenBank by opening a web browser to the URL of the organism you are installing. For example, ecoli.gbk is a 9.5 MB file from the URL ftp.ncbi.nlm.nih.gov/genbank/genomes/bacteria/Ecoli. This URL is usually the same for all of GenBank's bacterial genomes, with the name of the o
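The link rule described above can be sketched in Python. The identifier is either appended to the end of the URL or dropped in at a marked spot. This is an illustration only. The function names, the example URLs, and the choice to URL-encode the identifier are assumptions. The semicolon marker comes from the description above. A URL that legitimately contains a semicolon would need a different marker.

```python
from urllib.parse import quote

# Hypothetical helpers: GeneSpring's own parsing rules are not shown in this manual excerpt.
MARKER = ";"  # placeholder character named in the text above


def systematic_name_without_suffix(name):
    """Drop anything after the first dash, e.g. 'YAL001C-A' -> 'YAL001C'."""
    return name.split("-", 1)[0]


def build_link_url(template, identifier, marker=MARKER):
    """Insert the identifier at the first marker if present; otherwise append it."""
    token = quote(identifier, safe="")  # percent-encode characters unsafe in a URL
    if marker in template:
        return template.replace(marker, token, 1)
    return template + token


if __name__ == "__main__":
    print(build_link_url("http://www.example.com/query?gene=", "YAL001C"))
    print(build_link_url("http://www.example.com/;/summary", "TP53"))
```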
242. ent licenses. Desired Memory Use: Sets the amount of RAM GeneSpring attempts to use. If this value is set too high relative to the total available memory, unnecessary disk caching occurs and performance is slow. Disk Cache Size: Specifies the amount of hard disk space GeneSpring uses to temporarily store HTML pages accessed by the GeneSpider or by other Internet-based search functions. Silicon Genetics recommends that you set this value to 10% of your available disk space. In GeneSpring 6.0, experimental data is loaded into the disk cache instead of into system RAM, and only the data currently being used is loaded into memory. This enables GeneSpring to handle much larger experiments containing any number of samples. Cached Internet Resources Expire After: Specifies how long GeneSpring keeps cached copies of Internet resources for quicker access. Number of Processors: Specify the number of processors in your computer. This lets several types of analysis, including k-means, build gene trees, promoter search, and predict parameter values, run most efficiently. GeNet: On this tab you can specify the default GeNet server to connect to and enter the addresses of other GeNet servers you want GeneSpring to have access to. To have GeneSpring connect automatically to the default GeNet server at startup, check the Login to GeNet at Startup box. Select the default GeNet server from the pull-down menu. T
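A short Python sketch can turn the 10% guideline into a number for the Disk Cache Size field. It reads free space with the standard library's `shutil.disk_usage`. Two things are assumptions. The sketch measures the drive holding the current working directory, so point it at the drive that holds GeneSpring's data. The MB printout is only for readability, so check which unit the field expects.

```python
import shutil


def recommended_cache_bytes(free_bytes, percent=10):
    """Return `percent`% of the given free space, using integer arithmetic."""
    return free_bytes * percent // 100


if __name__ == "__main__":
    free = shutil.disk_usage(".").free  # free bytes on the drive holding the current directory
    size = recommended_cache_bytes(free)
    print(f"Suggested Disk Cache Size: {size / 2**20:.0f} MB")
```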
243. ents > Experiment Interpretation. The Change Interpretation window appears. 4. Click the box marked Use Cross-Gene Error Model. 5. Click Save to save as part of your current interpretation, or Save As to create a new interpretation. Technical Details: The two-component model for estimating variation from control strength is known as the Rocke-Lorenzato model. The two components are an absolute error component, which dominates at low measurement levels, and a relative error component, which dominates at high measurement levels. The error model for raw (pre-normalization) expression levels can be written as $\sigma_{\mathrm{raw}} = \sqrt{a^2 + b^2 S^2}$, where $\sigma_{\mathrm{raw}}$ is the measurement standard error of the raw expression data, $S$ is the measurement level (control strength), and $a$ and $b$ are the fitted coefficients of the model. Normalized expression levels are raw expression levels divided by control strength, so their standard errors are $\sigma_{\mathrm{norm}} = \sigma_{\mathrm{raw}}/S = \sqrt{b^2 + a^2/S^2}$. Before fitting the error model, the genes are ordered by their control strengths. A median variance and a median control strength are calculated for each non-overlapping set of eleven genes. If replicates are used, this variance is the standard error of the samples in the current condition. If the deviation from 1 option is selected, error is approximated by using the m
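The binning and fitting steps for the error model $\sigma^2 = a^2 + b^2 S^2$ can be sketched in Python. Only the binning comes from the text: sort by control strength, then take medians over non-overlapping sets of eleven genes. Three details are assumptions for illustration, not GeneSpring's documented fitting procedure: fitting by ordinary least squares in $x = S^2$, dropping a trailing partial bin, and clamping negative estimates to zero.

```python
import math
from statistics import median


def bin_medians(strengths, variances, bin_size=11):
    """Sort genes by control strength and return (median strength, median variance)
    for each non-overlapping bin of `bin_size` genes. A trailing partial bin is
    dropped (an assumption of this sketch)."""
    pairs = sorted(zip(strengths, variances))
    bins = []
    for start in range(0, len(pairs) - bin_size + 1, bin_size):
        chunk = pairs[start:start + bin_size]
        bins.append((median(s for s, _ in chunk), median(v for _, v in chunk)))
    return bins


def fit_error_model(bins):
    """Least-squares fit of variance = a^2 + b^2 * S^2, which is linear in x = S^2.
    Negative coefficient estimates are clamped to zero before taking square roots."""
    xs = [s * s for s, _ in bins]
    ys = [v for _, v in bins]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_sq = sxy / sxx
    a_sq = mean_y - b_sq * mean_x
    return math.sqrt(max(a_sq, 0.0)), math.sqrt(max(b_sq, 0.0))


def sigma_raw(a, b, s):
    """Standard error of a raw measurement at control strength s."""
    return math.sqrt(a * a + b * b * s * s)


def sigma_norm(a, b, s):
    """Standard error after dividing by control strength (equals sigma_raw / s)."""
    return math.sqrt(b * b + (a * a) / (s * s))
```

With synthetic data lying exactly on $a = 2$, $b = 0.1$, the fit recovers those coefficients.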
244. er requests Build Homology Tables to save UniGene cluster IDs, it replaces the UniGene column with its results, deleting any obsolete entries from the column. This contrasts with the behavior of the GeneSpider, where overwrite does not replace existing entries with an empty entry. Gene Ontology information on the Silicon Genetics Mirror server is obtained from the LocusLink database. The same information cannot be provided through Update Annotations from LocusLink, because the LocusLink web page format does not indicate the organizing principles of biological process, molecular function, and cellular component. Genome Databases: Silicon Genetics. Update Annotations from Silicon Genetics retrieves annotations from the Silicon Genetics Mirror server at info.sigenetics.com. The server downloads the complete GenBank, RefSeq, LocusLink, and UniGene databases from NCBI. It returns the same annotations as the GeneSpiders that access GenBank, LocusLink, and UniGene, depending on the annotation sources chosen by the user in the Update genome from Silicon Genetics window. In addition, the Silicon Genetics GeneSpider fills in the GO biological process, GO molecular function, GO cellular component, and RefSeq identifier fields from the LocusLink database, and the UniGene cluster ID from the UniGene database. GenBank: Update Annotations from GenBank retrieves information from the GenBank and RefSeq data
245. er options in the Preferences window, see Setting Preferences on page 1-18. Blocks View: This view displays a rectangle for every gene in the active genome, ordered by trust. To choose the Blocks view, select View > Blocks. [Figure 4-20: The Blocks View.] Blocks View Display Options: The following display options are available in Blocks view: Features (the available options for this view are listed below), Coloring (see Color on page 4-30), and Legend (see Legend on page 4-28). Features: The Features panel of the display options window contains a column of check boxes that let you toggle certain items in the genome browser on or off. Color by all conditions: Divides the genes into section
246. es, right-click a script, script primitive, or external program primitive in the navigator panel and select Inspect from the menu. [Figure 8-6: The Properties panel for a Script. The example is the Best k-means script, whose note reads: "This script tries a k-means classification with 3, 5, 8, and 15 clusters and chooses the one with the highest explained variability." Its connectors are Gene List, Experiment, and Gene Classification.] Click Details for information on the script's author and creation date. [Figure 8-7: The Details or History panel, showing the script's name, author, organization, application, creation date, note, version, and identifiers.] Click Edit to view the Change Information panel. You can modify the information in the white text boxes. Click OK to save your changes. [Figure content: the Explained Variation properties panel, whose note reads "Compute the proportion of variation in an experiment explained by a classification and based on a gene list," with Gene Classification, Experiment, and Gene List connectors.] Figure 8-8 The Properties panel f
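The Explained Variation primitive and the Best k-means selection rule described above can be illustrated with a small Python sketch. Here "explained variation" uses the common definition, 1 minus (within-cluster sum of squares divided by total sum of squares), summed over all conditions. Whether GeneSpring uses exactly this formula is an assumption, and the function names are hypothetical.

```python
def explained_variation(profiles, labels):
    """Fraction of total variation explained by a classification:
    1 - (within-cluster sum of squares / total sum of squares)."""
    dims, n = len(profiles[0]), len(profiles)
    grand = [sum(p[j] for p in profiles) / n for j in range(dims)]
    total = sum((p[j] - grand[j]) ** 2 for p in profiles for j in range(dims))
    clusters = {}
    for profile, label in zip(profiles, labels):
        clusters.setdefault(label, []).append(profile)
    within = 0.0
    for members in clusters.values():
        m = len(members)
        centroid = [sum(p[j] for p in members) / m for j in range(dims)]
        within += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(dims))
    return 1.0 - within / total if total > 0 else 0.0


def best_classification(profiles, candidates):
    """candidates maps a cluster count k to a label list; return (k, score) with the highest score."""
    scores = {k: explained_variation(profiles, labels) for k, labels in candidates.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]
```

Explained variation tends to rise as k grows, so comparing raw scores favors larger k. This sketch simply mirrors the selection rule stated in the script's note.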
247. es associated with each condition increases. Define the conditions to compare by checking or unchecking the box in a given row. The Check All and Clear All buttons check or uncheck all rows. The Invert button checks all unchecked rows and clears all checked rows. For Each Gene: GeneSpring does the following separately for each gene. Let $i$ index over the $G$ groups formed by distinct levels of the comparison parameter. Let $X_{ik}$ be the expression values, with $k$ running over the replicates for each situation, interpreted according to the current interpretation (ratio, log of ratio, or fold change). Let $N_i$ be the number of non-missing data values in group $i$. Let $\bar{X}_i = \frac{1}{N_i}\sum_{k=1}^{N_i} X_{ik}$ be the group means, and let $SS_i = \sum_{k=1}^{N_i} (X_{ik} - \bar{X}_i)^2$ be the within-group sums of squares. In all calculations, missing values (No Data or NaN) are left out of the sums rather than propagated. If any of the $N_i$ are zero, drop that parameter level from the analysis and readjust $G$ accordingly. If $G$ is not at least 2, exit with p-value = 1. Parametric Test, Variances Assumed Equal: For a parametric test with variances assumed equal, compute $\bar{X} = \frac{\sum_{i=1}^{G} N_i \bar{X}_i}{\sum_{i=1}^{G} N_i}$, the overall mean; $BSS = \sum_{i=1}^{G} N_i (\bar{X}_i - \bar{X})^2$, the between-groups sum of squares; $df_1 = G - 1$, the numerator degrees of freedom; $BMS = \frac{BSS}{G - 1}$, the between-groups mean square; $WSS = \sum_{i=1}^{G} SS_i$, the pooled with
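The per-gene steps above can be sketched in Python. The section's formulas stop at the pooled within-group sum of squares. The last two steps here are therefore the textbook completion of a one-way ANOVA and an assumption about what follows: the within-group mean square $WSS/(N - G)$, and $F = BMS/WMS$. Each group is a plain list, and `None` or NaN marks a missing value. The function name is hypothetical.

```python
import math


def one_way_anova(groups):
    """groups: one list of expression values per parameter level; None or NaN marks
    a missing value. Returns (F, df1, df2), or None when fewer than two levels have
    data (the text assigns p = 1 then) or there are no within-group degrees of freedom."""
    cleaned = []
    for values in groups:
        kept = [x for x in values if x is not None and not math.isnan(x)]
        if kept:  # drop levels with N_i = 0, which readjusts G
            cleaned.append(kept)
    g = len(cleaned)
    if g < 2:
        return None
    n_i = [len(vals) for vals in cleaned]
    means = [sum(vals) / len(vals) for vals in cleaned]
    ss_i = [sum((x - m) ** 2 for x in vals) for vals, m in zip(cleaned, means)]
    n_total = sum(n_i)
    grand_mean = sum(n * m for n, m in zip(n_i, means)) / n_total
    bss = sum(n * (m - grand_mean) ** 2 for n, m in zip(n_i, means))
    df1 = g - 1
    bms = bss / df1
    wss = sum(ss_i)            # pooled within-group sum of squares
    df2 = n_total - g          # assumed denominator degrees of freedom
    if df2 == 0:
        return None
    wms = wss / df2
    f_stat = bms / wms if wms > 0 else math.inf  # zero within-group spread treated as infinite F
    return f_stat, df1, df2
```

If SciPy is available, `scipy.stats.f.sf(F, df1, df2)` turns the statistic into a p-value.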
248. es based on a third experiment or condition that is not plotted on either axis. To modify the way data points are colored: 1. Select View > Display Options, or right-click anywhere in the genome browser and select Display Options. 2. Click the Coloring tab. 3. From the Color data points by pull-down menu, select the type of data to use for coloring. For more information about the types of data available for coloring, see Color on page 4-30. 4. Use the Use the expression levels in the following experiment or condition radio buttons to select the axis used for coloring. Only axes that represent experiments, interpretations, or conditions are available for coloring. To color genes by an experiment that is not represented by either axis, click Other Experiment, select an experiment in the navigator, and click Set Experiment >>. 3D Scatter Plot View
249. es to groups. For small numbers of permutations, all permutations are examined. If there are more than 1,000 possible permutations, 1,000 of them are selected at random. P-values are evaluated with respect to this distribution using a step-down procedure, as in the Holm procedure. This procedure controls the FWER, and the expected number of genes by chance is α. This test accounts for the dependence structure between genes and should be more powerful than the Bonferroni or Holm procedure, but the permutation process takes much longer to calculate. Benjamini and Hochberg False Discovery Rate: In contrast to the above procedures, the Benjamini and Hochberg procedure controls the false discovery rate (FDR), defined as the proportion of genes expected to occur by chance (assuming genes are independent) relative to the proportion of identified genes. The expected number of genes by chance is α times the number of tests found significant after applying this correction. This cannot be calculated in advance, so the statement about the number expected simply says "expected number of genes by chance is 100α% of the genes identified." This procedure provides a good balance between discovery of significant genes and protection against false positives, since false positives are held to a small proportion of the list, and it is probably the best choice of multiple testing correction for m
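The Benjamini and Hochberg adjustment is short enough to sketch directly in Python. This is the standard step-up form. Sort the p-values, scale the p-value of rank r by m/r, then enforce monotonicity working down from the largest value. That GeneSpring computes it exactly this way is an assumption, and the function names are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(p_values, alpha=0.05):
    """Indices of genes whose adjusted p-value is at most alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q <= alpha]
```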
250. ess outside networks. Click Configure Automatically to have GeneSpring attempt to detect the appropriate settings. If the settings GeneSpring chooses do not allow you to reach the Internet, you may need to alter them. The following settings are available: Protocol: The firewall protocol to use. Options are HTTP, SOCKS4, and SOCKS5. Proxy Host Address: The host address of the computer on which the firewall runs. This can be either a fully qualified hostname (e.g., hostname.domainname.com) or an IP number. Proxy Port Number: The port on which to connect to the firewall host. Password Authenticate Connections: Specify whether a password is required to connect to the firewall host. Username: The username used to connect to the firewall host, if required. Password: The password used to connect to the firewall host, if required. If you are unsure how to proceed, contact your System Administrator for details about your firewall. System: The System tab lets you specify a number of parameters regarding networking and memory usage. Limit Filenames to less than 32 characters: Limits the length of file names. This is a useful default setting for Macintosh users, since Mac OS does not accept file names longer than 31 characters. License Server: Specifies the IP address of the machine that dispenses concurr
251. ession numbers, select Annotations > GeneSpider to import additional gene annotations available through NCBI. Layout Parameters: The Layout File. To create an array layout file in GeneSpring, you need at least one file that gives GeneSpring general information about the array (size, shape, features, format, name, etc.). This file should end in the extension .Layout. You usually need another file describing exactly which gene goes where. The .Layout file is a series of lines, and their order does not matter. Each line consists of a property, a colon, and a value, for example: property: value. GeneSpring ignores blank lines and lines starting with a number sign (#). The following properties are allowed in the file. GeneSpring is case-sensitive, so use the capitalizations as presented here. Name: The name of this layout, as it appears in the navigator window of GeneSpring. Icon (optional): The path of a 16 x 16 .gif file to appear next to the layout in the navigator window. VerticalSubArrays (optional, default 1): The number of rows of sub-arrays. HorizontalSubArrays (optional, default 1): The number of columns of sub-arrays. HorizontalPerSubArray: The number of columns of dots in a sub-array. VerticalPerSubArray: The number of rows of dots in a sub-array. VerticalDuplication (optional, rarely used): When dots are duplicated vertically, the number of copies. H
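The .Layout rules above can be checked with a small Python sketch. The file has one `property: value` per line, ignores blank and `#` lines, uses case-sensitive names, and defaults the sub-array counts to 1. The sample file is hypothetical. The set of required properties is inferred from the entries not marked optional in the part of the list shown here; the full list continues beyond this excerpt. Leading whitespace is stripped before the `#` check, which is also an assumption.

```python
LAYOUT_DEFAULTS = {"VerticalSubArrays": "1", "HorizontalSubArrays": "1"}
REQUIRED = ("Name", "HorizontalPerSubArray", "VerticalPerSubArray")  # inferred, see lead-in

SAMPLE_LAYOUT = """\
# Hypothetical example layout
Name: Demo 2x2 array

VerticalSubArrays: 2
HorizontalSubArrays: 2
HorizontalPerSubArray: 20
VerticalPerSubArray: 10
"""


def parse_layout(text):
    """Parse 'property: value' lines; property names are case-sensitive."""
    props = dict(LAYOUT_DEFAULTS)
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"line {lineno}: expected 'property: value'")
        props[key.strip()] = value.strip()
    missing = [k for k in REQUIRED if k not in props]
    if missing:
        raise ValueError(f"missing properties: {', '.join(missing)}")
    return props


def dot_positions(props):
    """Grid positions: sub-array rows x sub-array columns x dot rows x dot columns."""
    return (int(props["VerticalSubArrays"]) * int(props["HorizontalSubArrays"])
            * int(props["VerticalPerSubArray"]) * int(props["HorizontalPerSubArray"]))


if __name__ == "__main__":
    layout = parse_layout(SAMPLE_LAYOUT)
    print(layout["Name"], dot_positions(layout))
```

Splitting at the first colon only keeps values such as a Windows icon path (`C:\...`) intact.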
252. estrict your search to genes containing specific sequences. These sequences can include the IUPAC-IUB ambiguity codes (A, K, Y, W, etc.). The symbol X is not allowed; to specify a single wildcard base, use N instead. Because N matches any base, searching for NNNN identifies every gene in the genome and may cause an out-of-memory error. 4. Click Find. A list of the genes that match the search criteria is displayed at the bottom of the window, and the field that matched the search criteria is colored red. 5. Click the row of each gene you want to learn more about. To identify more than one gene, hold down the Ctrl key while clicking multiple rows. Once you have highlighted the gene(s) of interest, click one of the five buttons on the right: Select highlights and selects the selected gene in the genome browser; Select All selects and highlights all of the genes identified by the search; Zoom & Select selects and zooms in on a gene in the genome browser; Inspect brings up the Gene Inspector window for the selected gene; Make Gene List makes a gene list from all of the genes identified by the search. Searching GeNet from GeneSpring: Many researchers use multiple types of arrays to study the same class of disease. These arrays may even come from diverse organisms, representing animal models for human diseases, etc. In this instance it may be helpful to search a pool of data much
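A search pattern containing IUPAC-IUB ambiguity codes can be matched by expanding each code into a regular-expression character class. The Python sketch below is illustrative and is not GeneSpring's search code. It rejects X as the text requires, maps U to T, and uses a lookahead so that overlapping matches are all reported. The function names are hypothetical.

```python
import re

# Standard IUPAC-IUB nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern into a regex; X is rejected (use N as the wildcard)."""
    parts = []
    for code in pattern.upper():
        if code not in IUPAC:
            raise ValueError(f"unsupported symbol {code!r}; use N for a wildcard base")
        parts.append(IUPAC[code])
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return the start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]
```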
253. et of intensity measurements. An array generally has one sample. If all of the interesting genes fit onto one array, the terms array, chip, and sample can be considered synonymous. Array Layout: A synthetic picture of genes on arrays. The Array Layout view can be used to check for gross slide-related problems. Chip: The measurements from a glass slide containing DNA samples for microarray analysis. Classification: A grouping of genes by k-means or SOM clustering that is stored in the Classifications folder. Cluster: A collection of genes that have been grouped according to certain criteria, such as similar mean expression values. Colorbar: The rectangle on the far right of the main GeneSpring screen. The intensity of the colorbar in GeneSpring indicates the reliability of the data for each gene. You indicate three raw signal strength values: a high signal strength value, for data considered very reliable; a medium signal strength value, for average data; and a low signal strength value, for unreliable data. Any gene with a signal strength (control) above the high value is colored with the brightest appropriate color, and any gene with a signal strength below the low (unreliable) value is almost black. The medium value sets the midpoint of the color bar, so genes with a medium signal strength are colored halfway between the two color extremes. Condition: A grouping of one or more sam
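The colorbar description above fixes three anchor points: at or below the low value the color is almost black, the medium value is the midpoint, and at or above the high value the color is brightest. A Python sketch of that mapping can use piecewise-linear interpolation between the anchors. The linear interpolation, on raw rather than log signal, is an assumption; the glossary does not say how GeneSpring blends between the anchors. The function name is hypothetical.

```python
def colorbar_brightness(strength, low, medium, high):
    """Map a control (signal) strength to a brightness in [0, 1]:
    <= low -> 0 (near black), medium -> 0.5, >= high -> 1 (brightest).
    Linear interpolation between the anchors is an assumption."""
    if not low < medium < high:
        raise ValueError("expected low < medium < high")
    if strength <= low:
        return 0.0
    if strength >= high:
        return 1.0
    if strength <= medium:
        return 0.5 * (strength - low) / (medium - low)
    return 0.5 + 0.5 * (strength - medium) / (high - medium)
```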
254. etation and condition used for coloring. For experiments with continuous numeric parameters, the condition may actually be an interpolation between two measured conditions. In Scatter Plot and 3D Scatter Plot views, the parameter value is also displayed, since it affects where the genes are graphed. Venn Diagram: Displays Venn Diagram. Color by Parameter: If the experiment has parameters designated as color codes, displays the name of the experiment, interpretation, and parameter(s) used for coloring. In Scatter Plot and 3D Scatter Plot views, the parameter value is also displayed, since it affects where the genes are graphed. Color by Classification: The name of the Classification or Gene List Folder used for coloring. Equation for Line of Best Fit (Scatter Plot view only): If Line of Best Fit is selected, displays the equation for the line of best fit (data-dependent). Error Bar Information: Displays whether the error bar is based on Standard Error, Standard Deviation, or the minimum/maximum data values, and whether the error or deviation information is based on within-sample or between-sample information. This is available in the following views: Graph, Graph by Genes, Bar Graph, Scatter Plot, and 3D Scatter Plot. Gene List and Information on Selected Genes: Displays the name of the selected gene list, the number of genes in the gene list, and the name of the select
255. eter values can also be assigned in different ways, for example Cancer: Malignant, Benign, or Normal; Cancer: Cancerous or Normal; Cancer: Tumor, Metastatic, or Normal. If all of these parameter values are used, they should be assigned to multiple parameters, such as Cancer Type, Cancer Presence, and Cancer Stage. Additional parameters could be Age, Gender, and Treatment. Parameter Display Options: GeneSpring offers four ways of visually displaying a parameter: as a continuous element, a non-continuous element, a replicate (hidden) element, or a color code. When you create a new experiment, your chosen display option becomes the default for that parameter. If you simply paste in a new experiment, all parameters are assigned in the Paste Experiment format. Regardless of how a parameter was created in GeneSpring, you can use the Experiments > Experiment Interpretation command to change how it is displayed. For more details, see Experiment Interpretations on page 3-39. Hidden Elements: Parameters defined as replicates are averaged together and appear as a single parameter, so a replicate parameter is graphically a hidden variable. Defining a parameter as a replicate is the easiest way to deal with repeated samples in GeneSpring. Repeated samples are averaged with exactly the same equation used to average repeated measurements in a raw data file. See
256. ethod, Filter text, Use as wildcard, Must appear in of samples, Filter columns specification: Filters on a data file using an experiment interpretation. Outputs a gene list. Filter on Error Condition (input: one condition; knobs: Error type, Minimum, Maximum): Filters on errors using a condition. Outputs a gene list. Filter on Error Interpretation (input: one interpretation; knobs: Error type, Minimum, Maximum, Minimum of conditions): Filters on errors using an experiment interpretation. Outputs a gene list. Filter on Expression Level (input: one condition; knobs: Data type, Minimum, Maximum, Minimum, group, Fold difference, Must appear in): Outputs a gene list containing the genes that have a measurement relative to a cutoff. Filter on Flags (input: one or more samples; knobs: Flag value, Minimum of samples): Filters on flags and outputs a gene list. Filter on Fold Change (input: one condition; knobs: Data type, one condition, Comparison). Filter on Gene List (input: gene list; knobs: none): Outputs the input gene list. Filter on Gene List Numbers (input: gene list; knobs: Cutoff, Comparison): Produces a gene list from an existing gene list, containing the genes whose associated number meets the specified criteria. Filter on Gene List Numbers In Range (input: gene list; knobs: Upper bound, Lower bound): Produces a gene list from an existing gene list, containing genes whose associated number meets the specified c
257. expect to see that sequence upstream of 4 568 of the genes. The probability that this particular sequence is that common due to chance is 2.176e-41. However, since 262,144 tests were done, the false positive probability is really 5.704e-36. [Figure 6-14: The Conjectured Regulatory Sequence window, showing the suggested sequence AAAAAAAAA and a list of matching upstream sequences with their distances.] Two menus, File and List, are located in the upper left corner of the window. The File menu contains the following commands: Print prints the list in the lower half of the Conjectured Regulatory Sequence window; Close closes the Conjectured Regulatory Sequence window. The List menu contains the following commands: Remove Item removes the highlighted item and its associated sequence motif from the list of matches to the common sequence motif being examined; Make Gene List displays the new Gene List window. When a gene list is produced based on the occurrence of a specified sequence (in this example, ACGCG in the yeast data), each gene has an associated number corresponding to the distance of the first such sequence upstream of the ORF. The numbering
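The two probabilities quoted above are consistent with a Bonferroni-style correction: 2.176e-41 × 262,144 ≈ 5.704e-36, where 262,144 = 4^9 is the number of distinct 9-base sequences over A, C, G, and T. The text does not name the correction or the motif length, so treat both as inferences. The Python sketch below only reproduces the arithmetic, and the function name is hypothetical.

```python
def bonferroni(p_value, n_tests):
    """Multiply a single-test p-value by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


N_TESTS = 4 ** 9  # 262,144 possible 9-base sequences (inferred motif length)

if __name__ == "__main__":
    print(f"{bonferroni(2.176e-41, N_TESTS):.3e}")  # 5.704e-36
```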
258. exported as 01 Ratio Numbers Display 5 110 0 110 01 100 this is the lower cutoff 25 4 33 I3 5 2 1 5 x1 5 3 x3 5 x5 Note: In Fold Change mode, the values on the vertical axis are not the same as those used for subsequent analyses or those exported using the Copy Annotated Gene List function. In fold change mode, the values are stored as $N - 1$ for values greater than 1 and $1 - 1/N$ for values less than 1, where $N$ is the normalized signal. These values can be thought of as representing the distance away from normality, i.e., from the point that represents neither over-expression nor under-expression. They may not be particularly useful for most users. Parameter Display Modes: Continuous Element: Applicable only to Graph view, the Continuous Element mode shows parameter values existing on a continuum, with each point connected by a line. GeneSpring automatically orders numerical parameters from highest to lowest and non-numerical parameters in alphabetical order. See Parameter Display Options on page 3-30 for details. Non-Continuous: Applicable only to Graph view, Non-continuous mode shows parameter values existing independently of one another, with each value represented as a discrete point. GeneSpring automatically orders numerical parameters from highest to lowest and non numeri ca
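The fold-change storage rule can be sketched in Python, reading it as $N - 1$ for normalized signals above 1 and $1 - 1/N$ below 1. That reading is a reconstruction of a garbled formula in the source, so treat it as an assumption. It does make the stored values symmetric, since doubling gives 1 and halving gives -1. The function names are hypothetical.

```python
def fold_change_value(n):
    """Stored fold-change value for a positive normalized signal n
    (N - 1 at or above 1, 1 - 1/N below 1; a reconstructed rule)."""
    if n <= 0:
        raise ValueError("normalized signal must be positive")
    return n - 1 if n >= 1 else 1 - 1 / n


def from_fold_change_value(v):
    """Invert fold_change_value: recover the normalized signal."""
    return v + 1 if v >= 0 else 1 / (1 - v)
```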
259. ext box can contain no more than 32,000 text characters, including carriage returns. The list of negative control genes should be intersected with the regions, if there are any, and negative controls should be averaged within each region. If any region has no negative controls, an error message appears, alerting you that the normalization cannot be performed. Set measurements less than 0.01 to 0.01: This option sets any measurement less than a specified cutoff value to the cutoff value. By default, this value is 0.01. To use another value, click in the Cutoff text box and enter a new value. This step can be applied before or after other normalizations. Transform from log to linear values: This option transforms logarithmic data into linear expression values. It is required if your raw data are reported as log values, since GeneSpring requires linear data. To view your data on a logarithmic scale, use the experiment interpretation; for more information, see Experiment Interpretations on page 3-39. To apply this normalization, specify the base of the original measurements by selecting the appropriate radio button. The available options are Base 2, Natural Log (e), Base 10, and Other (enter a base value in the provided text box). Dye Swap: This option swaps your Control channel and Signal channel in order to do dye comparisons. Dye swa
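Two of the normalizations described above can be sketched in Python: converting log-scale data back to linear values, and flooring small measurements at a cutoff. The function names are hypothetical. Applying the log-to-linear step before the cutoff is just one ordering, since the text allows the cutoff before or after other normalizations.

```python
import math


def to_linear(values, base):
    """Invert a logarithm of the given base: x -> base ** x."""
    return [base ** v for v in values]


def floor_measurements(values, cutoff=0.01):
    """Set any measurement below `cutoff` to `cutoff`."""
    return [max(v, cutoff) for v in values]


if __name__ == "__main__":
    log2_data = [-8.0, 0.0, 3.0]
    linear = to_linear(log2_data, 2)     # [0.00390625, 1.0, 8.0]
    print(floor_measurements(linear))    # [0.01, 1.0, 8.0]
    print(to_linear([1.0], math.e))      # natural-log data: e ** 1.0
```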
260. ferent ways. When you load your experiment, GeneSpring automatically creates a Default Interpretation and an All Samples interpretation. The Default Interpretation is the first item listed under the experiment in the navigator, so it may be most convenient to set up your most frequently used interpretation as the Default Interpretation. You can rename the Default Interpretation, but you cannot delete it. The All Samples interpretation makes all parameters non-continuous, so that each parameter is viewed and analyzed individually. The All Samples interpretation cannot be changed, renamed, or deleted. [Figure 3-24: The Change Experiment Interpretation window, with fields for the interpretation name, the vertical axis range (lower and upper bounds), the analysis mode, which flagged measurements to use, excluded conditions, the Use Cross Gene Error Model in This Interpretation check box, and how to display each parameter (Continuous, Non-continuous, Color Code, or Do Not Display).] Changing Experiment Interpretation: 1. Select Experiments > Experiment Interpretation. The Experiment Interpretation window appears. Y
261. file is to tell GeneSpring how to read the database as if it were a simple text file. It pulls the data together and places it in columns recognized by GeneSpring. Column names and sample name references are entered in the Experiment Wizard as normal. 1. Using your file management software, create a new folder called Databases in the GeneSpring data directory. 2. Create a file named dataloader.xml. Instructions for setting up this file are provided in the section below, Configuration File Reference. 3. Issue a SQL command to retrieve the parameters in all samples. You can use Microsoft Query in Excel to generate SQL commands: a. In Excel, open the Tools menu. b. Select Get External Data. c. Select New Database Query. d. Make sure you specify what to edit in Microsoft Query. Configuration File Reference: The following section lists the tags used in the database configuration file, dataloader.xml. The configuration file is in XML format and uses tags enclosed in angle brackets, much like an HTML document. In such a document, an element consists of a tag enclosed in angle brackets and usually includes a closing tag. For example, the top-level element of this configuration file is the External Database Configuration element, which consists of opening and closing tags: <ExternalDatabaseConfiguration></ExternalDatabaseConfiguration>. An element's contents are tags or text nested between the current e
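A quick way to confirm that a hand-edited dataloader.xml is well-formed XML is to parse it with Python's standard `xml.etree.ElementTree`. The sketch below writes a skeleton containing only the top-level element named above and reads back its root tag. The child elements documented in the Configuration File Reference are not reproduced, so this skeleton alone is not a working configuration. A temporary directory stands in for GeneSpring's data/Databases folder, and the function names are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Only the top-level element from the text; real child tags are omitted.
SKELETON = "<ExternalDatabaseConfiguration>\n</ExternalDatabaseConfiguration>\n"


def write_skeleton_config(databases_dir):
    """Write a skeleton dataloader.xml into the given folder and return its path."""
    path = os.path.join(databases_dir, "dataloader.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(SKELETON)
    return path


def top_level_element(path):
    """Parse the file (raises ET.ParseError if it is not well-formed) and return the root tag."""
    return ET.parse(path).getroot().tag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_skeleton_config(tmp)
        print(top_level_element(cfg))  # ExternalDatabaseConfiguration
```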
262. [Figure 7-1: The Main Clustering Window with the K-means tab selected. The tab shows settings such as Number of Clusters (5), Number of Iterations (100), Similarity Measure (Standard Correlation), computation preferences (compute locally or on a GeNet remote server), and a local run-time estimate.] Using the Clustering Window: To perform any clustering operation, use the following steps: 1. Select a gene list from the navigator on the left side of the screen and click Choose Gene List. 2. If no experiment is selected, choose one from the mini-navigator and click Choose Experiment. Click Add/Remove to add or remove experiments from the list to be analyzed. For more information on adding and removing experiments, see Add/Remove Experiments on page 7-3. 3. Click the tab for your desired clusteri
263. g's per-chip normalization scales the data of each chip to a user-defined target intensity. However, GeneSpring's per-chip normalization option Use Distribution of All Genes divides each intensity value by the median of all values on the chip, so the resulting expression levels on each chip are centered around 1. For pre-scaled Affymetrix data, we recommend applying per-chip normalizations using the distribution of all genes. We also recommend applying a per-gene normalization using the median of each gene. The greatest benefit of performing these normalizations is that each gene intensity is centered artificially around 1, which several GeneSpring functions depend on, especially the cross-gene error model. Normalization of Two-color Microarray Data: Like data from most other technologies, two-color experiment data should be normalized at the gene level, to standardize expression levels between genes, and at the chip level, to standardize expression values between arrays. Two-color experiments are designed to provide an internal standard at the spot level. This per-spot normalization often provides the same scaling that a per-gene normalization would, so per-gene normalization is often unnecessary. Per-chip normalizations are useful in two-color experiments to standardize the global intensities across multiple arrays. Even after applying a per-spot normalization, global variability between chips often remains
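The recommended per-chip and per-gene median normalizations can be sketched in Python on a small genes-by-chips matrix. First each chip (column) is divided by its median, then each gene (row) by its own median. The function names and the plain list-of-lists layout are assumptions. The sketch does not handle missing values or zero medians, which GeneSpring must deal with in ways not described here.

```python
from statistics import median


def per_chip_normalize(matrix):
    """matrix[g][c] = intensity of gene g on chip c; divide each chip by its median."""
    n_chips = len(matrix[0])
    chip_medians = [median(row[c] for row in matrix) for c in range(n_chips)]
    return [[row[c] / chip_medians[c] for c in range(n_chips)] for row in matrix]


def per_gene_normalize(matrix):
    """Divide each gene (row) by its own median across chips."""
    normalized = []
    for row in matrix:
        gene_median = median(row)
        normalized.append([v / gene_median for v in row])
    return normalized


if __name__ == "__main__":
    data = [[2.0, 4.0], [4.0, 8.0], [6.0, 12.0]]  # 3 genes x 2 chips
    by_chip = per_chip_normalize(data)             # each chip now has median 1
    print(by_chip)
    print(per_gene_normalize(by_chip))
```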
264. [Figure 4-34: Compare Genes to Genes.] In the Compare Genes to Genes view, GeneSpring uses a Pearson correlation to measure the pair-wise similarities (see Pearson Correlation on page B-3). If you place the same list on both axes, a line of perfect correlation values descends diagonally across the grid. There are no display options in this view. Viewing Compare Genes to Genes: 1. In the navigator, click the first gene list to compare. Do this before you switch the view type, because large gene lists take a very long time to compare. 2. Select View > Compare Genes to Genes. The default display places the selected gene list on both axes. 3. If desired, select a second gene list by right-clicking a gene list in the navigator and selecting the Display as Second List option. To remove this second list, select View > Remove Secondary Gene List. Graph by Genes View: The Graph by Genes view lets you visualize an experiment as one line, where each point on the line represents the relative expression of one gene. Select View > Graph by Genes. Note: Genes with no data cannot be displayed in this view.
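The grid that the Compare Genes to Genes view draws can be sketched in Python as a matrix of pairwise correlations. Rows are genes from the first list, and columns are genes from the second. This uses the textbook (mean-centered) Pearson correlation. The exact definition GeneSpring uses is on page B-3 and may differ in detail. The gene names, profiles, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Textbook Pearson correlation of two equal-length profiles (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else float("nan")


def compare_genes_to_genes(list_a, list_b):
    """Correlation grid: rows = genes in list_a, columns = genes in list_b.
    Each list is a dict mapping gene name -> expression profile."""
    return [[pearson(pa, pb) for pb in list_b.values()] for pa in list_a.values()]


if __name__ == "__main__":
    genes = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0], "g3": [3.0, 2.0, 1.0]}
    for row in compare_genes_to_genes(genes, genes):  # same list on both axes
        print([round(v, 3) for v in row])
```

With the same list on both axes, the diagonal is all 1.0, matching the line of perfect correlation described above.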
4. GeneSpring asks whether there are more files to be loaded for this experiment. If there are additional files, select them from the menu and click Add; when you are done, click Next. If there are no more files to load, click Next.
5. Enter the required attributes, if any, and click Next.
6. Click Yes if you would like to create an experiment from the sample(s) you imported. If not, click No.
7. Enter an experiment name in the Choose Experiment Name window and click Save.

Step 3: Assigning Normalizations, Parameter Values, and Interpretations
1. Select Experiments > Experiment Normalizations and choose the types of normalizations to apply. Four classes of normalizations are available:
   • background subtraction
   • per-spot normalizations
   • per-chip (global) normalizations
   • per-gene normalizations
   Specify the desired normalizations and save. For information about normalizations and when to apply them, see Chapter 5, Normalizing Data.
2. Select Experiments > Experiment Parameters. Set the parameter names, units, values, and value order, and add any missing parameters. For information about changing experiment parameters, see Experiment Parameters on page 3-29.
3. Select Experiments > Experiment Interpretation. Choose the following:
   • mode of display
   • lower and upper bounds of data
   • flagged measurements to be included
   • whether to use the Cross-gene Error Model
   • whether the data should be continuous, non-continuous…
Principal Components Analysis (PCA) on Genes Results
When the analysis is complete, the PCA Results window appears, displaying each component as a line in graph mode. The significance of each component is represented by the color of its graph line, as defined by the colorbar. In addition, a new gene list folder appears in the GeneSpring navigator with a name that includes the experiment you used for the PCA analysis (e.g., PCA yeast cell cycle).

Figure 7-11 The Principal Components Analysis Results window (shown for the experiment Yeast cell cycle time series no 90 min, Default Interpretation, Log mode, gene list all genomic elements, with the Split Window, Show Bar Graph, Change Colors, Save Scores, Save Profiles, Close, and Help buttons)

To save gene lists whose associated values are the component scores for each gene, click Save Scores. To save the shape of each principal component as an expression profile, click Save Profiles.

Double-click a component to view the Gene Inspector window, which shows the eigenvalue and explained variability in the upper-left panel.

This screen contains the following buttons:
Split/Unsplit Window
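The eigenvalues and explained variability reported for each component can be illustrated with a small pure-Python PCA. This is a sketch under stated assumptions, not GeneSpring's algorithm: it treats conditions as variables and genes as observations, uses the ordinary mean-centered covariance matrix, and finds eigenvalues with the cyclic Jacobi method. All names are hypothetical.

```python
import math

def covariance_matrix(genes):
    """Covariance between conditions, computed across genes (rows = genes)."""
    n, m = len(genes), len(genes[0])
    means = [sum(g[j] for g in genes) / n for j in range(m)]
    return [[sum((g[j] - means[j]) * (g[k] - means[k]) for g in genes) / (n - 1)
             for k in range(m)] for j in range(m)]

def jacobi_eigenvalues(a, max_sweeps=100, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in a]
    m = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes a[p][q]: tan(2*theta) = 2*a_pq / (a_qq - a_pp)
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(m):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(m):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(m)), reverse=True)

def explained_variability(genes):
    """Fraction of total variance carried by each component (eigenvalue / sum)."""
    eigenvalues = jacobi_eigenvalues(covariance_matrix(genes))
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Two perfectly correlated conditions: one component carries all the variance.
print(explained_variability([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
```

Jacobi rotations preserve the trace, so the eigenvalues always sum to the total variance across conditions.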
…group 2 to group 4, p = 3. For the non-parametric version, replace k with p.

Viewing Post Hoc Test Results
After 1-way ANOVA has been performed, the 1-Way Post Hoc Testing Results window appears.

Summary by Gene
The Results Summary by Gene tab displays the mean expression level by group for each significant gene. For each gene, the coloring indicates which groups differ significantly from the others: groups of the same color show no significant difference, and groups of different colors differ significantly from each other.
Note: Occasionally, genes pass the ANOVA cutoff but show no significant difference across the groups (all groups are the same color). This is because ANOVA is a more powerful test than the post hoc tests.

[Screenshot: the 1-Way ANOVA Post Hoc Testing Results window for gene list all genes, experiment T cell Treatment, parameter Treatment type, test type Parametric test (don't assume variances equal), p-value cutoff 0.05, multiple testing correction Benjamini and Hochberg False Discovery Rate, and post hoc test Tukey, with the tabs Results Summary by Gene and Results Summary by Groups]

The table shows the mean expression value for each gene in each group. Groups with the same color show no statistical difference for that gene. To make a gene list of all the genes that show a similar color pattern, select the row of interest and click Make List.
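The results window above was produced with the Benjamini and Hochberg False Discovery Rate correction. The standard Benjamini-Hochberg step-up adjustment is sketched below for reference; it is the textbook procedure, not code taken from GeneSpring, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant(p_values, cutoff=0.05):
    """Indices of genes whose adjusted p-value is at or below the cutoff."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q <= cutoff]

p = [0.01, 0.04, 0.03, 0.005]     # hypothetical per-gene ANOVA p-values
adjusted = benjamini_hochberg(p)  # approximately [0.02, 0.04, 0.04, 0.02]
```

With a cutoff of 0.05, about 5% of the genes that pass are expected to be false discoveries.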
…may automatically perform what is known as the affine background correction. If a large percentage of your data is negative, normalization can be a problem. For instance, the median that GeneSpring divides your data by in Use Distribution of All Genes can be very small or even negative. In such cases, GeneSpring readjusts the background level of your data by adding a constant to all raw control strengths so that the 10th percentile is set equal to 0. The affine background correction is applied only when the 10th percentile of the data is more negative than the median of the data is positive (that is, when the 10th percentile is below the negative of the median). If the correction is applied, a warning message appears during data loading. Also, in the Gene Inspector, control strengths adjusted by this correction are flagged with asterisks.

You can choose to apply additional background correction in this step. To apply background correction, check the appropriate box in the Background Correction section of the screen. You have the following options:
• Never apply extra background correction.
• Always apply extra background correction. Prior to taking the specified percentile, the bottom tenth percentile is used as a background correction and subtracted from all genes.
• If needed, apply extra background correction. For samples in which the bottom tenth percentile is less than the negative of the specified percentile, the tenth percentile is used as a background correction and subtracted from all genes before the specified…
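The affine background correction rule can be sketched as follows. The percentile definition (linear interpolation between closest ranks) is an assumption, because the manual does not say which definition GeneSpring uses; the function names are hypothetical.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks (assumed definition)."""
    s = sorted(values)
    k = (len(s) - 1) * pct / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def affine_background_correction(raw):
    """Shift raw control strengths so the 10th percentile becomes 0, but only
    when the 10th percentile is more negative than the median is positive.

    Returns (values, applied) so the caller can warn when the shift was made.
    """
    p10 = percentile(raw, 10)
    med = percentile(raw, 50)
    if p10 < -med:
        return [v - p10 for v in raw], True  # adds the constant -p10 to every value
    return list(raw), False

raw = [-50, -40, -30, -20, -10, 5, 10, 15, 20, 25, 30]
corrected, applied = affine_background_correction(raw)
# Here the 10th percentile (-40) is below -median (-5), so the data are shifted by +40.
```
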
Figure 5-1 The Experiment Normalizations window (the list of available normalization steps includes Data Transformation: Dye swap; Per Spot: Divide by control channel; Per Spot: Reserve control channel; Per Spot and Per Chip: Intensity dependent (Lowess); Per Chip: Normalize to a median or percentile; Per Chip: Normalize to positive control genes; Per Chip: Normalize to a constant value; Per Gene: Normalize to specific samples; Per Gene: Normalize to median; and Per Chip and Per Gene: Median polishing. Callouts identify the buttons to edit, delete, and re-order normalization steps; the Use Recommended Order, Get Text Description, Use a Saved Scenario, and Save As Scenario buttons; and the Warnings panel for errors and warnings.)

The Experiment Normalizations window lists the normalizations currently being applied to your experiment and allows you to add, edit, delete, or re-order normalization steps. You can save the current normalization steps as a scenario for future use, or load a previously saved scenario. The Warnings panel displays information about unmet requirements or other potential problems with the currently specified normalizations.

Adding a Normalization Step
To add a normalization step, select the desired normalization from the list on the left side of the screen and click Add Normalization Step. For detailed information on the available normalization types, see the following sections: Per Spot Normal…
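A normalization scenario is an ordered list of steps, each one operating on the output of the previous one. The sketch below models that idea with two hypothetical steps (a per-spot division by the control channel followed by a per-chip median normalization); it is an illustration of why order matters, not GeneSpring's implementation.

```python
from statistics import median

def divide_by_control_channel(chips):
    """Per spot: turn each (signal, control) pair into a ratio."""
    return [[signal / control for signal, control in chip] for chip in chips]

def normalize_chip_to_median(chips):
    """Per chip: divide each chip's ratios by that chip's median."""
    result = []
    for chip in chips:
        m = median(chip)
        result.append([v / m for v in chip])
    return result

def apply_scenario(chips, steps):
    """Apply normalization steps in list order; each step feeds the next."""
    for step in steps:
        chips = step(chips)
    return chips

scenario = [divide_by_control_channel, normalize_chip_to_median]
two_color = [[(200.0, 100.0), (300.0, 100.0), (100.0, 100.0)]]
print(apply_scenario(two_color, scenario))  # [[1.0, 1.5, 0.5]]
```

In this sketch the per-chip step expects ratios, so it must come after the per-spot step; reordering the list would break the pipeline, which mirrors why the window offers a recommended order.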
…navigator object is legal in the Make Name field. The only illegal characters are the slash (/) and the backslash (\).

Note: There is an 80-character limit for navigator object names. If the dynamically generated name of the script output is too long for GeneSpring, you will be prompted to change the name before you can save it. This may be problematic in cases where a script outputs multiple gene lists or other objects.

Saving Scripts
You can save either finished or unfinished scripts. Unfinished scripts are displayed in the Navigator with a different icon than finished scripts.
To save a script, select File > Save. The first time you save a script, a Save As window appears in which you can specify a name for your script and a folder in which to save it. If an error message appears saying your result cannot be saved, rename the script and try again.

Folders
By default, the ScriptEditor saves scripts in the scripts folder. You can create a subfolder within this folder by right-clicking the parent folder in the Navigator and selecting Add Folder. If you enter a nonexistent folder name in the Save As window, the ScriptEditor creates the folder for you and saves your script in it.

Moving Scripts
To move a script, click it in the navigator and drag it to the desired location, or right-click it and select the Move command.

Warning Messages
If GeneSpring detects any problems or missing information in your script, warning or error messages m…
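The two naming rules above (no slash or backslash, at most 80 characters) can be checked before saving. This is a hypothetical helper for illustration; it encodes only the rules stated in this section.

```python
MAX_NAME_LENGTH = 80        # documented limit for navigator object names
ILLEGAL_CHARACTERS = "/\\"  # the slash and the backslash

def name_problems(name):
    """Return a list of reasons a navigator object name would be rejected."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name has {len(name)} characters; the limit is {MAX_NAME_LENGTH}")
    bad = sorted({c for c in name if c in ILLEGAL_CHARACTERS})
    if bad:
        problems.append("illegal characters: " + " ".join(bad))
    return problems

print(name_problems("PCA yeast cell cycle"))  # []
```
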
…Usage: <ExperimentWorkedDesignation>P</ExperimentWorkedDesignation>
Notes: Optional; see <ExperimentWorkedColumn>.

<ExperimentAbsentDesignation>
Specifies the flag, in the column specified by <ExperimentWorkedColumn>, that indicates that the measurement did not work well.
Contents: plain text
Attributes: n/a
Usage: <ExperimentAbsentDesignation>A</ExperimentAbsentDesignation>
Notes: Optional; see <ExperimentWorkedColumn>.

<ExperimentMarginalDesignation>
Specifies the flag, in the column specified by <ExperimentWorkedColumn>, that indicates that the measurement worked only marginally.
Contents: plain text
Attributes: n/a
Usage: <ExperimentMarginalDesignation>M</ExperimentMarginalDesignation>
Notes: Optional; see <ExperimentWorkedColumn>.

<RegionColumn>
Specifies the column that indicates regions to be normalized separately.
Contents: plain text
Attributes: n/a
Usage: <RegionColumn>1</RegionColumn>
Notes: Optional; rarely used.

<TreatNoSignalAsInvalid>
Specifies whether a signal of 0 should be treated as blank. If no value is specified for this tag, it defaults to no. Accepted values are no and yes.
Contents: plain text
Attributes: n/a
Usage: <TreatNoSignalAsInvalid>no</TreatNoSignalAsInva…
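To show how these flag tags fit together, the sketch below reads them with Python's standard `xml.etree.ElementTree` and maps raw flag values to present, marginal, absent, or unknown. The root element name `DatabaseConnection` and the column name `Detection` are hypothetical placeholders; only the tag names come from this reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration fragment: the root element name and the
# column name "Detection" are illustrative assumptions.
CONFIG = """<DatabaseConnection>
  <ExperimentWorkedColumn>Detection</ExperimentWorkedColumn>
  <ExperimentWorkedDesignation>P</ExperimentWorkedDesignation>
  <ExperimentAbsentDesignation>A</ExperimentAbsentDesignation>
  <ExperimentMarginalDesignation>M</ExperimentMarginalDesignation>
  <TreatNoSignalAsInvalid>no</TreatNoSignalAsInvalid>
</DatabaseConnection>"""

def read_flag_settings(xml_text):
    """Collect the flag-related settings from a configuration fragment."""
    root = ET.fromstring(xml_text)

    def text(tag, default=None):
        element = root.find(tag)
        if element is None or element.text is None:
            return default
        return element.text.strip()

    return {
        "column": text("ExperimentWorkedColumn"),
        "present": text("ExperimentWorkedDesignation"),
        "absent": text("ExperimentAbsentDesignation"),
        "marginal": text("ExperimentMarginalDesignation"),
        # Defaults to "no" when the tag is missing, as documented above.
        "zero_signal_is_blank": text("TreatNoSignalAsInvalid", "no").lower() == "yes",
    }

def classify_flag(value, settings):
    """Map a raw flag from the worked column to present/marginal/absent/unknown."""
    for kind in ("present", "marginal", "absent"):
        if settings[kind] is not None and value == settings[kind]:
            return kind
    return "unknown"

settings = read_flag_settings(CONFIG)
print(classify_flag("M", settings))  # marginal
```
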
…gene list. The resulting text file can be opened in any program that accepts tab-delimited text, such as spreadsheet and word processing programs.

Annotation Options
Your options for copying and saving information with an annotated gene list are listed in the Copy Annotated Gene List window. Descriptions of these items can be found by clicking Help. The type and amount of information listed varies depending on your genome and the way that genome was loaded into GeneSpring. The Systematic Name is always saved in the first column of a gene list.

General
Gene List Numbers: The values, if any, that GeneSpring has associated with this gene list. This column appears only if you have associated values. See Adding an Associated Number Restriction on page 10 for details on the types of numbers GeneSpring attaches to gene lists.
Gene List Note: Any notes attached to a gene list.

Identifiers
Common Name: A non-systematic way of referring to a gene.
Synonyms: Other names entered for your gene list.
GenBank: A gene's GenBank Accession Number, if known.

Normalized Data
Average: The mean of any normalized replicates in the experiment.
Minimum: The minimum normalized signal value for each gene.
Maximum: The maximum normalized signal value for each gene.
Flags: Flags associated with each gene.
Standard Error: The standard error of the normalized values for each gene.
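The tab-delimited export can be sketched with Python's standard `csv` module. The input layout and column subset are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of replicates; the manual does not give GeneSpring's exact formula.

```python
import csv
import io
from math import sqrt
from statistics import mean, stdev

def export_annotated_gene_list(genes):
    """Write genes as tab-delimited text with the Systematic Name first.

    genes: list of dicts with 'systematic', 'common', 'genbank', and
    'replicates' (normalized values); this layout is hypothetical.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Systematic Name", "Common Name", "GenBank",
                     "Average", "Minimum", "Maximum", "Standard Error"])
    for g in genes:
        values = g["replicates"]
        # Assumed definition: sample standard deviation / sqrt(n).
        se = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
        writer.writerow([g["systematic"], g["common"], g["genbank"],
                         mean(values), min(values), max(values), se])
    return buf.getvalue()

text = export_annotated_gene_list(
    [{"systematic": "SYS001", "common": "geneA", "genbank": "", "replicates": [1.0, 2.0, 3.0]}])
```
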
…gene list of all genes in a selected experiment that are 2-fold overexpressed or 2-fold underexpressed in at least one condition, and then creates a gene tree, a condition tree, a k-means classification, and a self-organizing map.

Filter on Noise: Creates a list of genes that have control strengths equal to or greater than a user-supplied cutoff in at least half of the conditions in the experiment.

Find List of Similar Genes: Makes a gene list for each of the genes in a selected experiment, if there are at least five genes with similar expression profiles.

Pairwise Comparison: Returns a list of genes that are two-fold overexpressed in at least one condition in an experiment as compared with a specified experiment.

Probe Entire Enterprise Repository for Similar Conditions (PEER-C): Given a condition, this script searches through GeNet for conditions similar to the input condition. If the same sample is normalized differently in different experiments, both normalizations are compared.

Select k-means: Given an experiment and a gene list, this script creates two k-means classifications with the numbers of clusters specified by the user, and chooses the k-means clustering with the highest explained variability as the result.

Send Clustering Results to GeNet: Creates a gene tree, condition tree, k-means classification, and self-organizing map using a list of all genes in an experiment that are 2-fold overexpressed or 2-fo…
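The two selection rules used by these scripts (a 2-fold change in at least one condition, and a control-strength cutoff met in at least half of the conditions) can be sketched as follows. Whether GeneSpring uses inclusive (>=) or strict comparisons is an assumption here, and the data layout and names are hypothetical.

```python
def two_fold_changed(expression, fold=2.0):
    """Genes whose normalized ratio is >= fold or <= 1/fold in at least one condition.

    expression: {gene: [normalized ratio per condition]} (hypothetical layout)
    """
    return [gene for gene, ratios in expression.items()
            if any(r >= fold or r <= 1.0 / fold for r in ratios)]

def filter_on_noise(control_strengths, cutoff):
    """Genes whose control strength is >= cutoff in at least half the conditions."""
    return [gene for gene, values in control_strengths.items()
            if 2 * sum(1 for v in values if v >= cutoff) >= len(values)]

expression = {"A": [1.0, 2.5, 1.1], "B": [0.9, 1.2, 1.0], "C": [0.4, 1.0, 1.0]}
print(two_fold_changed(expression))  # ['A', 'C']
```
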
…genome, or housekeeping genes, which are used to control for differences in the amount of exposure between samples. The formula for this normalization is:

$$\text{normalized value of gene } A \text{ in sample } X = \frac{\text{signal strength of gene } A \text{ in sample } X}{\text{median signal of the positive controls in sample } X}$$

To normalize to positive control genes, first enter a list of genes. This gene list can be typed in or loaded from a file. To type in a gene list, enter one gene on each line of the text box provided. Right-click in this box to use the Copy and Paste options. To load a gene list from a file, click Load From File and select the gene list from the browse window that appears. If there are already genes listed when you click Load From File, the genes from the list you select are added to the existing list; any genes you have already entered are not overwritten.

Note: This text box can contain no more than 32,000 text characters, including carriage returns.

Select the percentile of the positive controls by which to divide each sample. By default, this value is 50.0 (the median).

You can limit measurements based on a specified cutoff or by flag values. If measurements are limited by flag values, the percentile is calculated using only the genes that pass the flag restriction. To limit measurements by flag values, check the Use only measurements flagged box and select the appropriate option from the pull-down menu. The available options are:
• Present Only
• Present or Marginal
• Anything but…
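The positive-control normalization can be sketched for a single sample as below. The percentile definition, the flag codes ("P" for present, and so on), and the function names are assumptions made for illustration.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks (assumed definition)."""
    s = sorted(values)
    k = (len(s) - 1) * pct / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def normalize_to_positive_controls(sample, controls, pct=50.0, flags=None, allowed=("P",)):
    """Divide every signal in one sample by a percentile of its positive controls.

    sample:   {gene_id: signal}
    controls: positive-control gene ids
    flags:    optional {gene_id: flag}; only controls whose flag is in
              `allowed` are used to compute the percentile
    """
    control_values = [sample[g] for g in controls
                      if g in sample and (flags is None or flags.get(g) in allowed)]
    if not control_values:
        raise ValueError("no usable positive-control measurements")
    divisor = percentile(control_values, pct)
    return {g: v / divisor for g, v in sample.items()}

sample = {"g1": 100.0, "g2": 200.0, "c1": 50.0, "c2": 100.0, "c3": 150.0}
out = normalize_to_positive_controls(sample, ["c1", "c2", "c3"])  # median of controls = 100
```
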
…and how to load them into your experiment, see Using the Column Editor on page 3-9.

Negative Control Strengths
Some types of microarray technology report negative control strengths. This is usually the result of subtracting estimated background levels that are larger than the raw signal. This can happen when the expression levels of the gene are low compared to the measurement error. It can also happen when there is background subtraction, or when a mismatch probe set has higher intensity levels than the perfect match probe sets.

If negative signal levels occur in a large fraction of the data used for normalization, the normalization can be problematic, because the median across the normalization set can be very small or even negative. This leads to unreasonable normalization results.

In such cases, which occur only in a few situations, GeneSpring performs an extra normalization step: it readjusts the background level of the data by adding a constant to all the raw control strengths so that the 10th percentile of the signal is set equal to 0, before proceeding with the median normalization. This correction, called the affine background correction, is applied only when the 10th percentile of the data is more negative than the median of the data is positive. If this background correction has been applied, a warning message appears when you first load your data into GeneSpring.

Whether or not the above…
…Atlas 2.0 and Incyte GEM Tools 2.4 have a GenBank Accession number column that is loaded into the master table of genes. If you have difficulties creating a genome in this way, use the New Genome Installation Wizard (see The New Genome Installation Wizard on page 2-2).

Creating a New Genome
1. Select File > Import Data.
2. Choose the data file to load.
3. Specify the file format. For details, see Importing Experiment Data on page 3-2.
4. Select Create a New Genome and enter a name in the Choose a Name field.
You have the option to load additional files to the experiment; choose the files to load. GeneSpring gives you the option of adding any new genes to the genome.

Ambiguous Gene Identifiers
When the gene identifier specified during data import is not unique to a single gene in the genome, GeneSpring cannot determine which gene the measurement is for. In this case, the identifier and all corresponding genes are listed in the Ambiguous Gene Identifiers table, and the measurement is not loaded. To prevent this problem, edit the raw data file(s) so that each gene is assigned an identifier that is unique in the genome (the systematic gene name is always unique), and then re-import your data.

Contact Silicon Genetics Technical Support at 1-866-SIG-SOFT if you have further questions or experience difficulties.

Data Format
This section describes the format of the files cre…
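The ambiguity rule can be illustrated by indexing every identifier (systematic name plus synonyms) to the genes it could refer to. This is a hypothetical sketch with made-up gene names; identifiers that match no gene are simply reported, since the manual handles new genes through a separate prompt.

```python
from collections import defaultdict

def build_identifier_index(genome):
    """Map each identifier to the set of systematic names it could refer to.

    genome: {systematic_name: [other identifiers]} (hypothetical layout)
    """
    index = defaultdict(set)
    for systematic, identifiers in genome.items():
        index[systematic].add(systematic)
        for ident in identifiers:
            index[ident].add(systematic)
    return index

def resolve_measurements(measurements, index):
    """Split measurements into loaded, ambiguous (not loaded), and unknown."""
    loaded, ambiguous, unknown = {}, {}, []
    for ident, value in measurements.items():
        matches = index.get(ident, set())
        if len(matches) == 1:
            loaded[next(iter(matches))] = value
        elif len(matches) > 1:
            ambiguous[ident] = sorted(matches)  # listed, measurement not loaded
        else:
            unknown.append(ident)
    return loaded, ambiguous, unknown

genome = {"SYS1": ["alpha"], "SYS2": ["beta", "shared"], "SYS3": ["shared"]}
index = build_identifier_index(genome)
loaded, ambiguous, unknown = resolve_measurements(
    {"alpha": 1.5, "shared": 2.0, "SYS3": 0.8, "zeta": 1.0}, index)
```

Because systematic names are unique, using them as identifiers always resolves to exactly one gene.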
…the most significant differential expression (smallest p-value) are returned.
To filter genes based on a one-sample t-test of the mean expression level across repeats or replicates versus a reference value, select t-test p-value as the filter criterion in the Expression Percentage Restriction.

Statistical Analysis (ANOVA)

Figure 1-1 The Statistical Analysis (ANOVA) window (shown with the 1-Way Tests tab: Parameter to Test time minutes; Test Type Parametric test, don't assume variances equal; P-value Cutoff 0.05; Multiple Testing Correction Benjamini and Hochberg False Discovery Rate, with the note "You can expect a false discovery rate of about 5% of the genes identified"; Post Hoc Tests None; and Computation Preferences Compute locally or Compute on a GeNet Remote Server)

To perform ANOVA analysis:
1. Select Tools > Statistical Analysis (ANOVA).
2. Select a gene list from the navigator and click Choose Gene List.
3. Select an exper…
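For orientation, the classic one-way ANOVA F statistic for one gene is sketched below. Note that this is the equal-variance F statistic; the test type shown in the figure ("don't assume variances equal") is a different, Welch-type variant, and converting F to a p-value requires the F distribution, which is not shown. The function name is hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Classic (equal-variance) one-way ANOVA F statistic for one gene.

    groups: one list of expression values per group (e.g. per parameter value).
    """
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    ss_within = sum((v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```

A large F means the differences between group means are large relative to the variation within groups.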
Figure 3-10 Advanced Column Editor Options (the dialog includes the Gene Identifier Prefix and Suffix Removal options, such as …that ends in specific characters, There is no suffix to remove, There is a fixed suffix to be removed, and There is a suffix that begins with specific characters; the Two Color Data Files option, The signal and control columns for each chip are in different data files; and the Default Normalizations menu, Choose a normalization scenario to automatically apply to this file format)

a. To strip a gene identifier prefix or suffix, select the appropriate radio button in the Gene Identifier Prefix and Suffix Removal section and enter the characters to be stripped in the text box next to your choice.
b. To specify that your signal and control values are in separate files, check the box in the Two Color Data Files section.
c. To apply a default normalization scenario to your experiment files, select the appropriate scenario from the pull-down menu in the Default Normalizations section. For more information on the available default normalizations, see Default Normalizations on page 3-21.
4. To save this file format setup for future use, click Remember This Format. The format is added to the cache of recognized formats so that GeneSpring recognizes it. When prompted, enter a name for the new format.

Default Column Assignments of Known Products
GeneSpring recognizes the column titles of various commercially available products and places them…
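The prefix and suffix removal choices can be sketched as three small string functions. Where the manual is silent (for example, whether the first or last occurrence of the marker characters is used), the behavior below is an assumption, and the identifiers are illustrative only.

```python
def strip_fixed_suffix(identifier, suffix):
    """Remove an exact trailing suffix, e.g. '_at'."""
    if suffix and identifier.endswith(suffix):
        return identifier[: -len(suffix)]
    return identifier

def strip_suffix_beginning_with(identifier, start):
    """Cut everything from the first occurrence of `start` onward (assumed)."""
    pos = identifier.find(start) if start else -1
    return identifier[:pos] if pos > 0 else identifier

def strip_prefix_ending_with(identifier, end):
    """Cut everything up to and including the first occurrence of `end` (assumed)."""
    pos = identifier.find(end) if end else -1
    return identifier[pos + len(end):] if pos >= 0 else identifier

print(strip_fixed_suffix("YAL001C_at", "_at"))  # YAL001C
```
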
…the Finish button to save your genome.

Gene HyperText Links
The format for gene URLs in the genomedef file has changed from earlier versions of GeneSpring. The new format is as follows:

GeneHypertextLinks link:http://www.example.com&gene=<field1>&id=<field2>

where link is the name of the link and must be followed by a colon, not a semicolon. Instances of <field> are replaced by the value of the specified parameter. The allowed parameters are:
systematic, common, genbank, ec, pubmed, map, chromosome, synonyms, description, phenotype, function, product, keywords, dbid, custom1, custom2, custom3

Links can be created using any column; the Labeled format allows an unlimited number of columns. A link is enabled for a particular gene only if all parameters mentioned in that URL are defined for that gene.

Experiment URLs work exactly the same way, except that they begin with ExperimentHypertextLinks instead of GeneHypertextLinks, and the field variables contain the names of parameters. A link is shown in the experiment inspector only if the experiment has parameters with names matching all fields in the URL.

In both cases, the parameter names are not case-sensitive. Thus, if an experiment has a parameter called Time, you can specify it as time, Time, or TIME in the URL.

In earlier versions of G…
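The substitution rules above (case-insensitive <field> names, and a link enabled only when every field is defined) can be sketched with Python's `re` module. The URL template in the example is illustrative, values are substituted verbatim with no URL encoding, and the function names are hypothetical.

```python
import re

FIELD = re.compile(r"<([^<>]+)>")

def parse_link_line(line):
    """'GeneHypertextLinks name:url' -> (keyword, link name, URL template)."""
    keyword, spec = line.split(None, 1)
    name, url = spec.split(":", 1)  # the link name ends at the first colon
    return keyword, name, url

def expand_link(template, values):
    """Fill <field> placeholders (case-insensitive); None if any field is undefined."""
    lowered = {k.lower(): v for k, v in values.items() if v not in (None, "")}
    fields = [f.lower() for f in FIELD.findall(template)]
    if any(f not in lowered for f in fields):
        return None  # the link is disabled for this gene or experiment
    # Values are inserted verbatim (no URL encoding) in this sketch.
    return FIELD.sub(lambda m: str(lowered[m.group(1).lower()]), template)

kw, name, url = parse_link_line(
    "GeneHypertextLinks mylink:http://www.example.com/?gene=<systematic>&id=<dbid>")
```
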
…the left of its name and enter a new weight. The equation used to determine the overall correlation is:

$$X = \frac{Aa + Bb + Cc + \cdots}{a + b + c + \cdots}$$

where:
A = the correlation coefficient between the gene in question in experiment 1 and the selected gene, also from experiment 1
a = the weight specified for experiment 1
B = the correlation coefficient between the gene in question in experiment 2 and the selected gene, also from experiment 2
b = the weight associated with experiment 2
C = the correlation coefficient between the gene in question in experiment 3 and the selected gene, also from experiment 3
c = the weight associated with experiment 3
and so on.

Experiments 1, 2, 3, and so forth are all of the experiments selected in the white Correlations table. If X is between the minimum and maximum correlations specified in the Find Similar Genes window, the gene in question passes the correlation restriction.

Standard Correlation
Measures the angular separation of the expression vectors a and b for genes A and B around zero:

$$\text{Result} = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

Smooth Correlation
Make a new vector A from a by interpolating the average of each consecutive pair of elements of a, and inserting this new value between the old values. Do this for each pair of elements that would be connected by a line in the graph screen. Do the same to make a vector B from b. Then:

$$\text{Result} = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

Change Correlation
Make a new vector A from a by looking at…
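The weighted overall correlation and the Standard and Smooth correlations can be sketched directly from the formulas above. The sketch assumes every consecutive pair of conditions is connected by a line (no breaks between non-continuous parameter values), and all function names are hypothetical.

```python
from math import sqrt

def standard_correlation(a, b):
    """Angular separation around zero: a.b / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def smooth(v):
    """Insert the average of each consecutive pair between the two values."""
    out = []
    for cur, nxt in zip(v, v[1:]):
        out.extend([cur, (cur + nxt) / 2])
    out.append(v[-1])
    return out

def smooth_correlation(a, b):
    return standard_correlation(smooth(a), smooth(b))

def overall_correlation(correlations, weights):
    """X = (A*a + B*b + C*c + ...) / (a + b + c + ...)."""
    return sum(r * w for r, w in zip(correlations, weights)) / sum(weights)

def passes(correlations, weights, minimum, maximum):
    """True if the weighted correlation X lies between the minimum and maximum."""
    return minimum <= overall_correlation(correlations, weights) <= maximum

# Correlation 0.9 in experiment 1 (weight 2) and 0.5 in experiment 2 (weight 1).
x = overall_correlation([0.9, 0.5], [2, 1])  # (1.8 + 0.5) / 3
```
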
…the left panel of GeneSpring windows, containing data organized into folders.

Normalize: The use of statistical methods to eliminate systematic variation in microarray experiments that can influence measured gene expression levels.

Panel: A section of a window or screen.

Pathways: A pathway is a graphical representation of the interaction between gene products in a biological system. Genes can be superimposed on the pathway, allowing you to view their expression levels in a biological context.

Parameter Value: One of the possible values assigned to a variable. For example, in the statement X = 1, 2, 3, or 4, X is the experimental parameter and the numbers 1, 2, 3, and 4 are each a different parameter value of X. A more pertinent example: breast cancer, kidney cancer, liver cancer, brain cancer, and no cancer could all be different parameter values for the experimental parameter cancer.

Parameters
Color Code: Similar to a discrete parameter, except that you would expect points on a graph with the same parameters other than this one to be at the same horizontal position. Colors are then typically used to distinguish these points. Typical examples are the same as for non-continuous parameters. This may be referred to as a category.
Continuous Parameter: A numerical parameter for which interpolation makes sense. Graphs using this parameter are line graphs. If there are no continuous…
…the tops of the lines, selecting this option displays the numerical value associated with each line.
Color by all conditions: Divides the genes into sections representing multiple conditions, so that all conditions in the selected interpretation can be viewed simultaneously. Using this feature disables the condition slider at the bottom of the genome browser.
Show unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser.

The Coloring panel allows you to modify the way color is used to represent different types of data. For more information, see Color on page 4-30.

Array Layout View
The Array Layout view produces a synthetic picture of the arrays used in the current experiment. This view is useful for identifying arrays that display local shifts in intensity due to problems in probe deposition, hybridization, washing, or blocking. To use this view, you must first create an array layout file (see Layout Parameters on page 2-12).
…whether a particular keyword appears in any of the specified fields. This is useful in cases where a given string, such as cancer, is sometimes the parameter, sometimes the parameter value, and sometimes part of the experiment name.

Figure 3-18 The Filter by Keyword tab (the Sample Manager, with a Search For box; Search In check boxes for Sample Name, Notes, Sample Attributes, Parameter Values, and Experiment Parameters; the Whole Words Only and Use * as a Wildcard options; and the Add, Add All, Remove, Remove All, Configure Columns, Reset, Publish to GeNet, Delete, Create Experiment, and Edit Attributes buttons)

To filter by keyword:
1. Enter the desired keyword in the Search For box. This keyword can be a word or a number. It can also contain an asterisk as a…
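Keyword filtering across several fields can be sketched with Python's `re` module. The wildcard semantics (an asterisk matches any run of non-space characters), case-insensitive matching, and the sample layout are all assumptions; the manual only states that an asterisk can be used and that a Whole Words Only option exists.

```python
import re

def keyword_pattern(keyword, whole_words=False):
    """Compile a keyword; '*' matches any run of non-space characters (assumed)."""
    body = r"\S*".join(re.escape(part) for part in keyword.split("*"))
    if whole_words:
        body = r"\b" + body + r"\b"
    return re.compile(body, re.IGNORECASE)

def filter_samples(samples, keyword, fields, whole_words=False):
    """Names of samples where the keyword appears in any of the chosen fields."""
    pattern = keyword_pattern(keyword, whole_words)
    return [s["name"] for s in samples
            if any(pattern.search(str(s.get(f, ""))) for f in fields)]

samples = [
    {"name": "s1", "Sample Name": "breast cancer 1", "Notes": ""},
    {"name": "s2", "Sample Name": "control", "Notes": "precancerous"},
    {"name": "s3", "Sample Name": "yeast", "Notes": "time series"},
]
print(filter_samples(samples, "cancer", ["Sample Name", "Notes"]))  # ['s1', 's2']
```
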
…whether to run the script locally or on a Remote Execution Server by selecting the appropriate radio button. If the desired server is not already in the list, select Add New GeNet and enter the requested information in the Edit GeNet Server dialog that appears.
An estimate of the time required to run the script also appears in this part of the screen. These are loose estimates, described as follows:
• Seconds
• Less than a minute
• A few minutes
• Less than an hour
• A few hours
• Many hours

Click Start. This button is not active until all required data has been entered. If your script will upload data to a regulatory-compliant GeNet, you will be prompted to provide an electronic signature. For more information, see Electronic Signatures on page 9-14.

When the script has finished running, the Results window appears. The appearance of the Results window varies depending on the type of script you run. If the script returns a data object, name your results and click Save. You can also add and save information in the Notes area. If you do not want to keep the results of your script, click Cancel.

Specifying Parameters for Data File Restriction
Some of the sample scripts included with GeneSpring require you to enter parameters for data file restriction. The data file format to search, and the columns in that data file format to be searched, are entered as a special text string in the Filter…
…technology. The denominator used to normalize each measurement is referred to as the control strength.

What is Interpreted Data?
GeneSpring can interpret normalized data in many different ways. You can elect to have multiple samples treated as replicates and averaged, and indicate what assumptions GeneSpring should make about the precision of these averaged values. You can display and perform analyses on normalized data using three modes: ratio (raw versus control strength), logarithm of ratio, or fold change versus the control strength.

It is important to note that the graphical display of normalized values, and the numbers used for all analyses such as clustering, reflect the mode you have chosen. However, the numbers displayed as text (as in the Gene Inspector window) and entered by the user as parameters for analyses (as in the Filter Genes tools) are always in ratio mode.

What are Flags?
Flags are additional measurement markers in your data set. They can be assigned as present, marginal, unknown, or absent.

Data Loading
The demonstration version of GeneSpring comes pre-loaded with sample yeast, rat, and Affymetrix data. Many users benefit from performing trial analyses on these sample data sets. When you are ready to analyze your own data, you must load and set up the data for analysis. There are four steps to preparing data:
1. Loading gene information (optional)
2. Lo…
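The three interpretation modes can be sketched for a single measurement. Two details are assumptions not stated in this section: the logarithm base (2 here) and the fold-change convention of showing ratios below 1 as negative fold decreases (for example, 0.5 shown as -2). The function name is hypothetical.

```python
import math

def interpret(signal, control_strength, mode="ratio", log_base=2.0):
    """Express one normalized measurement in ratio, log, or fold-change mode."""
    ratio = signal / control_strength
    if mode == "ratio":
        return ratio
    if mode == "log":
        return math.log(ratio, log_base)  # base 2 is an assumption
    if mode == "fold":
        # Assumed convention: ratios below 1 shown as negative fold decreases.
        return ratio if ratio >= 1 else -1 / ratio
    raise ValueError(f"unknown mode: {mode}")

print(interpret(50.0, 100.0, "fold"))  # -2.0
```

Whatever the display mode, values typed into filter dialogs are interpreted as ratios, which corresponds to `mode="ratio"` here.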
Figure 6-9 A Typical Filtering Window (shown: a Filter Genes on Standard Deviation window for the experiment Yeast cell cycle time series no 90 min, Default Interpretation, with the Cross-Gene Error Model active; the window reports how many genes pass the filter (117 out of 117), and includes the Interactive Update box, the requirement that values must appear in at least 8 out of 16 conditions, and the preview pane)

Preview Pane Options
When you open a filtering window, the default view in the preview pane is the view type that makes the most sense for that filtering type. To link the preview display to the main GeneSpring window, select Main Window from the View menu.
The preview pane updates dynamically as you change the settings for the filter. When working with large experiments, this may cause GeneSpring to respond slowly. To disable this feature, uncheck the Interactive Update box.

Using the Double-Ended Sliders
In filters that require you to set a minimum/maximum range, a double-ended slider appears. You can set a range either by using the sliders or by entering numbers directly in the Minimum and Maximum boxes.

Figure 6-10 A Double-Ended Slider

You can preserve the size of the specified range while changing the settings by clicking the blue bar between the sliders and dragging it to the desired position. In some filters, the ti…
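The minimum/maximum range test with an "at least N conditions" requirement, and the slider drag that preserves the range width, can be sketched as follows. Treating missing measurements as non-passing is an assumption, and the names are hypothetical.

```python
def passes_range_filter(values, minimum, maximum, min_conditions):
    """True if at least `min_conditions` values fall within [minimum, maximum].

    None marks a condition with no measurement and never counts as a pass.
    """
    hits = sum(1 for v in values if v is not None and minimum <= v <= maximum)
    return hits >= min_conditions

def drag_range(minimum, maximum, new_minimum):
    """Move both slider ends together, preserving the width of the range."""
    width = maximum - minimum
    return new_minimum, new_minimum + width

values = [0.1, 0.2, None, 0.5]
print(passes_range_filter(values, 0.0, 0.3, 2))  # True
```
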
Figure 6-8 The New Gene List window (shown with example genes, such as an hsp70-type molecular chaperone and YLR370C, Arp2/3 Complex Subunit; the Show as List and Show as Navigator options; and similar lists PCA component 1 through PCA component 8)

Enter a name and destination folder for the new gene list, or accept the default, and click Save. To specify the location of the destination folder, select the desired parent folder from the navigator.

Making Lists from Properties
You can make gene lists based on the properties (annotations) contained in your master gene table. These lists are not ordered.
To make a list from properties:
1. Select Annotations > Make Gene List from Properties (pre-4.1 users: select Tools > Make Gene List from Properties).
2. Choose a property on which to base your list from the pull-down menu.
3. Uncheck the Divide by semicolons box if you do not want your data separated at semicolons.
4. You can specify that a list is included only if it has a certain number of members, or you can include all lists. By default, GeneSpring removes gene lists with one or fewer members. Change this number in the text box provided, or include everything by unchecking the Remove classifications with [number] or fewer box.
5. Enter a name for your gene list folder.
6. Click OK. A new folder with the gen…
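The grouping performed by Make Gene List from Properties can be sketched as follows: each distinct property value becomes a list, values are optionally split at semicolons, and lists with one or fewer members are dropped by default. The data layout and function name are hypothetical.

```python
from collections import defaultdict

def lists_from_property(genes, prop, divide_by_semicolons=True, remove_at_or_below=1):
    """Group genes by the values of one annotation property.

    genes: {gene: {property: text}}; remove_at_or_below=None keeps every list.
    """
    groups = defaultdict(list)
    for gene, props in genes.items():
        text = props.get(prop, "")
        parts = text.split(";") if divide_by_semicolons else [text]
        for part in parts:
            value = part.strip()
            if value:
                groups[value].append(gene)
    if remove_at_or_below is None:
        return dict(groups)
    return {value: members for value, members in groups.items()
            if len(members) > remove_at_or_below}

genes = {"G1": {"function": "kinase; transport"}, "G2": {"function": "kinase"},
         "G3": {"function": "transport"}, "G4": {"function": "chaperone"}}
print(lists_from_property(genes, "function"))
# {'kinase': ['G1', 'G2'], 'transport': ['G1', 'G3']}
```
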
…experiment parameters from this screen. For information on how to edit experiment parameters, see Experiment Parameters on page 3-29.
To add a new sample attribute, click New Attribute. The Sample Attribute screen appears.

Figure 4-9 The Sample Attribute window (a scrolling list of standard attribute names, such as Array Design, Author, Common Reference, Concentration, Data processing/normalization, and Developmental Stage, with Custom options for the attribute name, value, and units, and an Attribute is numeric check box)

You can specify the following on this screen:
Attribute Name: The name of the attribute being added. You can select a standard attribute name from the scrolling list, or select the Custom radio button and enter a new attribute name.
Attribute Value: The value of the attribute being added. For many standard attributes there is a default value. You can accept the default, or select the Custom radio button and enter a new value.
Attribute Units: The units in which the attribute is measured. For many standard attributes there is a default unit. You can accept the default, or select the Custom radio button and enter a new unit. If the attribute value is numeric, check the Attribute is numeric box.

When you are done, click OK to save your new attribute and return to the Sample Inspector.
To remove an attribute, sele…
289. i list as the Universe.
Output: a number representing the probability that the intersection between the two lists could be due to chance. This script applies no multiple testing correction.
Filtering
Open Scripts > Basic Scripts > QC & Analysis > Filter Gene Tools. This folder contains the following building blocks:
Filter on Annotations. Input: a gene list. Knobs: the string to be searched. Description: outputs a list of genes whose annotations contain a specified text string.
Filter on Confidence Condition Input. Input: one condition. Knobs: measure of confidence, multiple testing correction, minimum, maximum. Description: filters on t-test p-value or number of replicates using a condition, and outputs a gene list.
Filter on Confidence Interpretation Input. Input: one experiment interpretation. Knobs: measure of confidence, multiple testing correction, minimum, maximum, minimum number of conditions. Description: filters on t-test p-value or number of replicates using an experiment interpretation, and outputs a gene list.
Filter on Data File Condition Input. Input: one condition. Knobs: filter method, filter text, use as wildcard, must appear in [n] of samples, filter columns specification. Description: filters on a data file using a condition, and outputs a gene list.
Filter on Data File Interpretation Input. Input: one experiment interpretation. Knobs: filter m
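The manual does not name the statistical test behind the overlap probability. A common way to ask whether an intersection could be due to chance is a one-sided hypergeometric test, and the Python sketch below uses that test as an assumption. `overlap_p_value` is a hypothetical name. `universe_size` is the number of genes in the Universe list. Like the script described above, the sketch applies no multiple testing correction.

```python
from math import comb


def overlap_p_value(universe_size, list_a_size, list_b_size, overlap):
    """One-sided hypergeometric p-value: P(intersection >= overlap) by chance.

    Assumes both lists are subsets of the same universe and that list B is
    a random draw of list_b_size genes from it (an assumption, not GeneSpring's
    documented method). No multiple testing correction is applied.
    """
    total = comb(universe_size, list_b_size)
    upper = min(list_a_size, list_b_size)
    # comb(n, k) is 0 when k > n, so impossible terms contribute nothing.
    hits = sum(
        comb(list_a_size, i) * comb(universe_size - list_a_size, list_b_size - i)
        for i in range(overlap, upper + 1)
    )
    return hits / total
```

A call such as `overlap_p_value(6127, 117, 50, 10)` gives the chance of seeing 10 or more shared genes between a 117-gene list and a 50-gene list drawn from 6,127 genes. The list sizes in this example are illustrative.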
290. i e in comparison to zero. However, instead of using the expression values at each experimental point to build the expression vector for gene A, it uses an arctangent transformation of the ratio between adjacent pairs of experimental points, and builds the expression vector from those values.
This correlation looks for instances where gene A and gene B are changing at the same time. Using the arctangent gives a measure of change that is less sensitive to outliers than the ratio itself.
To compute a Change correlation:
1. Make a new vector A from a by looking at the change between each pair of elements of a.
2. Do this for each pair of elements that would be connected by a line in the graph screen. The value created between two values $a_j$ and $a_{j+1}$ is
$$A_j = \arctan\left(\frac{a_{j+1}}{a_j}\right) - \frac{\pi}{4}$$
3. Do the same to make a vector B from b.
$$\text{Change correlation} = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
Upregulated Correlation
The Upregulated correlation is very similar to the Change correlation, except that it considers only positive changes. All negative values of the arctangent transform of the ratio are set to zero, so only periods when new RNA is being synthesized are emphasized.
To compute an Upregulated correlation:
1. Make a new vector A from a by looking at the change between each pair of elements of a.
2. Do this for each pair of elements that would be connected by a line in the graph sc
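The Change and Upregulated correlations can be sketched in Python as shown below, under three assumptions. Normalized values must be strictly positive, because a ratio with a zero denominator is undefined. Consecutive elements are treated as the pairs connected by a line in the graph. The correlation is taken in the uncentered form A·B/(|A||B|), which compares against zero rather than the mean. The function names are hypothetical.

```python
import math


def change_vector(values, upregulated_only=False):
    """Arctangent change transform between adjacent points.

    Each element is atan(a[j+1] / a[j]) - pi/4, so "no change" maps to 0.
    Assumes strictly positive values. With upregulated_only=True, negative
    changes are set to 0 (the Upregulated variant).
    """
    out = []
    for prev, nxt in zip(values, values[1:]):
        v = math.atan(nxt / prev) - math.pi / 4
        out.append(max(v, 0.0) if upregulated_only else v)
    return out


def change_correlation(a, b, upregulated_only=False):
    """Uncentered correlation of the two change vectors; 0.0 if either is all zeros."""
    A = change_vector(a, upregulated_only)
    B = change_vector(b, upregulated_only)
    dot = sum(x * y for x, y in zip(A, B))
    norm = math.sqrt(sum(x * x for x in A)) * math.sqrt(sum(y * y for y in B))
    return dot / norm if norm else 0.0
```

Because atan(x) + atan(1/x) = π/2 for positive x, a rise and an equal-ratio fall give values of opposite sign. As a result, `change_correlation([1, 2, 1, 2], [2, 1, 2, 1])` is -1 up to rounding. With `upregulated_only=True` the same pair scores 0, because the two genes never rise at the same time. Returning 0.0 for an all-zero vector is a choice made for this sketch only.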
291. iable, no visual distinction is made based upon this parameter or its parameter values.
Regulatory Sequence: The sequence upstream of a given gene to which regulatory proteins bind, determining the amount of expression of that gene.
Sample: The measurements taken from one or more chips containing a single liquid sample, or the data generated from a biological object placed onto an array or set of arrays.
Slider: A horizontal scrollbar at the bottom of the GeneSpring window that changes the display of genes from one sub-experiment to another. For example, in a time series experiment the slider moves the displayed genes across the different time periods.
t-test: T-tests calculate p-values, which measure the significance of differential gene expression in each condition.
Trust: A measure of the reliability of the data.
Two-Color Experiment: An experiment in which a control is used.
Variable: A factor such as a disease, drug concentration, patient name, pipette number, time, the strain of organism tested, or who performed the experiment. Variables allow you to look for meaningful patterns in your data and deal sensibly with replicate experiments.
Symbols
layout file 2-12
  examples 2-13
Numerics
3D scatter plot view 4-49
  X, Y and Z axes 4-50, 4-69
A
adding extra genes 2-7
advanced search 4-4
affine background correction 5-13, 5-20
Affymetrix data, normalizing 5-17
animation controls 4-71
292. ic regulatory sequence:
1. Select Tools > Find Potential Regulatory Sequences. The Find Potential Regulatory Sequences window appears.
2. Click the Enter a Specific Sequence tab at the top of the screen.
[Figure 6-13: The Enter a Specific Sequence tab]
3. Select a gene list from the navigator and click Set Gene List.
Note: Do not choose the all genes or all genomic elements gene lists. Your selected gene list is already compared against all other genes in the genome.
4. Enter the range of bases upstream of each gene in the Search Before ORFs section of the window. For example, if you enter From 10 To 100 on a search for ACGCGT, GeneSpring searches for the sequence within the region between 10 and 100 bases upstream of each gene. The smaller the range between these numbers, the greater the likelihood that the results are statistically significant. Larger ranges may take longer to search.
You can also search for common sequences
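The Python sketch below shows the positional part of a specific-sequence search. It is a simplified illustration, not the GeneSpring algorithm, and it computes no statistics. It makes these assumptions:

- It scans only the given strand.
- The upstream string ends immediately before the ORF.
- An N in the motif matches any base.
- An optional number of single-base mismatches is allowed.

`find_motif_upstream` is a hypothetical name.

```python
def find_motif_upstream(upstream, motif, start, end, max_mismatches=0):
    """Return distances (bases upstream of the ORF) where the motif occurs.

    upstream: sequence 5' of the ORF, written 5'->3', whose last base is
    assumed to be 1 base upstream of the start codon. A match is reported
    only if every motif base lies between `start` and `end` bases upstream.
    The reported distance is that of the motif's 5'-most base.
    """
    upstream = upstream.upper()
    motif = motif.upper()
    n, k = len(upstream), len(motif)
    hits = []
    for i in range(n - k + 1):
        far = n - i             # distance of the motif's first base
        near = n - (i + k - 1)  # distance of the motif's last base
        if near < start or far > end:
            continue
        mismatches = sum(
            1 for m, s in zip(motif, upstream[i:i + k]) if m != "N" and m != s
        )
        if mismatches <= max_mismatches:
            hits.append(far)
    return hits
```

For example, an ACGCGT placed 26 bases upstream is found by a From 10 To 100 search but missed by From 10 To 25, because part of it lies outside that window. This sketch does not reproduce the nucleotide density correction or the probability cutoff that the real tool offers.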
293. ick OK to return to the tree view.
Navigating Subtrees
To navigate through subtrees, right-click any node and select Display Sub-tree. To view the tree immediately above the one selected, right-click anywhere and choose Display Parent of Sub-tree. To return to the top and view the entire tree, right-click anywhere and select Display Entire Tree. This returns you to the default GeneSpring tree view.
Double-clicking a terminal branch (a line indicating only one condition or gene) opens either the Condition Inspector or the Gene Inspector, depending on the branch.
Keyboard commands for the right-hand or top tree (usually the condition tree):
Alt+Left Arrow: Jump to the sibling to the left of the selected node.
Alt+Right Arrow: Jump to the sibling to the right of the selected node.
Alt+Up Arrow: Jump to the parent of the selected node.
Alt+Down Arrow: Jump to the first child of the selected node, counting from left to right.
Keyboard commands for the left-hand tree (usually the gene tree):
Ctrl+Left Arrow: Jump to the parent of the selected node.
Ctrl+Right Arrow: Jump to the first child of the selected node, counting from top to bottom.
Ctrl+Up Arrow: Jump to the sibling directly above the selected node.
Ctrl+Down Arrow: Jump to the sibling directly below the selected node.
Magnifying Trees
Magnification in the Tree View is not quite the same as in the other views due to the
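The tree keyboard commands come down to four moves: parent, first child, previous sibling and next sibling. The two trees bind these moves to different arrow keys. The Python sketch below models the moves. The `Node` class and the key names are hypothetical. When a move is impossible (at the root, at a leaf, or at the end of a row of siblings), the sketch stays on the current node; that behavior is an assumption the manual does not confirm.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursive __eq__ through parent links
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def parent(node: Node) -> Node:
    return node.parent if node.parent is not None else node  # stay put at the root


def first_child(node: Node) -> Node:
    return node.children[0] if node.children else node  # stay put at a leaf


def sibling(node: Node, step: int) -> Node:
    """step=-1 for the previous sibling, +1 for the next one."""
    if node.parent is None:
        return node
    sibs = node.parent.children
    i = sibs.index(node) + step
    return sibs[i] if 0 <= i < len(sibs) else node


# Condition tree (drawn across the top): siblings run left to right.
CONDITION_TREE_KEYS = {
    "Alt+Left": lambda n: sibling(n, -1),
    "Alt+Right": lambda n: sibling(n, +1),
    "Alt+Up": parent,
    "Alt+Down": first_child,
}

# Gene tree (drawn down the left side): siblings run top to bottom.
GENE_TREE_KEYS = {
    "Ctrl+Left": parent,
    "Ctrl+Right": first_child,
    "Ctrl+Up": lambda n: sibling(n, -1),
    "Ctrl+Down": lambda n: sibling(n, +1),
}
```

In this model, one navigation function serves both trees. Only the key mapping changes, because one tree is drawn horizontally and the other vertically.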
294. ients, p-values, fold change ratios, or, in the case of a regulatory sequence search, the number of base pairs before the promoter region. To see a gene list's associated numbers, double-click the list to open the Gene List Inspector.
Filtering genes by their associated numbers is helpful if you want to use this information to create a more specific list of genes. For example, you may want to find genes that are very similar to another gene (those with a high correlation coefficient), or genes that are a specific distance from a promoter sequence found using the Find Potential Regulatory Sequences tool. For details on Find Potential Regulatory Sequences, see Regulatory Sequences on page 6-18.
[Figure 6-21: The Filter on Gene List Numbers window]
To filter on gene list numbers:
1. Select a gene list with associated numbers from the navigato
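Filtering on associated numbers comes down to a range check. The minimal Python sketch below assumes the gene list is available as (gene, value) pairs and that both bounds are inclusive. `filter_on_list_numbers` is a hypothetical helper.

```python
def filter_on_list_numbers(gene_list, minimum, maximum):
    """Keep genes whose associated number lies in [minimum, maximum].

    gene_list: sequence of (gene_id, associated_number) pairs, for example
    correlation coefficients to a reference gene. The list's order is preserved.
    """
    return [(gene, value) for gene, value in gene_list if minimum <= value <= maximum]
```

For example, to keep genes that correlate with a reference gene between 0.964 and 0.99, pass those two values as `minimum` and `maximum`. Keeping the original order preserves the ranking, so the result still reads as an ordered list.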
295. ilter is limited to the computer on which it was created; it cannot be sent to another user.
Saved Scripts
When you save a filter as a script, all of the current inputs are saved as default inputs, but those inputs are not required to run the script. The script is saved in the Scripts folder in GeneSpring's navigator. It does not reconstruct the appearance of the Advanced Filtering window; instead, it runs exactly like a standard GeneSpring script. For more information on scripts in GeneSpring, see Scripts on page 8-2.
A filter saved as a script is not limited to the computer on which it was created, so you can send it to other GeneSpring users.
7 Clustering and Characterizing Data
The Clustering Window
GeneSpring's clustering algorithms are designed to divide genes or conditions into groups that have similar expression patterns. GeneSpring supports a variety of clustering methods, each designed to solve a distinct type of problem. These are useful tools to identify genes that are potentially co-regulated, as well as to reveal coordinated responses to experimental treatments.
To perform clustering operations, select Tools > Clustering and choose the appropriate clustering method from the menu. The following options are available:
K-means
Gene Tree
Condition Tree
Self-Organizing Map
QT Clustering
Clusterin
296. iment from the navigator and click Choose Experiment.
4. Click the tab for 1-Way Tests or 2-Way Tests.
5. Specify the appropriate options for your analysis. For detailed information on the options available on each tab, see 1-Way ANOVA on page 6-34 and 2-Way ANOVA on page 6-43.
6. Specify whether to run this operation locally or on a GeNet Remote Server, and click Start.
1-Way ANOVA
Use 1-Way ANOVA to filter out genes that do not vary significantly across different groups with multiple samples. This allows you to find the genes that exhibit important changes between the various conditions of the experiment. The comparison is performed for each gene, and the genes with sufficiently small p-values are returned.
Comparisons can be performed with parametric or non-parametric methods. The parametric comparison for two groups is known as Student's two-sample t-test; for multiple groups, it is known as one-way analysis of variance (ANOVA). You can specify whether to assume that within-group variances are equal across all groups. Calculations without the assumption of equal variances are done using Welch's approximate t-test and ANOVA. Non-parametric comparisons are also available, corresponding to the Wilcoxon two-sample test (also known as the Mann-Whitney U test) for two groups and the Kruskal-Wallis test for multiple groups.
[Figure: the 1-Way Tests and 2-Way Tests tabs]
You can expect a f
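The Python sketch below computes the one-way ANOVA F statistic for one gene, assuming equal variances. It illustrates the standard calculation and is not GeneSpring's code. `one_way_anova_f` is a hypothetical name. It returns the F ratio and its degrees of freedom rather than a p-value.

```python
def one_way_anova_f(groups):
    """Classic one-way ANOVA F statistic, assuming equal within-group variances.

    groups: list of lists of expression values for one gene (one list per group).
    Returns (F, df_between, df_within); F is infinite when the within-group
    sum of squares is zero.
    """
    groups = [g for g in groups if len(g) > 0]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n - k < 1:
        raise ValueError("need at least 2 groups and at least one replicate")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    wss = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df1, df2 = k - 1, n - k
    if wss == 0:
        return float("inf"), df1, df2
    return (bss / df1) / (wss / df2), df1, df2
```

If SciPy is available, `scipy.stats.f.sf(F, df1, df2)` gives the upper-tail p-value, and `scipy.stats.f_oneway` can serve as a cross-check. With two groups, F equals the square of Student's pooled t statistic.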
297. [Figure 4-37: The Condition Scatter Plot view]
To view the condition scatter plot, select View > Condition Scatter Plot in the main GeneSpring window. Unlike most views, the condition scatter plot is displayed in a separate window, which appears when the option is selected. This window also appears when you run a PCA on Conditions analysis; for details, see PCA on Conditions on page 7-18. For a 2D view of this plot, select 2D Scatter Plot from the View menu in the lower right portion of the window.
In Figure 4-37, each dot represents a condition. When this window is opened from the main GeneSpring window, the first three parameters, if available, are selected for the axes. If only two parameters are available, the plot is displayed in 2D format. If there are fewer than two parameters, a 3D plot is displayed using the first three genes from the selected gene list.
You cannot select the experiment to be displayed from within this view. To change the experiment being viewed, exit this window, select the desired experiment, and choose View > Condition Scatter Plot in the main GeneSpring window.
Pressing the x, y, or z key rotates the graph about the specified axis. Hold down the Shift key to speed this rotation. Hold down the Alt key to reverse the direction of rotatio
298. in-group sum of squares.
$$d_2 = \sum_{i=1}^{G} (N_i - 1)$$ the denominator degrees of freedom. If $d_2$ is not greater than zero, then exit (p-value = 1).
$$WMS = \frac{WSS}{d_2}$$ the within-groups mean square, and
$$F = \frac{BMS}{WMS}$$ the F-ratio statistic. If WMS = 0, then F is treated as arbitrarily large (p-value = 0).
The p-value is calculated by looking up F in the upper-tail probability of an F distribution with $d_1$ and $d_2$ degrees of freedom.
Parametric Test, Variances Not Assumed Equal
For the parametric test without assuming equal variances, first check that each group has $N_i \ge 2$ and $SS_i > 0$. If not, remove that group from consideration and recompute G. If G is not at least 2, exit (p-value = 1). This reflects the more stringent requirements of not assuming equal variances: if the variance estimate is pooled, replicates are needed for only one group, but if variances are estimated separately, replicates are needed for every group.
Then compute:
$$w_i = \frac{N_i}{SS_i / (N_i - 1)}$$ the group weights
$$W = \sum_{i=1}^{G} w_i$$ the sum of weights
$$\bar{X}_w = \frac{1}{W} \sum_{i=1}^{G} w_i \bar{X}_i$$ the weighted mean
$$BSS = \sum_{i=1}^{G} w_i \left(\bar{X}_i - \bar{X}_w\right)^2$$ the between-groups sum of squares
$$d_1 = G - 1$$ the numerator degrees of freedom
$$BMS = \frac{BSS}{d_1}$$ the between-groups mean square
$$d_2 = \frac{G^2 - 1}{3 \sum_{i=1}^{G} \frac{(1 - w_i/W)^2}{N_i - 1}}$$ the denominator degrees of freedom. If $d_2$ is not greater than zero, then exit (p-value = 1).
$$WMS = 1 + \frac{2(G - 2)}{G^2 - 1} \sum_{i=1}^{G} \frac{(1 - w_i/W)^2}{N_i - 1}$$ the within-group mean squa
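The Python sketch below implements the parametric test without equal variances, known as Welch's ANOVA. It follows the standard Welch procedure of dropping unusable groups, weighting each group by N_i / s_i², and adjusting the denominator. `welch_anova` is a hypothetical name. It returns None wherever the procedure says to exit with a p-value of 1.

```python
def welch_anova(groups):
    """Welch's one-way ANOVA (variances not assumed equal) for one gene.

    groups: list of lists of expression values. Groups with fewer than 2
    values or zero sum of squares are dropped. Returns (F, d1, d2), or None
    when the test cannot be run (p-value 1).
    """
    stats = []  # (N_i, mean_i, SS_i) for each usable group
    for g in groups:
        n = len(g)
        if n < 2:
            continue
        mean = sum(g) / n
        ss = sum((x - mean) ** 2 for x in g)
        if ss > 0:
            stats.append((n, mean, ss))
    G = len(stats)
    if G < 2:
        return None
    w = [n * (n - 1) / ss for n, _, ss in stats]  # w_i = N_i / s_i^2
    W = sum(w)
    wmean = sum(wi * m for wi, (_, m, _) in zip(w, stats)) / W
    bss = sum(wi * (m - wmean) ** 2 for wi, (_, m, _) in zip(w, stats))
    d1 = G - 1
    bms = bss / d1
    lam = sum((1 - wi / W) ** 2 / (n - 1) for wi, (n, _, _) in zip(w, stats))
    d2 = (G ** 2 - 1) / (3 * lam)
    if d2 <= 0:
        return None
    wms = 1 + 2 * (G - 2) / (G ** 2 - 1) * lam
    return bms / wms, d1, d2
```

With two equal-size groups of equal variance, this gives the same F and degrees of freedom as the pooled test. For example, `[[1, 2, 3], [4, 5, 6]]` yields F = 13.5 with 1 and 4 degrees of freedom. With unequal variances, d2 is usually fractional. As with the pooled test, SciPy's `scipy.stats.f.sf(F, d1, d2)` can turn the statistic into a p-value.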
299. in the Gene List folder in the navigator.
2. Select Edit > Copy > Copy Annotated Gene List. The Copy Annotated Gene List window appears.
3. Choose an experiment interpretation from the Copy based on interpretation pull-down menu. See Experiment Interpretations on page 3-39 for information on experiment interpretations.
4. Choose options in the Copy Annotated Gene List window by checking or unchecking the boxes.
5. Click Copy to Clipboard.
6. Paste the list into another application.
Saving Annotated Gene Lists
1. Select a gene list from the Gene List folder in the navigator.
2. Select Edit > Copy > Copy Annotated Gene List. The Copy Annotated Gene List window appears.
[Figure 9-1: The Copy Annotated Gene List window]
3. Choose the experiment interpretation from the Copy based on interpretation pull-down menu. See Experiment Interpretations on page 3-39 for information on experiment interpretations.
4. Click Save to File.
5. Choose a name and location to save your
300. [Figure 4-18: An experiment colored by parameter]
You can also choose to color by parameter from the Display Options window:
1. Select View > Display Options and click the Coloring tab.
2. Select an experiment from the navigator on the left side of the Display Options window.
3. Click the Set Experiment button.
4. Click OK.
No Color
This option shows all genes in gray, with no coloration. To use this option, select Colorbar > No Color. You can also display all genes in a single color by selecting the Solid Color option from the pull-down menu on the Coloring tab of the Display Options window.
Color by Classification
This coloring scheme color-codes the genes according to previously defined knowledge about them. You can color by classification using a folder of lists or a classification method such as k-means or SOM.
To color by a previously saved classification:
1. Open the Classifications folder by clicking its icon.
2. Right-click the name of a classification.
3. Select Use Coloring from the pop-up menu. GeneSpring automatically updates to reflect the new coloring scheme.
The colorbar shows the names of the sets present in the chosen classification.
301. indow.
Show Horizontal Axis Label: Displays the parameter that is graphed on the horizontal axis.
Show Vertical Axis Label: Displays the parameter that is graphed on the vertical axis.
Label Vertical Axis on Side: Displays the vertical axis label vertically. If this is unchecked, the vertical axis label sits to the right of the top of the vertical axis.
Show unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser.
View as Spreadsheet
This option allows you to view your data as a spreadsheet. The spreadsheet's color scheme and gene list reflect what is showing in the genome browser when you open the spreadsheet window. The genes appear in the same order as in your master gene table.
[Figure 4-36: Spreadsheet View]
To Copy a Row for Pasting into Another Document
1. Click the row to copy.
2. Right-click the row and select Copy.
To copy the entire spreadsheet, click Copy All.
302. [Figure 4-35: The Graph by Genes view limited to the Zinc Finger list]
Genes at the top of the selected gene list are displayed at the left end of the experiment line, and genes at the bottom of the gene list are displayed at the right end. Generally, your gene lists are ordered so that the associated values appear in descending order. If you do not have associated values, your genes appear in the same order as in the master gene table.
Graph by Genes Display Options
The following display options are available in graph by genes view:
Horizontal Axis: The available options for this view are listed below.
Vertical Axis: See The Vertical Axis on page 4-27.
Features: The available options for this view are listed below.
Coloring: See Color on page 4
303. ing a minimum expression value to be met in at least one condition.
To filter on Expression Level:
1. Select an experiment or condition from the navigator and click Set Experiment. You can also select a subset of conditions within an experiment.
2. Select the appropriate data type from the Choose Data Type menu. For more information on data types for filtering, see Data Types for Restrictions on page 6-53.
3. Click Exclude Conditions to specify which conditions, if any, to exclude from the analysis.
[Figure 6-12: The Exclude Conditions window]
By default, all conditions are selected. To exclude a condition, uncheck the box to its left; to include a condition, check the box. Click Check All to include all conditions or Clear All to exclude all conditions.
4. Specify the following values for the filter:
Minimum: the smallest gene value to allow in your list, also known as the cutoff value.
Maximum: the largest gene value to allow in your list.
Values must appear in at least [n] out of [m] conditions: the number of conditions in the experiment in which genes must meet the specified requirements. This line can refer to the whole experiment.
Filter on Fold Change
Filter on Fold Change finds genes based on a comparison of two samples or conditions. Use this tool to find fold changes in gene expression levels betwe
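The Minimum, Maximum and "at least n out of m conditions" settings combine as shown in the Python sketch below. The data layout (a dictionary of per-condition values for each gene), the inclusive bounds and the name `filter_on_expression_level` are assumptions made for illustration.

```python
def filter_on_expression_level(expression, minimum, maximum, at_least, excluded=()):
    """Keep genes whose value lies in [minimum, maximum] in >= at_least conditions.

    expression: dict gene_id -> dict condition -> value (for example, normalized).
    excluded: conditions left out of the count, like unchecked boxes in the
    Exclude Conditions window.
    """
    passing = []
    for gene, values in expression.items():
        hits = sum(
            1 for cond, v in values.items()
            if cond not in excluded and minimum <= v <= maximum
        )
        if hits >= at_least:
            passing.append(gene)
    return passing
```

Setting `at_least` to the number of included conditions requires a gene to pass everywhere. Setting it to 1 gives the "minimum expression value met in at least one condition" behavior described above.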
304. ing genes to 2-7
equations
  overall correlation 7-3
error bars 4-28
error model 3-44
  technical details 3-46
Euclidean metric B-5
exclude data window 9-14
experiment inspector 4-16
  interpretations 4-17
  normalizations 4-18
  parameters 4-17
experiment interpretation
  changing 3-39
  Fold change 3-42
  log ratio 3-41
  vertical axis 3-40
experiment normalizations window 5-2
experiment parameter 3-35
  condition 3-31
  multiple 3-30
  parameter value 3-29
experiment parameters 3-29
  changing 3-32, 3-35
  color code 3-31
  continuous element 3-31
  creating new 3-33, 3-36
  deleting 3-33, 3-36
  display options 3-30
  hidden elements 3-31
  importing 3-32, 3-35
  non-continuous elements 3-31
  replacing 3-33, 3-36
  values 3-29
experimental data range 4-32
experiments
  copying and pasting 3-17
  creating new 3-16
  inspecting 4-16
export data by copying 3-20
exporting gene lists 9-5
external program interface 8-35
external programs 8-34, 8-35
  arguments 8-39
  creating new 8-35
  delimiters 8-39
  inputs 8-36
  inspecting 8-42
  outputs 8-37
  running 8-40
  scripts and 8-32
F
FDA compliance
  electronic signatures 9-14
files
  layout 2-13
Filter Genes
  Data File Restriction 6-53
Filter on Data File 6-61
  restricting data types 6-61
filter on fold change 6-55
filtering
  gene list numbers 6-64
find genes 4-4
Find Potential Regulatory Sequence 6-18
find similar
  minimum number B-8
find similar command 6-6
find similar genes 4-12
flag values 3-9
flags 1-8, 3-11, 5
305. ing window before re-opening the Condition Scatter Plot window.
5. Specify the data type. Available options are Control, Raw, or Normalized.
6. Choose a graph mode for the specified axis. Available options are linear, logarithmic, and fold change. The fold change option is available only if you are looking at normalized data from an interpretation or a condition.
Adding Lines
You can draw lines that help distinguish distinct groups of data points. Although these lines can represent many types of data thresholds, they are generically called fold change lines. Fold lines are valuable because you can select the points that lie above or below them by right-clicking in the appropriate position in the genome browser. In addition to fold lines, you can add lines to the origin of each axis and draw a line of best fit.
To modify the use of lines:
1. Click Display Options, or right-click anywhere in the condition scatter plot window and select Display Options.
2. Click the Lines to Graph tab.
To see a grid inside the plot area, you can have lines drawn at the major and minor tick intervals of each axis. Check the X, Y, and Z Axis Grid Lines checkboxes to display them. The Grid Color box at the bottom of the window shows the color of these grid lines; to modify the grid color, click Change.
Changing Labels and Features
The scatter plot view also allows you to chang
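The Python sketch below selects points above or below a pair of fold-change lines. It assumes both axes use a linear scale and that the lines are y = fold·x and y = x/fold; for a 2-fold line, that means twice and half the x value. `classify_by_fold_lines` is a hypothetical helper, not a GeneSpring function.

```python
def classify_by_fold_lines(points, fold=2.0):
    """Split scatter-plot points by the lines y = fold*x and y = x/fold.

    points: dict gene_id -> (x, y), positive values on a linear scale.
    Returns three lists of gene IDs: (above, between, below).
    """
    above, between, below = [], [], []
    for gene, (x, y) in points.items():
        if y > fold * x:
            above.append(gene)      # more than fold-times higher on the y axis
        elif y < x / fold:
            below.append(gene)      # more than fold-times lower on the y axis
        else:
            between.append(gene)
    return above, between, below
```

On a logarithmic plot the same two lines are parallel to the diagonal, which is why fold lines are often drawn in log space. The classification itself does not depend on the plotting scale.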
306. interpretation can be viewed simultaneously. Using this feature disables the condition slider at the bottom of the genome browser.
Show unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser.
Place gaps between Heatmap Tiles: Uncheck this option to remove the gaps between tiles.
Display Navigational Tree: Specifies whether to display the navigational tree for the Eisen-like subtree view on the left or the top of the viewing area.
Use Custom Heatmap Borders: Allows you to customize the amount of screen space dedicated to tree branches and labels. When this option and the Show Drag Arrows option are selected, use the drag arrows in the genome browser window to make adjustments.
Show Drag Arrows: Displays arrows used for changing the size of the area dedicated to tree branches. This affects both the thumbnail and the displayed subtree in the Eisen-like view.
Ordered List View
Ordered List View allows you to view a gene list in the order of its associated values. Values are listed in descending order. If you do not have associated values, genes are ordered as they are listed in the master gene table. Each vertical line representing a gene is proportional in length to the gene's associated number.
To view genes in an ordered list, select View > Ordered List. Your l
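The Python sketch below shows the ordering rule for the Ordered List view. Breaking ties between equal associated values by master gene table order is an assumption, and so are the function and argument names.

```python
def ordered_list(genes, master_order, associated=None):
    """Order genes as described for the Ordered List view (a sketch).

    genes: gene IDs in the list.
    master_order: every gene ID in master gene table order.
    associated: optional dict gene_id -> associated value.
    """
    position = {g: i for i, g in enumerate(master_order)}
    if associated:
        # Descending by associated value; ties fall back to master table order.
        return sorted(genes, key=lambda g: (-associated[g], position[g]))
    return sorted(genes, key=lambda g: position[g])
```

Sorting on a (negated value, master position) key keeps the result deterministic even when many genes share a value. This matters for lists such as correlation results, where rounding produces ties.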
307. io determines how large the correlation difference between groups of clustered genes must be for them to be considered discrete groups. This number should be between 0 and 1. It is not usually appropriate to change the separation ratio or minimum distance.
Separation Ratio
The separation ratio determines how large the correlation difference between groups of clustered genes has to be for the groups to be considered discrete and not be joined together.
Increasing the separation ratio increases the branchiness of the tree.
The default separation ratio is 1.0. The separation ratio can range from 0.0 to 1.0. At a separation ratio of 0, all gene expression profiles can be regarded as identical.
To change the separation ratio, enter a new value in the Separation Ratio box.
Minimum Distance
The number specified in the Minimum distance box determines the minimum separation considered significant between genes. This reduces meaningless structure at the base of the tree. The minimum distance determines how far down the tree discrete branches are depicted. A higher number tends to lump more genes into a group, making the groups less specific.
Decreasing the minimum distance increases the branchiness of the tree.
The default minimum distance is 0.001. A value smaller than 0.001 has very little effect, because most genes are not correlated more closely than that.
To change the default minimum distance, enter a new value in the Minimum distance box.
Refe
308. ion profiles you create are stored in the Expression Profiles folder.
The External Programs Folder
External programs are analysis programs outside GeneSpring that can be launched from within GeneSpring. Data from GeneSpring is sent to the program, and GeneSpring recognizes the output from the program. These programs are kept in the External Programs folder.
The Bookmarks Folder
Bookmarks are saved display settings, such as experiment, gene list, color scheme, and selected genes. You can save your current display at any time and return to it later by opening the Bookmarks folder and selecting a particular bookmark.
The Scripts Folder
Scripts are tools that save time by performing a long series of data analysis steps at once. Scripts are reusable and can be applied to any data set. You can create your own scripts using the Silicon Genetics ScriptEditor. All scripts, including the complimentary scripts shipped with GeneSpring 4.2, are stored in the Scripts folder.
Commonly Used Functions
To open a different genome, choose File > Open Genome or Array and follow the submenus to your desired genome.
To open another copy of the main window, choose File > New Linked Window.
Each of these brings up a new main window similar to the one described in GeneSpring Basics on page 1-6.
To change preferences (colors, startup genome, and so on), choose Edit > Preferences. See
309. [Figure 3-8: The New Experiment Checklist]
12. At this point you can examine and change your normalizations, interpretations, and parameters.
To define or edit normalizations, click Normalizations. For information on defining normalizations, see Default Normalizations on page 3-21.
To define or edit parameters, click Parameters. For information on defining parameters, see Experiment Parameters on page 3-29.
To define or edit default interpretations, click Experiment Interpretation. For information on defining default interpretations, see Experiment Interpretations on page 3-39.
If you prefer to make these changes later, click Close. You can load the experiment another time and select Experiment Normalizations, Change Experiment Parameters, or Change Experiment Interpretation from the Experiments menu in the main GeneSpring window.
Using the Column Editor
If GeneSpring does not recognize your file format, use the Column Editor to assign headings and functions to each column in your data file.
Import Data Column Editor:
Step 1: Assign functions to columns in your data file. You must assign a Gene Identifier column and at least one Signal column.
Step 2: If your data file has a row of c
310. ious coloration scheme, you must re-open the Experiment Data Range pop-up window and enter your old values. For more details on trust, see Trust on page 4-30. For more details on normalization, see Normalizing Data.
Color by Significance
Data are colored according to how far each gene is over- or underexpressed relative to a normalized expression level of 1, measured in units of the standard error of the measurement. The standard colorbar is replaced with a colorbar ranging from -30 to 30.
If the Cross-gene Error Model is turned on, the standard error is based on that model; for more information, see Cross-gene Error Models on page 3-44. Otherwise, the standard error is based on the standard deviation of the replicate data for a particular gene and condition. For information about how this error is calculated, see The Gene Inspector on page 4-10.
To color your genes by significance, select Colorbar > Color by Significance, or select the Significance option from the pull-down menu on the Coloring tab in the Display Options window.
Color by Venn Diagram
This option colors genes based on their membership in one or more gene lists in a Venn diagram.
To assign a gene list to the Venn diagram:
1. Select Colorbar > Color by Venn Diagram, or select the Venn Diagram option from the pull-down menu on the Coloring tab in the Display Options window.
2. Drag the list fro
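Color by Significance can be sketched as a standardized distance from 1. The Python sketch below assumes the score is (normalized value - 1) / standard error, clipped to the -30 to 30 range of the colorbar. It does not show how GeneSpring maps the score to colors or how it estimates the standard error. `significance_score` is a hypothetical name.

```python
def significance_score(normalized_value, standard_error, limit=30.0):
    """Distance from a normalized level of 1, in standard-error units.

    The result is clipped to [-limit, limit] to match a colorbar running from
    -30 to 30. standard_error is assumed to come from an error model (replicate
    standard deviation or a cross-gene model); estimating it is out of scope.
    """
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    z = (normalized_value - 1.0) / standard_error
    return max(-limit, min(limit, z))
```

Under this reading, a gene at 1.5 with a standard error of 0.25 scores 2, meaning two standard errors above the normalized level. A very precise measurement far from 1 saturates at the end of the colorbar.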
311. irectory section of the screen, select root to upload the experiments in your GeneSpring Experiments folder to the GeNet Experiments folder. Select default to upload the experiments to your GeNet user directory.
[Figure 9-6: The Exclude Data window (Select Data to Upload)]
312. is display option shows the same gene multiple times. The number of times a single gene is drawn equals the number of values defined as conditions. When the browser display is colored using a color option other than Color by Parameter, you cannot visually tell which value a particular gene line or gene point represents, although a separate gene line is still drawn for each value defined as a condition. See Set Value Order on page 3-34 for details on how to change that order.
Individual patients or strain types are variables commonly defined as color-coded conditions because, although they are different values, it is interesting to compare them visually. The expression patterns of individual patients with the same disease are likely to be similar under similar conditions, and often the interesting results are the ones where the patterns differ. Graphs of parameter values defined as color-coded conditions are useful here, because they let you easily compare varying conditions of the same gene.
Changing Experiment Parameters
Use the Experiment Parameters window to assign parameter names and units (for example, time and minutes) to your data. You can also use this window to add and delete parameters and to rearrange the order of non-numeric parameter values on the horizontal axis. If you set up your file names as described below, your parameter assigning process is aut
313. list appears in its order.
Figure 4-31: Ordered List View
Ordered List Display Options
The following display options are available in ordered list view:
• Features: the available options for this view are listed below.
• Coloring: see Color on page 4-30.
• Legend: see Legend on page 4-28.
To modify the appearance of this view, select View > Display Options, or right-click anywhere in the genome browser and select Display Options. The Display Options window includes a Features panel with the following options:
Show Associated Value: When the view is zoomed so as to enlarge t
314. additional window appears. When the query boxes appear, they contain the actual SQL commands. GeneSpring must re-query the database each time you restart the program. If this takes too long, you can right-click the appropriate database icon and select the save to disk option. All commands in the experiment files can also be added to the database file.
More Complicated Databases
You can link various tables together in SQL. This typically requires a proficient database user; check with the person who built your database if you have questions. There are many ways to enter and organize data within databases. If the data organization in your database is confusing, you might want to make separate tables for your data or for part of your data. For example, you could make a separate table just for parameters, like the table below:
Sample | Parameter Name | Parameter Value
1 | elephants | 2
2 | elephants | 34
2 | daisies | 30
In this table, the parameters are not stored in individual columns; each row holds one parameter name and its value. All parameter tables should have an associated sample number. If you use a GATC database, you must re-link all the sample numbers to the parameter numbers. In that case, you must define an SQL line to get those parameters, for example: SQLgetParameters select. This should retrieve the names and values of the parameters.
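A minimal sketch of the one-row-per-parameter layout shown above, using Python's built-in sqlite3 module with an in-memory database. The table name `parameters`, its columns `sample`, `name`, and `value`, and both helper functions are hypothetical; your own schema and the query you configure for GeneSpring will differ.

```python
import sqlite3
from collections import defaultdict

# Hypothetical long-format parameter table: one row per (sample, parameter) pair,
# holding the same rows as the example table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parameters (sample INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO parameters (sample, name, value) VALUES (?, ?, ?)",
    [(1, "elephants", "2"), (2, "elephants", "34"), (2, "daisies", "30")],
)


def parameters_for_sample(conn, sample_id):
    """Return {parameter name: value} for a single sample number."""
    rows = conn.execute(
        "SELECT name, value FROM parameters WHERE sample = ? ORDER BY name",
        (sample_id,),
    ).fetchall()
    return dict(rows)


def all_parameters(conn):
    """Return {sample number: {parameter name: value}} for every sample."""
    grouped = defaultdict(dict)
    for sample, name, value in conn.execute(
        "SELECT sample, name, value FROM parameters ORDER BY sample, name"
    ):
        grouped[sample][name] = value
    return dict(grouped)


print(parameters_for_sample(conn, 2))  # {'daisies': '30', 'elephants': '34'}
```

Because every row carries its sample number, a new parameter can be added as new rows without changing the table's columns.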
315. ization on page 5-7
• Per-Chip Normalizations on page 5-10
• Per-Gene Normalizations on page 5-13
• Normalization Strategies for Specific Technologies on page 5-17
Editing a Normalization Step
1. Select the desired step in the list of normalizations to be performed and click Edit, or double-click the step number of the selected normalization.
2. The configuration screen for the selected normalization appears.
3. Make any desired changes to the normalization settings.
4. Click OK to save your changes, or Cancel to exit without saving.
Removing Normalization Steps
To remove normalization steps, select one or more steps in the list of normalizations to be performed and click Delete. Be certain you have selected the correct step, since no confirmation dialog appears.
Re-Ordering Normalization Steps
To move a normalization step, select its name in the list and click Move Up or Move Down. Continue until the steps are in the desired order.
Applying Default Normalizations
When an experiment is created during the sample import process, normalizations are applied before you reach the Experiment Normalization window. These normalizations are determined from the data format of the samples in the experiment. For information on the default normalizations used during sample import, see Default Normalizations on page 3-21. When you create an experiment from samp
316. k Set Value Order.
For example, to show the numeric continuous parameter Kryptonite Concentration in the reverse order (40, 30, 20, 10, 0) of the normal arrangement (0, 10, 20, 30, 40), you must first change the setting to a non-numeric parameter, then select the column by clicking the gray bar at the very top. You cannot change the order of a parameter defined as numeric.
To select part of a column, highlight it in the normal fashion: click in the topmost cell you want to select while holding down the Shift key, and GeneSpring selects down the column for you. Click Set Value Order. To use the Sort Ascending or Sort Descending buttons, select all the values to be ordered. The main GeneSpring window sorts your parameters according to the new order. You can also sort manually by selecting one parameter value and using the Move Up and Move Down buttons to arrange the order to your liking.
Sample Attributes
Attributes are values associated with a sample, such as time, drug concentration, etc. You may have many attributes applying to a single sample. These attributes are selected and associated with a sample in the Sample Manager. For more information, see Attributes and Parameters on page 4-14. You can add any number of additional attributes using the Sample Manager or the Standard Attribute Editor. Additional sets of attributes are available for download from the Silicon Genetics websit
317. blocks are not editable but can be used in any script. If no sample external program building blocks appear in your ScriptEditor's navigator, click the File Access link to download the samples from http://www.sigenetics.com/cgi/SiG.cgi/Products/GeneSpring/extProgs.smf. Place the jar file in the folder GeneSpring/data/Programs.
You may have more external program building blocks, because GeneSpring and ScriptEditor create an external program primitive for every external program in your GeneSpring. You may not have any external program building blocks in the ScriptEditor if you are working with an older version of GeneSpring or if you do not have any external programs.
Open: Building Blocks > External Programs
Name | Input | Description
Load Classification from File | A filename | Runs an external program to load and output a classification from a file on disk.
Load Experiment from File | A filename | Runs an external program to load and output an experiment from a file on disk.
Load Gene List from File | A filename | Runs an external program to load and output a gene list from a file on disk.
Load Gene List with Numbers from File | A filename | Runs an external program to load and output a gene list with associated numbers from a file on disk.
Load Tree from File | A filename | Runs an external program to load and output a gene tree from a file on disk.
Save Classification to | One classifica | Runs an external program to save a c
318. 1. Select Experiments > Cross Gene Error Model. The Cross Gene Error Model window appears.
Figure 3-28: The Cross Gene Error Model window. The window lists the parameters that differentiate groups of replicates and the error model coefficients for each sample; replicate samples have identical error model coefficients.
2. If you have replicates for each condition, select the Replicates radio button, select the parameters to treat as replicates, and click OK. If you do not have replicates for each condition, select the Deviation from 1.0 radio button and click OK.
Note: Double-click a row to view that sample in the Sample Inspector.
3. Select Experim
319. connect. The icon attribute is optional and allows you to specify a graphical image to represent the database.
Contents: <PhysicalDatabase>, <TechnologyType>, Header, Genome Names, <GetSampleIDs>, <GetSampleAttributes>, <GetFile>, <GetRawData>
Attributes: name (required), icon
Usage: <Database name="Affymetrix Database" icon="/usr/local/graphics/icon.gif"> ... </Database>
Notes: Required; can appear multiple times.
<LoadClass>
Loads the driver that connects to the database. In some cases you may want to use a JDBC driver written in Java, which must be instantiated at startup. You can specify any number of these drivers; however, any class you specify must be in your CLASSPATH. This element is optional: if you are using a default driver it is not necessary, but if you are using a specific driver you must specify it here.
Contents: plain text
Attributes: n/a
Usage: <LoadClass>sun.jdbc.odbc.JdbcOdbcDriver</LoadClass>
Notes: Optional; frequently used.
<ProcessedDataListFile>
This setting specifies where the database saves the list of samples that have been uploaded.
Contents: plain text
Attributes: n/a
Usage: <ProcessedDataListFile>/usr/database/ProcessedList.txt</ProcessedDataListFile>
Notes: Required.
<PhysicalDatabase>
You must have one <PhysicalDatabase> tag for
320. parameters in alphabetical order. See Parameter Display Options on page 3-30 for details.
Do Not Display (Replicate): Select this mode when the parameter does not differentiate between samples. For example, if you have multiple patient samples with different cancer types, you could select Do Not Display for the parameter Age and Continuous for Cancer Type. This groups all the samples by the parameter values for Cancer Type and ignores Age when grouping samples into conditions. Note that when the same gene occurs twice in the course of an experimental set, it is called a repeat, and the measurements are averaged together. This cannot be changed.
Color Code: The Color Code mode colors genes by parameter. The number of times a single gene is drawn equals the number of parameter values defined as conditions, allowing you to easily compare varying conditions of the same gene. By default, parameter values are listed in alphabetic or numerical order. See Parameter Display Options on page 3-30 for details.
Cross-Gene Error Models
Using the Cross-Gene Error Model
The ability to estimate measurement and sample-to-sample variation in microarray-based experiments is often compromised by the fact that the cost, in both time and materials, of performing large numbers of replicate samples is quite high. If the cross-gene error model is turned on, GeneSpring accounts for erro
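The Do Not Display example above (grouping patient samples by Cancer Type while ignoring Age) can be sketched in Python. The data layout and the function `group_into_conditions` are hypothetical, and averaging the values within each condition is an assumption made here for illustration; this is not GeneSpring's internal code.

```python
from collections import defaultdict
from statistics import mean


def group_into_conditions(samples, condition_parameters):
    """Group sample values into conditions and average each group.

    samples              -- list of (parameter dict, measured value) pairs
    condition_parameters -- parameters that define conditions; any parameter not
                            listed (e.g. one set to Do Not Display) is ignored
    Returns {condition tuple: mean of the values in that condition}.
    """
    groups = defaultdict(list)
    for parameters, value in samples:
        key = tuple(parameters[name] for name in condition_parameters)
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


samples = [
    ({"Cancer Type": "A", "Age": 40}, 2.0),
    ({"Cancer Type": "A", "Age": 60}, 4.0),
    ({"Cancer Type": "B", "Age": 55}, 1.0),
]
print(group_into_conditions(samples, ["Cancer Type"]))  # {('A',): 3.0, ('B',): 1.0}
```

Passing both "Cancer Type" and "Age" instead would keep every sample in its own condition, which is what you would see if Age were not set to Do Not Display.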
321. probability of obtaining even a single false positive to be no more than α. This is a very strong criterion, and for large lists of genes it may be so strong that no genes are identified as significant. The Benjamini and Hochberg test instead controls the false discovery rate, defined as the proportion of genes expected to be identified by chance relative to the total number of genes called significant.
Bonferroni: The Bonferroni multiple testing correction, based on Bonferroni's inequality, limits the chance of a false positive result to no more than α by multiplying each nominal p-value by N (the number of genes tested), with a maximum of 1. This process controls the FWER, and the expected number of genes called significant by chance is α.
Bonferroni Step-down (Holm): The Holm step-down adjustment takes the most significant p-value and checks whether it meets the α cutoff after multiplying by N. If that gene is found to be significant, the next most significant gene is considered, but the gene already found significant is removed from the multiple testing, so the adjustment is now based on N - 1. This continues as long as genes pass the successive tests. This process controls the FWER, and the expected number of genes called significant by chance is α.
Westfall and Young Permutation: This procedure estimates the significance level of each test by a nonparametric permutation calculation based on the distribution of the significance levels across all possible reassignments of sampl
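The Bonferroni, Holm step-down, and Benjamini-Hochberg corrections described above can be sketched as adjusted p-values in plain Python. These are the standard textbook formulations written for illustration, not GeneSpring's own implementation; the function names are hypothetical, and the Westfall and Young permutation method is not shown. A gene is called significant when its adjusted p-value is at or below α.

```python
def bonferroni(pvalues):
    """Multiply each nominal p-value by N, capping at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def holm(pvalues):
    """Holm step-down: the k-th smallest p-value (k = 0, 1, ...) is multiplied by N - k."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):
        # Keep adjusted values monotone: once a gene fails, later genes fail too.
        running_max = max(running_max, min(1.0, (n - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up: p * N / rank, made monotone from the largest p down."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant(adjusted, alpha=0.05):
    """Indices of genes whose adjusted p-value is at or below alpha."""
    return [i for i, p in enumerate(adjusted) if p <= alpha]


pvalues = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in bonferroni(pvalues)])          # [0.04, 0.16, 0.12, 0.02]
print([round(p, 4) for p in holm(pvalues)])                # [0.03, 0.06, 0.06, 0.02]
print([round(p, 4) for p in benjamini_hochberg(pvalues)])  # [0.02, 0.04, 0.04, 0.02]
```

With α = 0.05, Bonferroni and Holm both keep genes 0 and 3 here, while Benjamini-Hochberg keeps all four, illustrating how the false discovery rate criterion is less strict than the FWER criteria.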
322. lassification to File | tion | disk. A save data window appears and prompts you to enter a filename.
Save Experiment to File | One experiment | Runs an external program to save an experiment to disk. A save data window appears and prompts you to enter a filename.
Save Gene List to File | One gene list | Runs an external program to save a gene list to disk. A save data window appears and prompts you to enter a filename.
Save Gene List with Numbers to File | One gene list | Runs an external program to save a gene list with associated numbers to disk. A save data window appears and prompts you to enter a filename.
Save Tree to File | One gene tree | Runs an external program to save a gene tree to disk. A save data window appears and prompts you to enter a filename.
Scripts and External Programs
External programs are listed in the ScriptEditor under the navigator's External Programs folder. In this folder you will find an external program building block for each external program defined in GeneSpring. Ten example external program building blocks are provided. There are two significant types of external program building blocks:
Load data object from file: the loading external programs make a specified data object from outside
323. fold underexpressed in at least one condition, and automatically sends the results to GeNet.
Series of k-means (increments of 5): Generates 10 k-means classifications, each with a different number of starting sets, and returns the classification with the highest explained variability.
The Run Script Window
From this window you can execute scripts and view information about them. If you have a connection to GeNet and are using Remote Execution Servers, you can execute the script on a remote computer. At the bottom of the screen is a button labeled View Script. Click this button to open a new window containing a graphical representation of the script. On this screen, click the Details button to view detailed information in the Script Inspector. For more information on the Script Inspector, see The Script Inspector on page 8-5.
(Screenshot: the Run Script window for the QT Clustering script, with the gene list all genomic elements (7,216 genes) and the experiment Yeast cell cycle time series no 90 min, Default Interpretation, as inputs, and the knobs Minimum Cluster Size 10, Minimum Correlation 0.98, and Similarity Measure Standard Correlation. Options let you compute locally or on a GeNet Remote Server.)
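The Series of k-means (increments of 5) sample script described above can be approximated in Python. This sketch reads "a different number of starting sets" as a different number of clusters k (for example 5, 10, ..., 50), and uses one common definition of explained variability, 1 minus the within-cluster sum of squares divided by the total sum of squares. Both readings are assumptions, the function names are hypothetical, and this is not GeneSpring's own k-means code.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means with random initial centers; returns a cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return labels


def explained_variability(points, labels):
    """1 - (within-cluster sum of squares / total sum of squares)."""
    grand = centroid(points)
    total = sum(squared_distance(p, grand) for p in points)
    within = 0.0
    for c in set(labels):
        members = [p for p, label in zip(points, labels) if label == c]
        center = centroid(members)
        within += sum(squared_distance(p, center) for p in members)
    return 1.0 - within / total if total else 0.0


def best_kmeans(points, cluster_counts):
    """Run k-means once per cluster count; return (k, score, labels) of the best run."""
    best = None
    for k in cluster_counts:
        labels = kmeans(points, k)
        score = explained_variability(points, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k, score, labels = best_kmeans(points, [1, 2])
print(k, round(score, 3))  # 2 0.991
```

For expression profiles you would pass one point per gene and, under the reading above, `range(5, 55, 5)` as the cluster counts.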
324. element's opening and closing tags. The following are examples of elements with contents:
<PhysicalDatabase>
  <UserName>BioMan</UserName>
</PhysicalDatabase>
In the above example, the <UserName> element, including its own contents and its closing tag, is the contents of the <PhysicalDatabase> element. The username BioMan is the contents of the <UserName> element.
Attributes are values defined within the opening tag of the element itself. In the following example, name is an attribute of the <PhysicalDatabase> element and dbname is its value:
<PhysicalDatabase name="dbname">
An element may have any number of attributes or contents. An empty element (an element that has attributes but no contents) can be closed within the opening tag by adding a slash at the end, i.e., <tag value="example"/>.
Tag Reference Table
This table provides a list of all available tags for the database configuration file. For more detailed information on the use of each tag, see Tag Definitions on page A-9. The Element column lists each of the available tags. The Contents column lists the type of contents that tag can contain (i.e., plain text or the names of the tags it can contain). The Attributes column lists the attributes that tag can contain. The Allowed In column lists the tags between which the current element i
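The difference between element contents and attributes can be checked with Python's standard xml.etree.ElementTree module. The fragment below simply combines the examples from this section (the placement of `<tag value="example"/>` inside `<PhysicalDatabase>` is for illustration only); it is not a complete GeneSpring database configuration file.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment assembled from the examples above, not a real config file.
config = """
<PhysicalDatabase name="dbname">
  <UserName>BioMan</UserName>
  <tag value="example"/>
</PhysicalDatabase>
""".strip()

root = ET.fromstring(config)
print(root.tag)                        # PhysicalDatabase
print(root.attrib["name"])             # dbname   (an attribute value)
print(root.find("UserName").text)      # BioMan   (element contents)
empty = root.find("tag")
print(empty.attrib, repr(empty.text))  # {'value': 'example'} None
```

The self-closing `<tag value="example"/>` parses as an element with an attribute but no text contents, which is why its `text` is `None`.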
325. les\SiliconGenetics\GeneSpring\data. This directory contains a folder for each genome contained in GeneSpring.
Renaming a Genome
To rename a genome in GeneSpring:
1. Using a text editor, open the genomedef file for the desired genome. This file is located in the genome's folder in the GeneSpring data directory. For example, the genomedef file for the Extraterrestrial Yeast genome might be located in C:\Program Files\SiliconGenetics\GeneSpring\data\Extraterrestrial Yeast\ExtraterrestrialYeast.genomedef.
2. The genome name is set in the first line of this file. For the Extraterrestrial Yeast genome it would look like this: Name Extraterrestrial Yeast
3. Delete the existing name and enter the new name, e.g., Name Martian Yeast
4. Save your changes and exit. Your changes will appear the next time you start GeneSpring.
Deleting a Genome
Because each genome's folder contains all of the information associated with a genome in GeneSpring, including experimental data and annotations, do not delete a genome unless you are absolutely positive no one is using any of the data it contains. You can remove a genome from GeneSpring without deleting its data by moving its folder to another location outside the GeneSpring data directory, such as a temporary directory or your own user directory. The genome will not appear the next time you start GeneSpring. To restore a genome removed in this way, replace its fol
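The renaming steps above can be scripted. This is a minimal sketch: the function names are hypothetical, and because the separator between `Name` and the genome name is not certain from this manual, the pattern accepts whitespace, `=`, or `:` and preserves whatever separator is already there.

```python
import re
from pathlib import Path

# "Name" followed by an optional separator. The exact separator used in genomedef
# files is an assumption: tabs/spaces, "=" and ":" are all accepted and preserved.
NAME_LINE = re.compile(r"^(Name\s*[=:]?\s*)(.*)$")


def renamed_text(text, new_name):
    """Return genomedef text with the genome name on the first line replaced."""
    lines = text.splitlines(keepends=True)
    if not lines:
        raise ValueError("empty genomedef file")
    first = lines[0].rstrip("\r\n")
    line_ending = lines[0][len(first):]
    match = NAME_LINE.match(first)
    if match is None:
        raise ValueError(f"first line does not start with 'Name': {first!r}")
    lines[0] = match.group(1) + new_name + line_ending
    return "".join(lines)


def rename_genome(genomedef_path, new_name):
    """Rewrite a genomedef file in place; the new name appears after GeneSpring restarts."""
    path = Path(genomedef_path)
    path.write_text(renamed_text(path.read_text(), new_name))


print(renamed_text("Name Extraterrestrial Yeast\nOther settings\n", "Martian Yeast"))
```

Keeping the text transformation separate from the file I/O makes it easy to preview the change before calling `rename_genome` on the real file.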
326. les that have already been imported, the default normalizations are the Generic One Color and Generic Two Color scenarios. These normalizations can be applied when you create an experiment using the Create New Experiment menu, or by clicking the Use Defaults button in the Experiment Normalizations window. To remove any changes you have made and apply only the default normalizations for your data type, click Use Defaults. A confirmation dialog appears. Click OK to continue, or Cancel to quit and return to the main Normalizations screen.
Viewing Text Descriptions
To view a more detailed description of a particular normalization, select its name in the list and click Get Text Description. A dialog appears with a description of the selected normalization. You can copy the text in this dialog to the clipboard by clicking Copy to Clipboard, and then paste it into a text editor.
Saving a Normalization Scenario
Click Save As Scenario to save the current normalization sequence for use in other experiments. A dialog appears prompting you to enter a name for the sequence to be saved. Once you save a normalization scenario, it is available for use in all genomes. A saved scenario records whether each step was applied to all samples or to a limited number of samples, but not which samples the steps were applied to. It also does not record a list of positive or negative controls or a list of control samples, as in the Normalize to Specific Samples
327. lid>
Notes: Optional; rarely used.
<LowerBoundOnSignalColumn>
When an error model is known, specifies a lower bound on the signal value.
Contents: plain text
Attributes: n/a
Usage: <LowerBoundOnSignalColumn>1</LowerBoundOnSignalColumn>
Notes: Optional; rarely used.
<UpperBoundOnSignalColumn>
When an error model is known, specifies an upper bound on the signal value.
Contents: plain text
Attributes: n/a
Usage: <UpperBoundOnSignalColumn>1</UpperBoundOnSignalColumn>
Notes: Optional; rarely used.
<StandardDeviationSignalColumn>
When an error model is known, specifies the standard deviation of the signal value.
Contents: plain text
Attributes: n/a
Usage: <StandardDeviationSignalColumn>1</StandardDeviationSignalColumn>
Notes: Optional; rarely used.
<ColumnHeaderLine>
Specifies the row containing the header names. Usually this can be determined automatically.
Contents: plain text
Attributes: n/a
Usage: <ColumnHeaderLine>1</ColumnHeaderLine>
Notes: Optional; rarely used.
Entering your Database into GeneSpring
Prepared Databases
In the main GeneSpring window, select File > Get Data from Database. Most of the remaining Experiment Wizard panels are filled in automatically. If you left the debug setting true, an add
328. lid>, <LowerBoundOnSignalColumn>, <UpperBoundOnSignalColumn>, <StandardDeviationSignalColumn>, <ColumnHeaderLine>
Attributes: type
Usage: <Format type="Affymetrix"> or <Format> ... </Format>
Notes: Required.
<GeneColumn>
Specifies which column in the sample data contains the gene identifier. This tag is used only if your data are in a nonstandard format.
Contents: plain text
Attributes: n/a
Usage: <GeneColumn>1</GeneColumn>
Notes: Required if the data type is not specified in the <Format> tag.
<Headlines>
Number of header lines to skip at the top before further processing. This can usually be determined automatically if the columns are specified by header. This tag does not apply when sample data are retrieved from a database.
Contents: plain text
Attributes: n/a
Usage: <Headlines>0</Headlines>
Notes: Optional.
<SignalColumn>
Specifies the column containing the raw signal data.
Contents: plain text
Attributes: n/a
Usage: <SignalColumn>31</SignalColumn>
Notes: Required if the data type is not specified in the <Format> tag.
<NormalizedColumn>
Specifies the column containing normalized data.
Contents: plain text
Attributes: n/a
Usage: <NormalizedColumn>30</NormalizedColumn>
Notes: Optional; rarely used.
<ReferenceColumn>
Specifies the column con
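To show how <Headlines>, <GeneColumn>, and <SignalColumn> work together on a nonstandard file, here is a hypothetical reader in Python. It assumes tab-delimited sample text and 1-based column numbers; neither detail is confirmed by this manual, and this is not how GeneSpring parses files internally.

```python
import csv


def read_signal(text, headlines=0, gene_column=1, signal_column=2):
    """Return {gene identifier: signal} from tab-delimited sample text.

    headlines     -- lines to skip at the top (cf. <Headlines>)
    gene_column   -- 1-based column holding the gene identifier (cf. <GeneColumn>)
    signal_column -- 1-based column holding the raw signal (cf. <SignalColumn>)
    """
    lines = text.splitlines()[headlines:]
    signals = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < max(gene_column, signal_column):
            continue  # skip blank or short lines
        gene = row[gene_column - 1].strip()
        try:
            signals[gene] = float(row[signal_column - 1])
        except ValueError:
            continue  # non-numeric value, e.g. a column header line
    return signals


sample = "Exported sample\nGene\tSignal\nYMR199W\t812.5\nYAL001C\t40\n"
print(read_signal(sample, headlines=1, gene_column=1, signal_column=2))
```

Here the single header line is skipped by `headlines=1`, and the column-name line is dropped because its signal field is not a number.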
329. (Screenshot: array layout view of Pat Brown's Yeast Layout, with the navigator folders shown on the left.)
330. lly based on the value of a script input or parameter. For example, the sample script Best k-means has two inputs: a gene list and an experiment. If you want the script output to include the name of the script, the name of the gene list, and the name of the experiment, do the following:
1. Select the appropriate script output socket.
2. In the Make Name field in the Notes section of the ScriptEditor, enter the following: Best k-means name1 in name2
3. Save the script.
If you run Best k-means using the gene list ACGCGT in all ORFs and the experiment Extraterrestrial Yeast Study, the output classification is automatically named Best k-means ACGCGT in all ORFs in Extraterrestrial Yeast Study.
The procedure is similar when basing output names on knob values, except that the format for knob values is paramx, where x is the number of the knob. For example, to make the output of the Filter on Noise sample script include the name of the input experiment and the value of the Filter Cutoff knob, enter the following in the Make Name field: Filter on Noise on name1 with Filter cutoff param1
If your input experiment is Extraterrestrial Yeast Study and the filter cutoff is 0.1, the output gene list is named Filter on Noise on Extraterrestrial Yeast Study with Filter cutoff 0.1. Any character that is legal in the name of a navi
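The Make Name substitution described above can be illustrated with a small Python sketch. It treats nameN and paramN as bare words; any delimiter characters that GeneSpring may require around these tokens in the Make Name field are not reproduced here, and `make_name` is a hypothetical helper, not part of ScriptEditor.

```python
import re

# Bare tokens such as name1 or param2 (delimiters used by ScriptEditor, if any,
# are not modeled here).
TOKEN = re.compile(r"\b(name|param)(\d+)\b")


def make_name(template, input_names, knob_values):
    """Expand nameN to the N-th input's name and paramN to the N-th knob's value (1-based)."""
    def substitute(match):
        kind, index = match.group(1), int(match.group(2)) - 1
        values = input_names if kind == "name" else knob_values
        if not 0 <= index < len(values):
            return match.group(0)  # leave unknown tokens untouched
        return str(values[index])

    return TOKEN.sub(substitute, template)


print(make_name(
    "Best k-means name1 in name2",
    ["ACGCGT in all ORFs", "Extraterrestrial Yeast Study"],
    [],
))  # Best k-means ACGCGT in all ORFs in Extraterrestrial Yeast Study
```

The same call with the template "Filter on Noise on name1 with Filter cutoff param1", the input name Extraterrestrial Yeast Study, and the knob value 0.1 reproduces the second example above.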
331. Bookmarks
If you need to pause in the midst of your analysis, you can create a bookmark to hold your place. The bookmark saves all your current display settings, including the experiment, gene list, coloration, and selected genes.
Creating a Bookmark
1. Go to the File menu and select Save Bookmark. The Save Bookmark dialog box appears.
2. Name your bookmark.
3. Click Save.
Accessing an Existing Bookmark
1. Click the Bookmarks folder in the navigator.
2. Click the name of any bookmark to open it.
Or:
1. Go to the File menu and select Load Bookmark File. The Load Bookmark dialog box appears.
2. Select your bookmark.
3. Click Open.
The Vertical Axis
In Graph, Graph by Genes, Bar Graph, Scatter Plot, and 3D Scatter Plot modes, the Display Options window contains a panel for modifying the presentation of data on the vertical axis. The following section describes the vertical axis options for each of these views except the scatter plot views (see Scatter Plot Display Options on page 4-45 and 3D Scatter Plot Display Options on page 4-49).
The Display Options window allows you to select the experiment interpretation that is graphed. To change the current interpretation:
1. Select View > Display Options.
2. Click the Vertical Axis tab.
3. Select an interpretation from the Display Options navigator panel.
4. Click Graph Experiment >>.
5. Click Apply.
The vertical
332. Select Condition Tree | One Boolean, two condition trees | Condition tree | Selects the first tree if true, the second tree if false.
Select Experiment | One Boolean, two experiment interpretations | Experiment interpretation | Selects the first interpretation if true, the second if false.
Select Gene | One Boolean, two genes | Gene | Selects the first gene if true, the second if false.
Select Gene Classification | One Boolean, two classifications | Classification | Selects the first classification if true, the second if false.
Select Gene List | One Boolean, two gene lists | Gene list | Selects the first gene list if true, the second if false.
Select Gene Tree | One Boolean, two gene trees | Gene tree | Selects the first tree if true, the second if false.
Select Number | One Boolean, two numbers | Number | Selects the first number if true, the second if false.
Select Sequence | One Boolean, two sequences | Sequence | Selects the first sequence if true, the second if false.
Gene List Manipulations
Open: Scripts > Basic Scripts > Gene List Manipulations
Name | Input | Knobs | Description
All Genes | None | None | Outputs the list of all genes.
All Genomic Elements | None | None | Outputs a list of all genomic elements.
Count Genes in | At least one gene | None | Outputs the number of gene
333. lt genome 1-18
  disk cache size 1-22
  firewall 1-21
  gene labels 1-21
  GeNet 1-22
  license manager 1-22
  memory 1-22
  misc 1-23
  remote execution 1-23
  restrict gene list searches 1-23
  system 1-22
  text color 1-20
  web browser defaults 1-21
principal components analysis 7-16
print trees with labels 4-55
printing 9-4
publish to GeNet 9-12
  exclude data 9-14
R
raw data, definition 1-7
region normalization 5-17
regions 1-7, 3-11
regulatory compliance 8-4
  electronic signatures 9-14
Regulatory Sequence 6-18
  Expected 6-22
  Observed 6-22
  P value 6-22
  Random Rate 6-22
  Sequence 6-22
  Single P 6-22
  Tests 6-22
remote server 8-5
replacing parameters 3-33, 3-36
replicates, definition 1-7
required sample attributes 3-6
restrict data types
  Control Signal 6-53
  Normalized Data 6-53
  Raw Data 6-53
restricting data types 6-61
restrictions
  arbitrary 6-63
  data types 6-53
  filter on fold change 6-55
R values 7-4
S
sample attributes 3-35
sample manager 3-23
  filter on attributes 3-26
  filter on experiment 3-25
  filter on keyword 3-27
  filter on parameter 3-25
  filtering methods 3-25
saving images 9-2
scatter plot condition 4-68
scatter plot view 4-45
  vertical, horizontal axes 4-46
scatter plot view, 3D 4-49
script building blocks 8-7, 8-20
  ANOVA 8-31
  boolean 8-20
  boolean select 8-21
  clustering 8-27
  correlations 8-28
  count groups 8-27
  filtering 8-29
  gene list manipulations 8-21
  GeNet downloading 8-22, 8-23
  GeNet publishing
334. Figure 3-16: The Filter on Parameter tab
To select a parameter, click its name in the Select a Sample Parameter of Interest list. To select parameter values, click the desired values in the Select Parameter Values list. To select all values in the list, click Select All. To clear your selections, click Clear All. The Filter Results list is updated dynamically as you make your selections.
Filter on Attributes
(Screenshot: the Filter on Attributes tab of the Sample Manager, with a Select a Sample Attribute of Interest list, a Select Attribute Values list such as 0 minutes, 10 minutes, 100 minutes, 150 minutes, and a Filter Results list.)
335. Figure 9-1: The MAGE-ML Export window. The window has checkboxes for Experiment Parameters, Processed Data, Sample Attributes, Raw Data Files, Gene Annotations, and Array Images; Contact Information fields (name, email address, organization); and an Experiment Accession No. field, which takes a 4-letter code assigned by EBI.
2. Enter all necessary information in the fields provided. For EBI compliance, you must include the following information:
• Choose an Experiment Design Type (also required for MIAME compliance).
• Check only the Experiment Parameters, Sample Attributes, and Raw Data Files boxes.
• Fill in all of the Contact Information, including your lab's EBI accession number. If you do not specify an accession number, your experiment's accession number is used by default.
Note: If you do not have an EBI accession number, you must contact EBI to obtain one.
By default, GeneSpring's MAGE-ML export feature specifies the appropriate options to produce EBI-compliant MAGE-ML data. You have the option of including more information than EBI requires, but if you do so, your data will not be accepted by ArrayExpress.
3. Click OK. The Choose Output Directory window appears.
336. Figure 4-24: Physical position view for a human genome
Figure 4-25: Zooming in for a closer look at chromosome 12
At high magnifications, the labels associated with the chromosome's cytogenetic bands are visible.
The Load Sequence Command
In GeneSpring versions 4.0 and later, sequence information is loaded by default if it is available. If you have an old version of GeneSpring and cannot update it (see Updating GeneSpring on page 1-4), follow these directions.
The Load Sequence command applies only to sequenced organisms. Load the nucleic acid sequence so that you can magnify a section of the physical position view until the sequence itself is displayed. Loading the sequence also allows you to take advantage of GeneSpring's other sequence-based features, such as Tools > Find Potential Regulatory Sequences. You can load the nucleic
337. m the navigator to the appropriate section of the Venn diagram at the right side of the window.
You can also assign the circles in a Venn diagram by right-clicking a gene list and selecting the Venn Diagram option. For more information about creating Venn diagrams and using them for analysis, see Making Lists with the Venn Diagram on page 6-13.
Color by Parameter
This option colors genes based on the values of experiment parameters. This coloring scheme is best suited to Graph view and Bar Graph view when different conditions are indicated with discrete symbols. To color by parameter:
1. Select Experiments > Change Experiment Interpretation.
2. Choose the parameter(s) to color by and click Color Code for that parameter. Click Save to create a new interpretation.
3. Select Colorbar > Color by Parameter.
Figure: A rat genome in Graph view (Normalized Intensity, log scale), colored by parameter.
The conditions
338. mFileName>
Notes: Optional.
<RegexpMatch>
When using <IDFromFileName>, use either this tag or <DatabaseQuery>, but not both.
Contents: plain text
Attributes: n/a
Usage: <RegexpMatch>AffyChipID chip</RegexpMatch>
Notes: Optional.
<JavaQuery>
Allows you to use a Java class to return an array of identifiers. You specify a command such as <JavaQuery class="com.pharma.database.getParameters" extraArgs="Blah">, and GeneSpring creates an instance of com.pharma.database.getParameters using the default constructor. That class should implement com.sigenetics.ext.database.GetAttributes. Then, for each attribute, a function is called with three arguments: the database identifier, the database genome name, and the extra argument. The return value is an array of com.sigenetics.ext.database.Attribute objects, each with name, value, units, and isNumeric fields.
Contents: n/a
Attributes: class, extraArgs (optional)
Usage: <JavaQuery class="com.pharma.database.getIDs" extraArgs 7
Notes: Optional; applies only to GetSampleIDs, GetSampleAttributes, and <GetFile>.
<Format>
Specifies the format of raw data to be retrieved. If this data is in a known format, you can specify it using the type attribute. Currently supported types are Incyte Internet Download Incyte Affymetrix Affymetrix Piv
339. mRNA produced by a given gene under specific conditions.
Expression Profile: Lines representing gene profiles that you draw in the genome browser. You can then search for genes matching that profile.
External Program: An analysis program outside GeneSpring that can be launched from within GeneSpring. Data from GeneSpring is sent to the program, and output from the program is recognized by GeneSpring. These programs are kept in the External Programs folder.
Folders: The yellow icons denoting the various directories where data are stored (e.g., the Gene Lists folder and the Experiments folder).
Gene List: A list of genes selected according to some criteria.
Gene Tree: A dendrogram used to show relationships between the expression levels of genes over a series of conditions.
Genome: The set of all genes on a chip or array.
Genome Browser: The area of a GeneSpring window containing a visual representation of genes.
Main Screen: The first GeneSpring window that appears after you open a genome, such as the default yeast genome window that appears when you first start the program.
Measurement: The smallest unit of data recognized by GeneSpring. These raw values can be seen in the upper-right table in the Gene Inspector.
Menu: Pull-down options that allow you to perform tasks in GeneSpring. The main menu is at the top of the main GeneSpring window (Windows) or at the top of your screen (Macintosh).
Navigator: T
340. mber in this box, the number of Button lines in the table below changes to match the number you entered. You can change this number at any time while you are on this screen.
In the first column of this lower table, titled Button label, enter the name of the web database as you wish it to appear on a button within GeneSpring. In the right-hand column, titled URL, enter the URL of the database with the systematic name of the gene replaced by a semicolon (;). If the semicolon marking the position of the systematic name is at the end of the URL, it may be omitted.
You can also create links that use names other than the systematic gene name. To use one of these, attach a special character to the front of the link name in the Button label column. Do not put a space or any other character between the special character and the link name.
• To use the common name, use a dollar sign ($).
• To use the GenBank accession number, use a percent sign (%).
• To use the systematic name less anything after a dash, use a dash (-).
You can use any column in the Master Table of Genes in a link by entering <name of column> at the desired point in the URL, where name of column is the name of the column you want to use.
Note: When you right-click on this screen, no pop-up menu for cutting and pasting appears. However, you can still cut and paste URLs into the matrix fields.
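The substitution rules above can be illustrated with a small Python sketch. It is not GeneSpring's implementation: the gene record, its keys (systematic, common, genbank, Chromosome), and the function name expand_link are hypothetical stand-ins for columns of the Master Table of Genes.

```python
import re

# Hypothetical gene record: these keys stand in for columns of the Master
# Table of Genes and are not GeneSpring's actual column names.
GENE = {
    "systematic": "YAL001C-A",
    "common": "TFC3",
    "genbank": "X12345",
    "Chromosome": "1",
}


def expand_link(button_label: str, url_template: str, gene: dict) -> tuple:
    """Return (button text, expanded URL) for one Button label / URL row."""
    prefix = button_label[:1]
    if prefix == "$":  # $ -> common name
        label, name = button_label[1:], gene["common"]
    elif prefix == "%":  # % -> GenBank accession number
        label, name = button_label[1:], gene["genbank"]
    elif prefix == "-":  # - -> systematic name less anything after a dash
        label, name = button_label[1:], gene["systematic"].split("-", 1)[0]
    else:  # no prefix -> full systematic name
        label, name = button_label, gene["systematic"]

    # <name of column> pulls the value of any column in the gene record.
    url = re.sub(r"<([^<>]+)>", lambda m: str(gene.get(m.group(1), "")), url_template)

    # A semicolon at the very end may be omitted, so add it back if absent.
    if ";" not in url:
        url += ";"
    return label, url.replace(";", name)


print(expand_link("$SGD", "http://example.org/locus?name=;", GENE))
```

The example.org URLs are placeholders; a real row would use the web database's own query URL.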
341. me, concentration, etc. Parameter values are values assigned to experiment parameters. For example, Embryonic, Postnatal, or Adult could be parameter values of the experiment parameter stage, while 0.01 ppm could be a parameter value of the experiment parameter concentration.
What are Replicates?
Replicates can be:
• multiple spots on the same array representing the same gene (also referred to as a copy)
• the same sample on more than one array
• a biological replicate: equivalent samples taken from more than one organism
Graphically, a parameter defined as a replicate is a hidden variable: no visual distinction is made based on this parameter or its parameter values.
What is a Region?
Regions divide your data into specific sections. This is important if you use multiple arrays and want to normalize sections of an array separately rather than normalizing across the entire data set.
What is Raw Data?
The analysis process begins by obtaining data in the form of flat files generated by your scanning software or other expression analysis technology. GeneSpring is capable of recognizing most commercially available formats and can be customized to work with other formats as necessary. Typically, the gene/spot/probe set intensity values in these files are referred to as raw data.
What is Normalized Data?
If GeneSpring recognizes your file format, it applies a set of default normalizations appropriate for your expression analysis tec
342. merge the files together into one row. Ctrl-click to select multiple rows, then use the Merge Selected Rows button.
Figure 3-4: The Merge Files window.
To merge files, select all the files that are on the same sample and click Merge Selected Rows. Use Ctrl-click to select multiple files in non-adjacent rows. You can also drag a file from one row and drop it in another to merge those two rows. To unmerge rows, click Separate Merged Files.
Click Guess the Rest to have GeneSpring try to match the pattern set by the names of the files you have already merged. If the guesses are incorrect, click Clear Guesses. Click Next when you are done.
8. If you imported genes that are not part of the selected genome, the Extend Genome screen appears. If all the imported genes are part of the selected genome, this screen does not appear. From this screen you can specify whether to add those genes to the genome. To add the genes, click Yes; they are added immediately. This means that if you cancel the data loading process later, the genes are still part of the selected genome. To skip the listed genes, click No.
9. If you defined recommended or required attributes, the Import D
343. mes become unreadable if more than 100 genes are visible at the current gene list and magnification.
• Show X Axis Label: Displays the parameter that is graphed on the X axis.
• Show Y Axis Label: Displays the parameter that is graphed on the Y axis.
• Show Z Axis Label: Displays the parameter that is graphed on the Z axis.
• Show Unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser.
Coloring
Coloring in the scatter plot view is more complicated than in other views because the color of each gene can be derived from the data on any axis. In other views, the color of a gene is usually linked to the data plotted on the vertical axis. In addition, the scatter plot allows you to color genes based on a fourth experiment or condition that is not plotted on any axis.
To modify the way data points are colored:
1. Select View > Display Options, or right-click anywhere in the genome browser and select Display Options.
2. Click the Coloring tab.
3. Select the type of data to be used for coloring from the Color data points by pull-down menu. For more information about the types of data available for coloring, see Color on page 4-30.
4. Choose from the Use the expression levels in the following experiment or condition radio buttons to select the axis to be used for coloring.
Not
344. minator used to normalize the raw data. In a two-color experiment, control strength refers to the control channel.
When data is reported as the signal divided by the control, all expression values are assumed to be positive. The number 1 is considered normal expression: any expression value above 1 is overexpressed, and all underexpressed data is less than 1 but greater than 0. This means that all underexpressed data appears flattened, because it has to fit graphically between 0 and 1, whereas overexpressed data takes up a much larger portion of the graph, from 1 to positive infinity. Raw signal values that are negative, which is commonly the case in Affymetrix data, produce normalized values that are negative. To deal with these negative values, see The Affine Background Correction on page 5-13.
Log of Ratio
The Log of Ratio mode graphs normalized values (i.e., the ratio of the signal to the control, not their logs) but spaces them logarithmically. Normal expression is 1. The Log of Ratio interpretation solves the problem described above under Ratio, in which all underexpressed data appears flattened because it has to fit graphically between 0 and 1. In this mode, underexpressed genes take up as much space visually as overexpressed genes. Logarithms of the expression ratios are used as the basis for statistical analysis.
Figure: Yeast cell cycle time series (no 90 min), plotted against time (minutes).
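The difference between Ratio and Log of Ratio spacing can be checked numerically. This minimal Python sketch compares a 2-fold overexpressed gene with a 2-fold underexpressed gene; base 2 is an assumption made for the example (the manual does not state a base), and any base gives the same symmetry.

```python
import math


def ratio(signal: float, control: float) -> float:
    """Normalized value: signal divided by control (both assumed positive)."""
    return signal / control


def log_ratio_position(r: float) -> float:
    """Logarithmic placement of a ratio; normal expression (1) maps to 0."""
    return math.log2(r)


up = ratio(200.0, 100.0)   # 2-fold overexpressed
down = ratio(50.0, 100.0)  # 2-fold underexpressed

# Linear (Ratio) spacing: the distances from 1 are asymmetric.
print(up - 1, 1 - down)  # 1.0 0.5
# Logarithmic (Log of Ratio) spacing: both genes sit equally far from 1.
print(log_ratio_position(up), log_ratio_position(down))  # 1.0 -1.0
```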
345. move All: Remove all samples from the Selected Samples list.
• Inspect: View the selected sample in the Sample Inspector. For more information, see The Sample Inspector on page 4-13.
• Configure Columns: Select which columns to display in the sample lists. The available choices are Sample Name, Identifier, Authors, Creation Date, Upload Date, Research Group, Organization, Application, Sample Attributes, and Experiment Parameters.
Below the Selected Samples list are five buttons:
• Publish to GeNet: Publish all samples in the Selected Samples list to GeNet.
• Copy from GeNet: Make a local copy of all GeNet samples in the Selected Samples list.
• Delete: Delete the samples in the Selected Samples list.
• Create Experiment: Create a new experiment from the samples in the Selected Samples list.
• Edit Attributes: Edit the attributes of the highlighted samples in the Selected Samples list.
Filtering Methods
Filter on Experiment
This method allows you to filter based on the samples associated with a selected experiment.
Figure: The Sample Manager window (Filter on Attributes; Filter Results: 16 samples).
346. ms such as k-means and hierarchical clustering, because the genes with high two-sided confidence values are really a mixture of similar and dissimilar genes.
To compute a Two-sided Spearman confidence:
If r is the value of the Spearman correlation as described above, then
$$\text{Two-sided Spearman confidence} = 1 - P\left(\text{a Spearman correlation of } |r| \text{ or higher, or } -|r| \text{ or lower, arises by chance}\right)$$
Distance
Distance is not a correlation at all but a measure of dissimilarity. It is based on the Euclidean distance between the expression profile for gene A, defined by its expression values as a point in n-dimensional space (where n is the number of experimental points, or conditions, with data in your experiment), and the expression profile for gene B. This is more formally known as the Euclidean metric. To standardize this difference, GeneSpring divides by the square root of the number of conditions.
To compute a Euclidean distance:
$$\text{Distance}(A, B) = \frac{\lVert A - B \rVert}{\sqrt{n}} = \frac{\sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}}{\sqrt{n}}$$
Since distance is a measure of dissimilarity, the distance d is converted, when needed, to a similarity measure:
$$\text{similarity} = \frac{1}{1 + d}$$
The next three metrics should be used only to look at special cases. They are all modified versions of the Standard correlation. Using these three metrics makes sense only when your data
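The Distance formula and its conversion to a similarity translate directly into Python. This is a minimal sketch of the two equations above, not GeneSpring's code; the function names are hypothetical, and it assumes both profiles are complete (no missing values).

```python
import math


def normalized_euclidean_distance(a: list, b: list) -> float:
    """Euclidean distance between two expression profiles, divided by sqrt(n)."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared) / math.sqrt(n)


def distance_to_similarity(d: float) -> float:
    """Convert a dissimilarity d >= 0 into a similarity in (0, 1] as 1 / (1 + d)."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + d)


d = normalized_euclidean_distance([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(d, distance_to_similarity(d))  # 1.0 0.5
```

Identical profiles give d = 0 and a similarity of 1; the similarity approaches 0 as the profiles move apart.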
347. mutation (slow), Benjamini & Hochberg False Discovery Rate, or None.
Selecting Groups Manually
To select groups manually, click Select Groups Manually.
Figure 6-7: The Select Groups window for 2-way ANOVA testing (instructions in the window: select the parameters of interest from the pull-down menus and check the groups you wish to compare).
This screen features two tables. Each has one column with a pull-down menu for selecting the desired experiment parameter, and a row for each condition to compare. You cannot manually define conditions for 2-way tests, but you can choose to ignore certain levels of the selected parameter by unchecking the appropriate rows. The Check All and Clear All buttons allow you to check or uncheck all rows.
2-Way ANOVA Results
From this screen you can do the following:
• Copy to Clipboard: Copy the results in this screen to the clipboard. You can paste these results into a spreadsheet program or text editor.
• Save Lists: Save the results as a gene list or lists.
• Display in Venn Diagram: Display the results in a Venn diagram in the main GeneSpring window.
Figure: The 2-way ANOVA Results window.
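One of the multiple testing corrections listed above is the Benjamini & Hochberg False Discovery Rate. The Python sketch below implements the standard step-up adjustment; the manual does not describe GeneSpring's internal implementation, so treat it as an illustration of the technique. The p-values and the 0.05 cutoff are made-up example values.

```python
def benjamini_hochberg(pvalues: list) -> list:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease
    # with rank: q_(k) = min over j >= k of p_(j) * m / j, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value <= 0.05]
print(q, significant)
```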
348. Condition Scatter Plot Display Options
The following display options are available:
• X axis: See X, Y, and Z Axes on page 4-69.
• Y axis: See X, Y, and Z Axes on page 4-69.
• Z axis: See X, Y, and Z Axes on page 4-69.
• Lines to Graph: See Adding Lines on page 4-69.
• Features: See Changing Labels and Features on page 4-70.
• Coloring: See Coloring on page 4-70.
• Error Bars: See Error Bars on page 4-28.
• Legend: See Legend Options on page 4-29.
X, Y, and Z Axes
The most critical option to set is the type of data displayed on the three axes. To modify the function as well as the appearance of the axes:
1. Click Display Options, or right-click anywhere in the condition scatter plot window and select Display Options. The Display Options window appears.
2. Select the X Axis, Y Axis, or Z Axis tab.
3. From the pull-down menu, specify the type of data to display on the selected axis. The available options are Gene Expression Profile and Experimental Parameter.
4. Specify the gene whose data is used in the plot. To choose a gene other than the one selected by default, click Choose Gene and use the search screen to locate the desired gene.
Note: You can select genes only from the currently active experiment. To work with data from a different experiment, you must exit this screen and select that experiment in the main GeneSpr
349. Figure 4-12: The Similar Conditions tab (tabs: Mode of Analysis, Log, Samples and Parameters, Similar Conditions, Pictures, Graph; columns: Correlation, Experiment Name).
The Correlation column shows how closely each of the other conditions in the experiment is correlated with the condition under inspection. The conditions are listed from most to least correlated. This feature uses standard correlation to measure the similarity between the selected condition and all others in the experiment, and this metric cannot be changed. To use another metric, you must create a script using the Condition Correlation building block. For details on creating scripts, see Creating Scripts on page 8-13.
The Pictures Tab
This tab displays the sample images, if any, associated with this condition. Double-click an image to view it at full size.
The Graph Tab
Displays the condition in graph form. The G
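The ranking shown on the Similar Conditions tab can be sketched in Python as follows. GeneSpring's standard correlation is defined in Appendix B and may not be identical to Pearson correlation, so this sketch uses Pearson as a stand-in. The condition names and values are hypothetical, with one value per gene in a shared gene order.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length lists (stand-in for standard correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant condition")
    return sxy / math.sqrt(sxx * syy)


def rank_similar_conditions(selected: str, conditions: dict) -> list:
    """Return (condition, correlation) pairs, most to least correlated with `selected`."""
    target = conditions[selected]
    scores = [(name, pearson(target, values))
              for name, values in conditions.items() if name != selected]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical condition data: one value per gene, in the same gene order.
conditions = {
    "time 0": [1.0, 2.0, 3.0, 4.0],
    "time 10": [2.0, 4.0, 6.0, 8.0],
    "time 20": [4.0, 3.0, 2.0, 1.0],
    "time 30": [1.0, 3.0, 2.0, 4.0],
}
for name, r in rank_similar_conditions("time 0", conditions):
    print(f"{name}: {r:.2f}")
```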
350. n 450 x 450 inches. PNG files have a soft limit based on available memory. If the estimated amount of memory needed to produce the PNG file is greater than the memory use setting in your Preferences file, a warning dialog appears. In this case you can attempt to save the PNG anyway, but if there is not enough memory, nothing is saved. PNG images are automatically saved at your current screen resolution. For a higher-resolution image, save in PICT format.
Choose an image size from the Page Size pull-down menu. You have the following options:
• Scale to Fit calculates the best page size for displaying the graphic and all specified labels. In some cases this option specifies a page size larger than the maximum; if so, you must choose another option.
• Original Image Size lets you save the image exactly as it appears in the genome browser.
• Original Aspect Ratio allows you to change the image size while keeping the original width-to-height ratio displayed in the genome browser.
• US Letter: 8.5 x 11 inches
• US Legal: 8.5 x 14 inches
• A4: 8.3 x 11.7 inches
• 3 Foot by 5 Foot Poster: 3 ft x 5 ft
• Custom allows you to save at any size up to 450 x 450 inches.
Choose a margin size. If you choose Custom, enter the appropriate percentage in the Enter Percentage box.
Choose a page orientation, either landscape or portrait.
Specify whether to show labels and, if so, which labels. Your options are
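Keeping the original aspect ratio on a fixed page comes down to picking the smaller of the two scale factors. The Python sketch below shows that calculation under stated assumptions. The function name is hypothetical, and treating the margin percentage as a share of each page dimension reserved on every side is a guess made for illustration; it is not documented GeneSpring behavior. The 450-inch cap is taken from the Custom option above.

```python
MAX_PAGE_INCHES = 450.0  # Custom page size limit stated in the manual


def fit_to_page(img_w: float, img_h: float, page_w: float, page_h: float,
                margin_pct: float = 0.0) -> tuple:
    """Scale an image to fit a page while keeping its width-to-height ratio.

    Page sizes are in inches; the result is the printed (width, height) in
    inches. margin_pct is assumed to be a percentage of each page dimension
    reserved on every side (an assumption for this sketch).
    """
    if not (0 < page_w <= MAX_PAGE_INCHES and 0 < page_h <= MAX_PAGE_INCHES):
        raise ValueError("page size must be positive and at most 450 x 450 inches")
    if not 0 <= margin_pct < 50:
        raise ValueError("margin must leave some printable area")
    usable_w = page_w * (1 - 2 * margin_pct / 100)
    usable_h = page_h * (1 - 2 * margin_pct / 100)
    scale = min(usable_w / img_w, usable_h / img_h)  # the tighter dimension wins
    return img_w * scale, img_h * scale


# An 800 x 600 on-screen image placed on a US Letter page (8.5 x 11 inches).
print(fit_to_page(800, 600, 8.5, 11.0))
```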
351. n a Venn Diagram. See Making Lists with the Venn Diagram on page 6-13 for more details.
Making Lists with the Filter Genes Command
Select Tools > Filtering & Statistical Analysis. This command allows you to use expression-level constraints and control-strength restrictions to create a smaller gene list. See The Filtering Menu on page 6-51 for more details.
Making Lists from Selected Genes
You can make a list of all the genes you have selected in the genome browser by right-clicking and choosing Make List from Selected Genes. See Finding and Selecting Genes on page 4-4 for how to select genes, and Making Lists from Selected Genes on page 6-14 for more details on this method of making a gene list.
Making Lists from Conjectured Regulatory Sequences
Once you have found possible regulatory sequences using the Find Potential Regulatory Sequences window (see Regulatory Sequences on page 6-18 for more details) and are inspecting one of the sequences in the Conjectured Regulatory Sequence window, you can make a list of all the genes containing that sequence by selecting List > Make Gene List. See Using the Conjectured Regulatory Sequence Window on page 6-23 for more information.
Setting Preferences
The preferences screen allows you to change GeneSpring's global preferences. Note that some changes may
352. n(s).
• To select all the conditions in an experiment, select the experiment in the navigator and click Choose Condition(s).
• To select a pool of conditions manually from any experiments, click Add/Remove. The Conditions to Filter window appears.
Figure: The Conditions to Filter window (conditions from Yeast cell cycle time series (no 90 min), Default Interpretation, Log of ratio).
353. nd Similar Genes from the GeneSpring main menu. In this case you must also click Choose Target Gene and enter a gene name on the Find Target Gene screen. For more information on this screen, see Performing an Advanced Search on page 4-4.
3. Select a gene list from the navigator and click Choose Gene List.
4. Select an option from the Similarity Measure menu. Available options are:
• Standard Correlation
• Smooth Correlation
• Change Correlation
• Upregulated Correlation
• Pearson Correlation
• Spearman Correlation
• Spearman Confidence
• Two-Sided Spearman Confidence
• Distance
5. You can specify minimum and maximum settings for the similarity measure by moving the sliders or entering values in the appropriate fields.
6. Select an experiment or condition in the navigator and click Add. The New Correlation window appears.
Figure 6-7: The Correlation Settings window (target gene EFB1; experiment Yeast cell cycle time series (no 90 min), Default Interpretation; fields for Weight, Weight by Control Strength, Phase Offset, and Parameter).
A cumulative distribution graph of gene correlations appears in the center of the New Correlation windo
354. nd select Display Options. The Scatter Plot Display Options window appears.
Click either the Horizontal Axis or the Vertical Axis tab.
In the Display Options navigator, select the gene list, experiment, interpretation, or condition to use on the selected axis.
Click the Horizontal/Vertical Axis Value pull-down menu. The list includes only the options appropriate for the type of data object you selected.
Choose a graph mode for the specified axis. The three options are linear, logarithmic, and fold change. Note that the fold change option is available only if you are looking at normalized data from an interpretation or a condition.
To adjust the vertical axes so that all measurements are visible, check the Scale Axis to Show all Values box. The upper and lower bounds are adjusted automatically. Alternatively, you can set the upper and lower bounds manually to values of your choosing.
To choose tick spacings automatically, check the Automatic Tick Spacing on Axis box. To set tick spacings manually, leave this box unchecked and enter the major tick interval and the number of minor ticks. For more information about setting tick spacings, see The Vertical Axis on page 4-27.
Adding Lines
You have the option to draw lines that help distinguish groups of data points. Although these lines can represent many types of data thresholds, they are generically called fold change lines. These fold
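The Python sketch below shows how fold change lines divide a scatter plot into regions. It assumes, for illustration only, that the lines pass through the origin with slopes fold and 1/fold (a common way of drawing fold change lines on a linear scatter plot); whether GeneSpring draws them exactly this way is not confirmed by this section. The point data and function name are hypothetical.

```python
def classify_by_fold_change_lines(points: dict, fold: float = 2.0) -> dict:
    """Label each (x, y) point relative to the lines y = fold*x and y = x/fold."""
    if fold <= 1:
        raise ValueError("fold must be greater than 1")
    labels = {}
    for gene, (x, y) in points.items():
        if x <= 0 or y <= 0:
            labels[gene] = "undefined"  # fold change needs positive values
        elif y >= fold * x:
            labels[gene] = "up"  # on or above the upper line
        elif y <= x / fold:
            labels[gene] = "down"  # on or below the lower line
        else:
            labels[gene] = "between"  # between the two lines
    return labels


# Hypothetical normalized values: x from one condition, y from another.
points = {"g1": (1.0, 3.0), "g2": (2.0, 0.5), "g3": (1.0, 1.2), "g4": (-0.2, 1.0)}
print(classify_by_fold_change_lines(points))
```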
355. ng method.
4. Enter settings for the clustering operation. More information on the settings available for each clustering method appears later in this chapter.
5. Specify whether to perform the computation locally or on a GeNet Remote Server.
6. Click Start. If you are running the operation locally, its progress is shown on the progress bar in the Computation Preferences section of the screen. When the operation is complete, the result depends on the clustering method used; see the appropriate section of this chapter for more information.
Add/Remove Experiments
Use this window to add experiments to, or remove them from, the list to be included in the clustering operation.
Figure 7-2: The Experiments to Cluster window (double-click an entry to edit its weight).
From this window you can do the following:
• To add an experiment, select it in the navigator and click Add.
• To remove an experiment, select it in the list and click Remove.
• To remove all listed experiments, click Remove All.
• To change an experiment's weight, double-click its
356. ng or complex scripts, you may want to create several small scripts and join them together in the ScriptEditor to create the final script.
To view details on a script primitive, right-click it in the navigator and select Properties from the pop-up menu.
Boolean
Open Scripts > Basic Scripts > Boolean.
Name | Input | Output | Description
Boolean | No direct input | Boolean | Generates a true or false result. Select yes (true) or no (false) when you run the script.
Boolean AND | At least two Booleans | Boolean | Output is true only if both inputs are true.
Boolean FALSE | None | Boolean | Returns the result false.
Boolean NOT | One Boolean | Boolean | Output is true only if the input is false (converts true to false and false to true).
Boolean OR | Two Booleans | Boolean | Output is true if either input is true.
Boolean TRUE | None | Boolean | Returns the result true.
Boolean Select
Open Scripts > Basic Scripts > Boolean > Select Using Boolean.
Name | Input | Output | Description
Select Boolean | Three Booleans | Boolean | Selects the second Boolean input if the first input is true, and the third Boolean input if the first input is false.
Select Condition | One Boolean, two conditions | Condition | Selects the first condition if true, the second condition if fa
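The Boolean building blocks in the tables above can be modeled as ordinary Python functions. This is only a model of the described inputs and outputs, not the ScriptEditor's API. The function names are hypothetical, and one generic select function covers both Select Boolean and Select Condition.

```python
from typing import TypeVar

T = TypeVar("T")


def boolean_and(*inputs: bool) -> bool:
    """Boolean AND block: true only if every input is true (needs two or more)."""
    if len(inputs) < 2:
        raise ValueError("Boolean AND takes at least two inputs")
    return all(inputs)


def boolean_or(a: bool, b: bool) -> bool:
    """Boolean OR block: true if either input is true."""
    return a or b


def boolean_not(a: bool) -> bool:
    """Boolean NOT block: converts true to false and false to true."""
    return not a


def select(flag: bool, if_true: T, if_false: T) -> T:
    """Select blocks: second input when the first is true, otherwise the third."""
    return if_true if flag else if_false


# Select Boolean takes three Booleans; Select Condition takes a Boolean and two conditions.
print(select(boolean_and(True, boolean_not(False)), "condition A", "condition B"))
```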
357. ng window must result in one and only one gene list. As a result, you do not have the option to run post hoc testing when applying 1-way ANOVA in an advanced filter, and you must select an individual gene list of interest when applying 2-way ANOVA.
2. Set up the desired filtering parameters and click OK. In the Advanced Filtering window, a line appears representing your filter.
3. Use the same procedure to add any additional filters. To reorder steps, select a step in the list and click Move Up or Move Down. To insert another instance of a step, select it in the list and click Duplicate. To remove a step, click Delete. At any time, you can view and edit an individual filtering step by double-clicking it in the Restrictions table, or by highlighting it and selecting Edit.
4. Choose from the pull-down menus in the column headers to construct the desired Boolean expression. For more information on constructing Boolean expressions, see Creating Boolean Filters on page 6-67.
5. Specify whether to run the script locally or on a GeNet Remote Execution Server. If you specify Remote Execution, the preview pane in the Filtering window for each step is disabled, so that GeneSpring does not try to calculate the filter in real time while you are creating it.
Note: An advanced filter using the Arbitrary File Restriction filter cannot be executed remotely.
6. Click Start.
Once you have creat
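One way to picture how a Boolean expression combines filtering steps is to treat each step's output as a set of gene identifiers. The Python sketch below does this. GeneSpring's internal representation is not described here, so the nested-tuple expression format, the step names A and B, and the gene IDs are all hypothetical. NOT is evaluated relative to the full gene list being filtered.

```python
def evaluate(expr, results: dict, universe: set) -> set:
    """Evaluate a nested Boolean expression over per-step gene sets.

    expr is either a step name (str) or a tuple such as
    ("and", "A", ("not", "B")); "not" is taken relative to `universe`.
    """
    if isinstance(expr, str):
        return set(results[expr])
    op, *args = expr
    parts = [evaluate(arg, results, universe) for arg in args]
    if op == "and":
        return set.intersection(*parts)
    if op == "or":
        return set.union(*parts)
    if op == "not" and len(parts) == 1:
        return universe - parts[0]
    raise ValueError(f"unsupported expression: {expr!r}")


# Hypothetical outputs of two filtering steps over a five-gene list.
universe = {"g1", "g2", "g3", "g4", "g5"}
results = {"A": {"g1", "g2", "g3"}, "B": {"g2", "g4"}}
print(sorted(evaluate(("and", "A", ("not", "B")), results, universe)))
```

Whatever the expression, the evaluation ends in a single gene set. This mirrors the requirement above that an advanced filter must result in one and only one gene list.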
358. Pathways
A pathway is a graphical representation of the interactions between gene products in a biological system. Genes can be superimposed on the pathway, allowing you to view their expression levels in a biological context. You can zoom in on a pathway and move the slider to watch gene expression change over the experimental conditions. You can draw pathways yourself or use publicly available pathways such as KEGG (Kyoto Encyclopedia of Genes and Genomes). A pathway can be very useful if, for example, you are trying to identify a class of genes associated with a particular step or regulatory element within a pathway.
Figure 6-10: A cell cycle pathway (Selected Pathway: mitosis; colored by Yeast cell cycle time series (no 90 min), Default Interpretation).
KEGG is a large public database containing pathways for many organisms, all of which are available for download via FTP. To locate pathways for a specific organism, go to the following URL:
ftp://ftp.genome.ad.jp/pub/kegg/pathways
Folders are named using three-letter abbreviations based on the Latin name of the organism; for example, hsa is Homo sapiens and mmu is Mus musculus. To locate generic pathways, download the maps folder. This folder contains reference pathways, both metabolic and regulatory. These pathways contain enzymes rather than genes. When you im
359. nges to a bull's-eye. Click a GeneSpring window to save the image as a file named Picture on your hard drive. You must rename this file; otherwise, it is overwritten each time you repeat this procedure.
To Save the Entire Screen
• Windows: Press the Print Screen key to save an image of your entire computer screen. Paste the image into any program that accepts graphics and save it.
• Macintosh: Press Command-Shift-3 simultaneously to save an image of your entire computer screen. The image is saved as a file named Picture on your hard drive.
Saving Pictures and Printing
You can print an image of the genome browser, the genome browser with the colorbar, or the display window. Such images can be useful for reports or handouts. Use a high-resolution color printer to print GeneSpring images.
Printing the Genome Browser and/or Colorbar
1. Select File > Print Image.
2. Choose one of the following options: Browser, Browser and Colorbar, or Colorbar.
3. Select a printer and click OK.
Printing the Display Window
Windows
1. Hold down the Alt and Print Screen keys simultaneously. This copies a picture of the active window only.
2. Paste into any program that accepts graphics.
3. Print.
Macintosh
1. Hold down the Command, Shift, 4, and Caps Lock keys simultaneously. The cursor changes to a bull's-eye.
2. Release the keys and use the mouse to click on
360. Figure 4-19: A split window colored by classification.
You can also color by classification using the Display Options window:
1. Select View > Display Options and click the Coloring tab.
2. Select a classification from the navigator on the left side of the Display Options window.
3. Click Set Experiment.
4. Click OK.
Split Window and Color by Classification
You can also use the Split Window feature with the Color by Classification scheme:
1. Select a gene list to view.
2. Right-click a folder or a previously saved classification and select Use as Classification.
3. Right-click that folder again and select Split Window > Both.
Color by Secondary Experiment
The Graph and Scatter Plot displays lend themselves to b
The top of the Experiment Inspector window contains the experiment information. You can edit the text in the white boxes as desired. Below this are five tabs; click a tab to view its contents. Two buttons appear at the bottom of the window regardless of which tab is active: click OK to save your data and exit the window, or click Cancel to close the Experiment Inspector without saving any changes.

The Parameters Tab
On the Parameters tab you can view the experiment parameters and their possible values. Click Edit Parameters to open the Change Parameters window; see Experiment Parameters on page 3-29 for details on this window. When you click OK, any changes you make are saved and applied to your experiment.

To view details on a particular sample, select it in the list and click Inspect Sample. This opens the Sample Inspector window. For more information, see The Sample Inspector on page 4-13.

The Interpretations Tab
This tab lists all the interpretations associated with the selected experiment. Click an interpretation in the list to select it. To edit an interpretation, double-click it, or select it in the list and click Edit Interpretation. The Change Interpretation window appears. When you click OK, any changes you make are saved and applied to your experiment.

The Normalizations Tab
This tab allows you to view the normalizations currently being used in your ex…
Appendix B: Equations for Correlations and other Similarity Measures

Measures of Similarity
Many advanced analysis techniques are based on measures of gene similarity. Similarity, or nearness, between genes is usually based on the correlation between the expression profiles of the two genes. GeneSpring offers nine similarity measures, each of which can be selected from a pull-down list in the Clustering and Filtering windows. See Clustering and Characterizing Data and The Filtering Menu on page 6-51, respectively.

Each measure takes two expression patterns and produces a number representing how similar the two genes are. Most of the measures are correlation measures, whose value varies from -1 (exactly opposite) to 1 (the same). For a measure of distance, the result varies from 0 (the same) to infinity (different). For confidences, the result varies from 0 (no confidence) to 1 (perfect confidence). Both distance and confidence are actually measures of dissimilarity: small means close and large means far away. GeneSpring transforms each of these into a measure of similarity in the ways detailed below.

If the expression value for a particular sample is missing for either gene, that sample is not considered in the calculation.

The notation used to describe the formulas: Result…
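The missing-value rule is easiest to see on one concrete measure. The sketch below uses an ordinary Pearson correlation as an illustration only; the exact formulas GeneSpring applies for each of its nine measures are the ones given in this appendix, and the function name is hypothetical. Missing values are represented as None (an assumption), and any sample missing in either profile is dropped before the correlation is computed.

```python
import math
from typing import Optional, Sequence


def correlation_skipping_missing(
    a: Sequence[Optional[float]], b: Sequence[Optional[float]]
) -> float:
    """Pearson correlation of two expression profiles (hypothetical helper).

    A sample is dropped whenever either gene's value is missing (None).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples present in both profiles")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    # The result lies in [-1, 1]: -1 is exactly opposite, 1 is the same shape.
    return cov / math.sqrt(var_x * var_y)
```

For example, correlation_skipping_missing([1.0, 2.0, None, 4.0], [2.0, 4.0, 5.0, 8.0]) ignores the third sample and returns 1.0 (up to rounding), because the remaining values rise in exact proportion.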
…into the Inputs area of the browser.
Script output: The final output of a script. Script outputs are created when you drag a building block output into the Outputs area of the browser.
Knob: A value entered when running the script.

The ScriptEditor Interface
The ScriptEditor's main screen contains three panels: the navigator, the browser, and the notes.

Figure 8-1: The ScriptEditor workspace (navigator, browser, and block or notes sections)

The ScriptEditor Navigator
The navigator contains the building blocks (script building blocks, scripts, and external program building blocks) you use to create your scripts. Click the plus sign next to a folder to open or close it.

The ScriptEditor Browser
Control Signal: A value calculated from all the normalizations applied to the experiment. This value is used as the denominator to calculate normalized values.

Basic Filters

Filter on Expression Level

Figure 6-11: The Filter on Expression Level screen

This filter finds genes with certain values present in some of the conditions or samples in an experiment or interpretation. You can set what proportion of conditions must meet a certain threshold. For example, to eliminate genes that do not meet a specified control value at least once in the experiment, you can filter them out by sett…
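The counting logic behind this filter can be sketched in a few lines. This is a minimal illustration, not GeneSpring's implementation: the function and parameter names are hypothetical, the bounds are assumed to be inclusive, and a missing value (None) is assumed never to count toward the required number of conditions.

```python
from typing import Dict, List, Optional


def filter_on_expression_level(
    profiles: Dict[str, List[Optional[float]]],
    minimum: float,
    maximum: float,
    min_conditions: int,
) -> List[str]:
    """Keep genes whose value lies in [minimum, maximum] in at least
    min_conditions conditions. Missing values (None) never count."""
    passing = []
    for gene, values in profiles.items():
        hits = sum(1 for v in values if v is not None and minimum <= v <= maximum)
        if hits >= min_conditions:
            passing.append(gene)
    return passing


# Hypothetical normalized data for three genes across four conditions.
profiles = {
    "geneA": [0.5, 1.2, 2.0, None],
    "geneB": [0.1, 0.2, 0.3, 0.4],
    "geneC": [1.5, 3.0, 25.0, 1.1],
}
kept = filter_on_expression_level(profiles, minimum=1.0, maximum=20.0, min_conditions=2)
```

Here geneA passes with two qualifying conditions, geneB never reaches the minimum, and geneC passes even though one of its values exceeds the maximum, because three other conditions fall inside the range.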
For a non-continuous parameter, data points are graphed in histograms as discrete points. A gene deletion is a simple example of a non-continuous element, but it is by no means the only possible non-continuous parameter. A non-continuous parameter is occasionally referred to as a set when other parameter display options are employed, especially when a continuous parameter is also used, because the non-continuous parameter separates the data into a series of discrete graphs viewed next to each other on the same screen. When a continuous parameter is used in conjunction with a non-continuous parameter, each discrete graph contains all of the values of the continuous parameter, so each of the separate graphs looks like a set of parameter values.

Color Code
A color code is used for experimental parameters whose values exist independently of one another but are not unrelated. When the genome browser is colored by parameter, GeneSpring orders the parameter values from top to bottom in the colorbar, in alphabetical or numerical order. See Color by Parameter on page 4-33 for details.

Each color represents a category or set of categories. When coloring the browser display by parameter, each value defined as a condition is assigned a color, and every data point described by that parameter is drawn in that parameter's color. This can be referred to as Color by Parameter. Th…
…eigenvalue decomposition of the covariance matrix of the data. The eigenvalue corresponding to an eigenvector represents the amount of variability explained by that eigenvector. The eigenvector of the largest eigenvalue is the first principal component, the eigenvector of the second-largest eigenvalue is the second principal component, and so on. GeneSpring displays the principal components that explain significant variability in the Principal Components Analysis window. There are never more principal components than there are conditions in the data.

Viewing Principal Component Loadings in a Scatter Plot
After performing principal components analysis, the genome browser displays a 3-D scatter plot in which the loadings for the first, second, and third principal components (representing the largest fraction of the overall variability) are plotted on the X, Y, and Z axes, respectively.

In Figure 7-15, each point represents a single gene. Its position on the X axis represents the loading of principal component 1, and its position on the Y axis represents the loading of principal component 2. This view is useful for selecting and making lists of genes that exhibit high levels of one or two principal components. Genes that exhibit high levels of the first principal component and low levels of the second are displayed in the lower-right corner of the plot, and genes exhibiting equal levels of the two components lie along the diagonal.
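The statement that the first principal component is the eigenvector of the largest eigenvalue of the covariance matrix can be illustrated with power iteration, a standard way to find that one eigenvector. This sketch is not GeneSpring's code: rows are assumed to be genes and columns conditions, all names are hypothetical, and the per-gene coordinate is computed here as the projection of the mean-centred profile onto the component, which is an assumption about what the scatter plot shows. Power iteration also assumes the largest eigenvalue is strictly larger than the next one.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def covariance_matrix(rows: Matrix) -> Matrix:
    """Sample covariance between conditions (columns); each row is one gene."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_principal_component(
    cov: Matrix, iterations: int = 1000, tol: float = 1e-12
) -> Tuple[float, List[float]]:
    """Power iteration: multiply by the covariance matrix and renormalize
    until the vector stops changing."""
    v = [1.0 / (i + 1) for i in range(len(cov))]  # arbitrary non-zero start
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector has no component along a non-zero eigenvector")
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient: v has unit length, so v.(Cv) is the eigenvalue.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return eigenvalue, v


def gene_scores(rows: Matrix, component: List[float]) -> List[float]:
    """Project each mean-centred gene profile onto a component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) * component[j] for j in range(d)) for r in rows]
```

With three genes whose two conditions move together ([1, 1], [2, 2], [3, 3]), the covariance matrix is [[1, 1], [1, 1]], the largest eigenvalue is 2, and the component points along the diagonal. Further components would be found by removing this one from the matrix (deflation) and repeating.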
…any professional-level program, it can be intimidating to new users. The following section is a brief introduction to using GeneSpring and loading data, designed to get you up and running in the shortest possible time.

Figure 1-2 depicts the steps in a typical analysis session using GeneSpring. Note that this diagram represents what might occur in a typical data analysis session and does not include all of the types of analyses found in GeneSpring.

Figure 1-2: Typical GeneSpring workflow (including exporting data and/or images for use in publication or target validation, and publishing to or retrieving from GeNet)

In the process of loading your data you will come across terms and concepts such as genome, parameter, parameter values, replicate, and interpreted data. Below are explanations of how these terms are used in GeneSpring.

What is a Genome?
In the context of GeneSpring, a genome contains information about all the genes in your chip or microarray setup. Note that a GeneSpring genome does not correspond exactly to the biological definition of a genome. A genome in GeneSpring is composed of discrete genes, as opposed to the full nucleotide sequence. This means that a GeneSpring genome can contain two genes representing alternatively spliced variants of a single gene, whereas a true genome would contain the DNA sequence of only the one underlying gene.

What is a Parameter?
Parameters are experiment variables such as stage, ti…
…also allows you to change the appearance of data points and data labels. To modify these features:
1. Select View > Display Options, or right-click anywhere in the genome browser and select Display Options.
2. Click the Features tab.
3. To modify the size and shape of the points, choose from the options in the Style and Size pull-down menus.
4. There are five options for labeling the plot:
• Show Gene Names: Displays the name of each gene to the lower right of each point. These names become unreadable if more than 100 genes are visible in the current gene list and magnification.
• Show Horizontal Axis Label: Displays the parameter that is graphed on the horizontal axis.
• Show Vertical Axis Label: Displays the parameter that is graphed on the vertical axis.
• Label Vertical Axis on Side: Displays the vertical axis label vertically. If this is unchecked, the vertical axis label sits to the right of the top of the vertical axis.
• Show Unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser.

Coloring
Coloring in the scatter plot view is more complicated than in other views because the color of each gene can be derived from the data on either axis. In other views, the color of the gene is usually linked to the data plotted on the vertical axis. In addition, the scatter plot allows you to color gen…
…to download data from GeNet.
Open Scripts > Basic Scripts > GeNet > GeNet Downloading > Default Directory.

Name | Inputs | Knobs | Description
Download a Gene List from GeNet | None | Gene List Name | Outputs a specified gene list retrieved from GeNet.
Download an Experiment from GeNet | None | Experiment Name | Outputs a specific experiment retrieved from GeNet.
Download All Gene Lists from GeNet | None | None | Outputs a group of gene lists retrieved from GeNet.
Download All Experiments from GeNet | None | None | Outputs a group of experiment interpretations retrieved from GeNet.

GeNet Downloading: Specified Directory
Note that you will need to log in to GeNet before using a script to download data from GeNet.
Open Scripts > Basic Scripts > GeNet > GeNet Downloading > Specified Directory.

Name | Inputs | Knobs | Description
Download All Gene Lists from Directory in GeNet | None | Directory Name | Outputs an array of gene lists retrieved from the specified folder in GeNet.
Download All Experiments from Directory in GeNet | None | Directory Name | Outputs an experiment or array of experiments retrieved from the specified folder in GeNet. This allows you to hard-code a reference to an experiment or folder of experiments in your script.

GeNet P…
To enter a custom delimiter, uncheck the Use ASCII 255 as delimiter box and enter the desired delimiter in the Use custom delimiter string text box. This may be a single character or a string.

Some external programs look for ASCII 255 to indicate that the data has finished being sent. If this is the case, check the Terminate Last Input to Program with ASCII 255 box.

The Arguments Tab
Command-line arguments provide extra information to the external program. For example, if the external program can perform one of three clustering methods, a command-line argument might tell the external program which clustering method to use. This tab is optional, since only some external programs require command-line arguments.

Figure 8-7: The Arguments tab (Add Argument and Remove Argument buttons, an Argument Name / Default Value table, a Delimiter between argument name and value setting such as equals, and a Fill in missing argument values with field)

This tab contains a table of name-value pairs displaying the name of each argument and its default value.

To enter a new argument:
1. Click Add Argument. A new line appears in the table to the right.
2. Replace the text name1 and value1 with the appropriate argument name and value, e.g., -v and all. Some arguments may not have values. In this case, enter only…
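On the receiving side, an external program has to cope with both conventions described above: input that may end with the byte value 255, and arguments written as a name, a delimiter, and an optional value. The sketch below is a hypothetical external program helper, not part of GeneSpring. It assumes that each argument arrives as a single command-line item such as -v=all when the name/value delimiter is set to equals, and that an argument without a value arrives as the bare name; both are assumptions to check against your own setup.

```python
import io
from typing import BinaryIO, Dict, List

TERMINATOR = b"\xff"  # byte value 255, sent after the last input when that box is checked


def parse_arguments(argv: List[str], delimiter: str = "=") -> Dict[str, str]:
    """Split name<delimiter>value items; a bare name maps to ''."""
    parsed: Dict[str, str] = {}
    for item in argv:
        name, sep, value = item.partition(delimiter)
        parsed[name] = value if sep else ""
    return parsed


def read_until_terminator(stream: BinaryIO) -> bytes:
    """Read bytes until the 255 terminator or end of stream, whichever comes first."""
    data = bytearray()
    while True:
        byte = stream.read(1)
        if not byte or byte == TERMINATOR:
            break
        data += byte
    return bytes(data)


# Demonstration with an in-memory stream standing in for sys.stdin.buffer.
example_input = read_until_terminator(io.BytesIO(b"YAL001C\t1.5\nYAL002W\t0.8\n\xffignored"))
example_args = parse_arguments(["-v=all", "-q"])
```

In a real program you would pass sys.argv[1:] and sys.stdin.buffer instead of the in-memory examples. Reading one byte at a time keeps the sketch simple; stopping at the terminator rather than at end-of-file is what lets the program start work while the connection is still open.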
To have GeneSpring invoke the Bulk Upload to GeNet window when you quit GeneSpring, check the Remind me to upload new data to GeNet box.

When you create a new experiment in GeneSpring using samples from a GeNet server, GeneSpring saves local copies of those samples by default. This can slow performance when saving large experiments. To disable this feature, uncheck the Save GeNet Samples Locally When Creating an Experiment box.

To enter a new GeNet server, click New and enter the following information:
• GeNet Server Name: The name of the server to connect to.
• GeNet Server Address: The IP address of the GeNet server. Enter the numeric address only (e.g., 127.0.0.1). Do not enter http:// before this address.
• Default Username: The username with which to connect to GeNet.
• GeNet & GeneSpring on: Specify whether the GeNet server is on the same side of your firewall as GeneSpring.
• Use Secure Connection: Specify whether communication between GeneSpring and GeNet should be secured. If it is, communication between GeNet and GeneSpring uses HTTPS, which relies on the SSL library available in Java.

To edit an existing GeNet server, select it from the list and click Edit. To delete a GeNet server, select it from the list and click Delete.

Computation
On this tab, specify settings for how to run scripts.
• Default Computation: Select Local to…
Figure 3-15: The Filter on Experiment tab

To filter by experiment, use the GeneSpring navigator to locate the desired experiment and select it in the list. All samples associated with that experiment appear in the Filter Results list.

Filter on Parameter
This method filters samples based on a parameter and value range. Parameters are associated with experiments, not individual samples. The samples listed are those contained in any experiment matching the selected parameter and value(s).
…column titles.

To set up columns:
1. Assign functions to each data column. Choose a function from the pull-down menu in each column (see Figure 3-9 for an example). You must designate at least one Gene Name column and one Signal (raw data) column before the Load Now button becomes active.

The available column assignments are listed below.

Name | Required | Allowed | Description
Unused | Optional | Any | These columns are not visible within GeneSpring but can be used to filter data via the Filter Genes window. See Filter on Data File on page 6-61 for details.
Gene Identifier | Required | One | Gene identifiers must be unique to the genes in this genome. Duplicate genes are treated as replicates. It is recommended that the Gene Identifier in the raw data files be the gene's Systematic Name.
Signal | Required | One or more | You must have at least one Signal column.
Signal Background | Optional | Any | You can have as many Signal Background columns as you have Signal columns. If you are using Signal Background, you must have a Signal Background for each Signal column.
Signal Precision | Optional | Any | Used only when the scanner software used for your experiment produces an estimate of the precision of the value in the signal column. This information is merged with other information as part of the GeneSpring Cross-gene…
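The rules in the column table can be expressed as a short validation check, together with the statement that duplicate gene identifiers are treated as replicates. This is only an illustration of those rules: the function names are hypothetical, the assignment strings simply mirror the table's Name column, and the sketch does not reproduce how GeneSpring itself enforces them.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def check_column_assignments(assignments: List[str]) -> List[str]:
    """Return the problems with a set of column assignments (empty if OK)."""
    counts = Counter(assignments)
    problems = []
    if counts["Gene Identifier"] != 1:
        problems.append("exactly one Gene Identifier column is required")
    if counts["Signal"] < 1:
        problems.append("at least one Signal column is required")
    # Signal Background is optional, but if used there must be one per Signal column.
    if counts["Signal Background"] not in (0, counts["Signal"]):
        problems.append("use one Signal Background column per Signal column, or none")
    return problems


def group_replicates(rows: List[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Collect signals by gene identifier; repeated identifiers become replicates."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for gene_id, signal in rows:
        groups[gene_id].append(signal)
    return dict(groups)
```

For example, check_column_assignments(["Gene Identifier", "Signal", "Unused"]) reports no problems, while a file mapped as two Signal columns and a single Signal Background column, with no Gene Identifier, reports two.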
…column titles directly above the expression data, select this row using the controls in the Column Titles panel.
Step 3: If your file has a Flags column, enter the values that appear in that column into the Flag Values panel below.
Step 4: If you might load files of this format in the future, click Remember this Format. This option is not available for formats with multiple signal columns.

Figure 3-9: The Column Editor (Column Titles and Flag Values panels, Function pull-down menu, Flag Translation Table, and Advanced Options)

When you first load a file, GeneSpring analyzes it to determine which row contains the column titles. If the row chosen is incorrect, use the Line Containing Column Titles field to adjust the number of rows. If there are no column titles in your data file, uncheck the box marked Has c…
Figure 3-20: The Experiment Parameters window

Import Parameter
You can import a parameter from another experiment, or from a list of sample attributes defined in any of the samples in the current experiment. To import a parameter:
1. Click Import Parameter. The Import Parameters window appears.
2. Select the parameter or parameters to import.
• To select a sample attribute to import as a parameter, click its name in the list at the top of the screen. To select all attributes in this list, click Select All. To clear your selections, click Clear All.
• To import parameters from another experiment, find the desired experiment in the navigator and select it. The parameters associated with that experiment appear in the Parameters from Selected Experiment list. Select the desired parameters from the list.
3. When you are done, click OK. You are returned to the Experiment Parameters screen. A new column appears for each parameter you imported.
New…
4 Viewing Data

Using the Genome Browser
The large panel in the center of the GeneSpring window is the genome browser, which graphically displays information about the genes in the selected gene list. The genome browser often presents so much information that individual genes and gene names are not visible. To look more closely at fewer genes, you can zoom in and pan around.

Zooming In
You can enlarge a region of the screen by zooming in:
1. Click and drag a rectangle across the region to enlarge.
2. Release the cursor.
3. Repeat steps 1 and 2 until you reach the desired magnification level.
To undo a zoom, type Ctrl-… or click Zoom Out.

Figure 4-1: Zooming

To return directly to the unmagnified state, do one of the following:
• Select the View > Zoom Fully Out option.
• Click Zoom Fully Out.
• Type Ctrl-Home.

Panning
If you have zoomed in and need to view genes that are nearby but not visible in the genome browser, you can pan in any…
Figure 4-29: Eisen-like Tree View

In GeneSpring 6.0 and in GeNetViewer, subtrees can be viewed in an Eisen-like format. There are two ways to select a subtree for this view:
• Double-click the node defining the desired subtree.
• Right-click the node defining the desired subtree and select Display Subtree.

Double-clicking a node changes the selected subtree. You can double-click nodes both in the thumbnail and in the main part of the window. Three marquees may be shown on the thumbnail.

Figure 4-30: Marquees in the Tree View

The area within all of these marquees is displayed to the right of the thumbnail. To change the size of the thumbnail, use the drag arrows below and to the right of the tree. To enable these drag arrows:
1. Right-click in the browser and select Display Options.
2. Click the Features tab.
3. Check the Display Drag Arrows box.
4. Cl…
…on these filters, see Filtering Methods on page 3-25.

The right portion of the screen contains two sample lists. The upper list contains all of the samples resulting from the current filtering method. The lower list contains the samples you have selected.

Figure 3-14: Sample Lists

Below the Filter Results list are six buttons:
• Add: Add a selected sample in the Filter Results list to the Selected Samples list.
• Add All: Add all samples in the Filter Results list to the Selected Samples list.
• Remove: Remove a selected sample from the Selected Samples list.
• Re…
There is a high Spearman confidence value if there is a high Spearman correlation and a low p-value, meaning there is a low probability of finding a correlation this high by chance. This measure is very similar to looking for large Spearman correlation values, but it takes into account the number of sub-experiments in your experiment set.

To compute a Spearman confidence: if r is the value of the Spearman correlation as described in Spearman Correlation on page B-4, then

    Spearman confidence = 1 - P(R >= r)

where P(R >= r) is the probability that you would get a value of r or higher by chance.

Two-Sided Spearman Confidence
Two-sided Spearman confidence is again a measure of similarity, but not a correlation. It is very similar to the Spearman confidence discussed in Spearman Confidence on page B-4, except that it is based on the two-sided test of whether the Spearman correlation is either significantly greater than zero or significantly less than zero. There is a high two-sided Spearman confidence value if the absolute value of the Spearman correlation is large and has a small p-value, meaning there is a low probability of finding a correlation with an absolute value this large by chance.

This similarity measure is well suited to answering the question: which genes behave similarly to a specific gene and, at the same time, which genes behave opposite to it? It should probably not be used for the advanced clustering algorith…
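The text does not say how GeneSpring obtains "the probability you would get a value of r or higher by chance". For a small number of conditions that probability can be computed exactly by enumerating every reordering of one gene's ranks, which is the swapped-in technique this sketch uses; it is an illustration of the definition, not GeneSpring's algorithm, and all names are hypothetical. Larger experiments would need an approximation instead of enumeration, hence the max_n guard. Setting two_sided=True compares absolute values, matching the two-sided variant.

```python
import itertools
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def spearman_confidence(x, y, two_sided: bool = False, max_n: int = 8) -> float:
    """1 - P(a random ordering gives a correlation at least as extreme as r)."""
    if len(x) > max_n:
        raise ValueError("exact enumeration is only practical for small n")
    rx, ry = ranks(x), ranks(y)
    r = pearson(rx, ry)
    observed = abs(r) if two_sided else r
    hits = total = 0
    for perm in itertools.permutations(ry):
        total += 1
        c = pearson(rx, perm)
        stat = abs(c) if two_sided else c
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return 1.0 - hits / total
```

With four perfectly concordant values, only 1 of the 24 orderings reaches r = 1, so the confidence is 23/24; for a perfectly reversed gene the one-sided confidence is 0, while the two-sided confidence is 22/24, because two orderings reach an absolute correlation of 1.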
…options are:
• Experiment Inspector: Opens the Experiment Inspector window. For a detailed description of this window, see The Experiment Inspector on page 4-16.
• Experiment Parameters: Opens the Experiment Parameters window. For a detailed description of this window, see Changing Experiment Parameters on page 3-32.
• Error Model Structure: Opens the Cross-gene Error Model window. For a detailed description of this window, see Cross-gene Error Models on page 3-44.
7. When you are done, name your interpretation and click Save to update your current interpretation, or Save As to create a new interpretation.

Find saved interpretations by clicking the relevant experiment in the Experiments folder of the navigator. You can delete an interpretation you have created by right-clicking it in the navigator and selecting Delete from the pop-up menu.

Vertical Axis Modes
The default display is Ratio, in which normalized intensity values are graphed on the vertical axis. In this mode, values range from negative infinity to infinity.

Figure 3-25: The gene list like CLN1 graphed using the signal/control formula. The Y axis is graphed from 0 to 5.

The Ratio is determined by dividing the signal (raw data) by the control strength. In a one-color experiment, the control strength refers to the deno…
Creating Scripts
Scripts can be created from the most basic scripting elements, known as building blocks, as well as from other scripts. You can combine many simple scripts to create more complex scripts.

Building Blocks
Three types of data objects can be used as building blocks in the creation of scripts:
• scripts
• script building blocks
• external program building blocks

Within a script, data is passed from one building block to another as the script runs. To create a new script, drag one or more building blocks from the navigator to the browser, then drag the cursor from the input socket of one building block to the matching output socket of another building block, or vice versa. A line appears between the two sockets. For detailed descriptions of the available building blocks, see Script Building Blocks on page 8-20.

Inputs and Outputs
There are two types of inputs and outputs.

Building block inputs and outputs
Building block inputs and outputs are used to enter information into, and retrieve information from, a script. They are found at the top and bottom of a given building block, respectively.

Script inputs and outputs
Script inputs are created when you drag a building block input into the Inputs area of the browser. Script outputs are created when you drag a building block output into th…
HorizontalDuplication (optional, rarely used): When dots are duplicated horizontally, the number of copies.
CommonArrayType: The format of the array:
• Q X Y: The data file contains two columns. The first is a list of genes; the second is a set of three numbers separated by commas or hyphens. The first number is the sub-array number, the second is the X coordinate, and the third is the Y coordinate. All numbers start counting from 1. The sub-arrays are counted left to right, top to bottom. The second column can optionally be enclosed in quotation marks.
• Q R C: Same as Q X Y, except the X and Y coordinates are swapped.
• CLONTECH LNL: There is no data file. All genes have systematic names of the form B4c, indicating where they are in the array. The first capital letter indicates the sub-array, the number indicates the column, and the lowercase letter indicates the row.
• CLONTECH LNNL: Same as LNL, except there are two digits instead of one.
DataFileName: The name of a data file linking locations with gene names, in the format given by the CommonArrayType choice.

Once you have created the layout file, save it in the ArrayLayouts folder of the genome folder to which the layout pertains. For example, if you have not changed the default setup of GeneSpring, the path to the layout folder in the yeast genome is C:\Program Files\SiliconGenetics\GeneSpring\data\Demo Chip…
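The Q X Y and Q R C data-file formats can be read with a small parser, which also makes the coordinate swap between them explicit. This is a sketch under stated assumptions, not a GeneSpring tool: the function name is hypothetical, and the two columns are assumed to be separated by a tab, which the description above does not specify.

```python
import re
from typing import Dict, Iterable, Tuple


def parse_layout_datafile(
    lines: Iterable[str], array_type: str = "Q X Y"
) -> Dict[str, Tuple[int, int, int]]:
    """Map gene name -> (sub-array, x, y), all counted from 1."""
    if array_type not in ("Q X Y", "Q R C"):
        raise ValueError(f"unsupported CommonArrayType: {array_type}")
    layout: Dict[str, Tuple[int, int, int]] = {}
    for line in lines:
        if not line.strip():
            continue
        gene, coords = line.rstrip("\n").split("\t", 1)  # assumption: tab-separated
        coords = coords.strip().strip('"')  # quotation marks are optional
        sub_array, second, third = (int(p) for p in re.split(r"[,-]", coords))
        # Q X Y lists X then Y; Q R C has the two coordinates swapped.
        x, y = (second, third) if array_type == "Q X Y" else (third, second)
        if min(sub_array, x, y) < 1:
            raise ValueError(f"coordinates start counting from 1: {line!r}")
        layout[gene.strip()] = (sub_array, x, y)
    return layout
```

Reading the same line YAL001C with coordinates 1,2,3 gives (1, 2, 3) under Q X Y and (1, 3, 2) under Q R C, which is the swap described above.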
…directory browser in the lower-left portion of the screen. The selected folder appears in the Folder field. To save in a new subfolder, navigate to the desired parent folder and enter a name for the new folder in the Folder field.
• Program Executable: The type of program (external program, Java class, or HTTP link).
• Command Line: The complete pathname required to run the program, e.g., c:\Program Files\Perl\ScriptCompletionHaiku.pl or http://ecoli.sample.com/analysis.cgi.
• Inputs: Data input for the external program. For more information, see The Inputs Tab on page 8-36.
• Outputs: Output of the external program. For more information, see The Outputs Tab on page 8-37.
• Delimiters: Delimiters to separate multiple outputs. For more information, see The Delimiters Tab on page 8-39.
• Arguments: Necessary command-line arguments for the external program. For more information, see The Arguments Tab on page 8-39.

The Inputs Tab
The input is what GeneSpring sends to the external program. On this tab, specify any necessary input data for the external program. You can add as many inputs as you like. To add a program input:
1. Click Add Input. The Choose Type of Input screen appears.

Figure: The Choose Type of Input to External Program window (input types such as Gene List, Gene List With Numbers, and Gene Name; values included in the input such as Normalized Values, Control Values, and Raw Values)
…most situations.

Post Hoc Tests
Performing a 1-way ANOVA results in a list of genes with p-values below the specified cutoff. These are the genes with significant differential expression across the specified groups. If testing has been performed across more than two groups, post hoc tests can be used to identify the specific groups in which significant differential expression occurs.

To perform post hoc tests, specify which test to use in the post hoc test pull-down menu. The available choices are Tukey and Student-Newman-Keuls. Pairwise comparisons between all groups are performed, and group comparisons resulting in a p-value below the specified cutoff are displayed in the output. Post hoc tests can be performed only if more than two groups are defined by the chosen testing parameter, or if more than two groups are chosen manually.

Tukey
Suppose an SGC with more than two groups (ANOVA or Kruskal-Wallis) has been performed and some genes have passed the cutoff. The following calculations are done separately for each gene. Let $X_1 \le X_2 \le \cdots \le X_k$ be the group means in ascending order. Pairwise group mean comparisons are performed in the following order:

k vs 1, k vs 2, ..., k vs k-1, then k-1 vs 1, k-1 vs 2, ..., k-1 vs k-2, and so on, ending with 2 vs 1.

Do not perform unnecessary tests: if there is no significant difference between a pair, do not test any closer pairs. Each test is performed as described below.

Parametric Tests…
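The comparison order and the "do not test closer pairs" rule can be sketched as a small driver loop. The actual significance test for each pair (described in the sections that follow) is left as a hypothetical callable. The sketch also assumes that a "closer pair" means any pair whose two groups both lie within the range of ordered means spanned by a pair already found non-significant; such pairs are reported as skipped rather than tested.

```python
from typing import Callable, List, Tuple


def step_down_pairs(
    k: int, is_significant: Callable[[int, int], bool]
) -> List[Tuple[int, int, str]]:
    """Visit pairs of ordered groups (1 = smallest mean, k = largest) in the
    order k vs 1, ..., k vs k-1, k-1 vs 1, ..., ending with 2 vs 1."""
    non_significant: List[Tuple[int, int]] = []  # (low, high) ranges already rejected
    results: List[Tuple[int, int, str]] = []
    for high in range(k, 1, -1):
        for low in range(1, high):
            # Wider ranges are always visited first, so any enclosing range is known.
            if any(lo <= low and high <= hi for lo, hi in non_significant):
                results.append((high, low, "skipped"))
                continue
            if is_significant(high, low):
                results.append((high, low, "significant"))
            else:
                results.append((high, low, "not significant"))
                non_significant.append((low, high))
    return results
```

With four groups and every test significant, the visiting order is 4 vs 1, 4 vs 2, 4 vs 3, 3 vs 1, 3 vs 2, 2 vs 1. If 3 vs 1 is the first non-significant result among three groups, both 3 vs 2 and 2 vs 1 fall inside that range and are skipped.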
385. ot Table, AtlasImage, GenePix Results, Imagene, ScanArray, QuantArray. Important: Affymetrix Pivot files with more than one sample per file are not supported. If you are retrieving data from a directory containing data in multiple formats, any files that do not match the format you specify here will be ignored. If data are being retrieved from a database as a set of columns using an SQL query, a known format type may not be used, and the format must be explicitly defined as described below. If your data are not in a standard format, you must define the format using the available tags. Columns can be specified either as a number (first column = 1) or by header. If columns are specified by header and data are retrieved from a database using an SQL query, make sure the headers retrieved in the SQL query exactly match the headers specified here. It is a good idea to write your SQL queries as follows: select column1 as column1 from table_name where ... If a column is not used, you can omit the line or enter 1 for strings. Contents: <GeneColumn>, <Headlines>, <SignalColumn>, <NormalizedColumn>, <ReferenceColumn>, <SignalBackgroundColumn>, <ReferenceBackgroundColumn>, <ExperimentWorkedColumn>, <ExperimentWorkedDesignation>, <ExperimentAbsentDesignation>, <ExperimentMarginalDesignation>, <RegionColumn>, <TreatNoSignalAsIn va
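As an illustration of aliasing query columns so that the returned headers match the format definition, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for your database. The table name `spots`, its columns, and the headers `GeneName`, `Signal`, and `Control` are hypothetical; use whatever headers your own format definition names.

```python
import sqlite3

# An in-memory SQLite database stands in for your real data source.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spots (gene_id TEXT, sig REAL, ctrl REAL)")
conn.executemany(
    "INSERT INTO spots VALUES (?, ?, ?)",
    [("YMR199W", 1200.0, 1100.0), ("YMR200W", 850.0, 900.0)],
)

# Alias every column so the result headers exactly match the headers named in
# the format definition (here assumed to be GeneName, Signal, Control).
cur = conn.execute(
    'SELECT gene_id AS "GeneName", sig AS "Signal", ctrl AS "Control" '
    "FROM spots ORDER BY gene_id"
)
headers = [column[0] for column in cur.description]
rows = cur.fetchall()
conn.close()
print(headers)
print(rows)
```

The same aliasing pattern applies to other SQL databases; only the connection step differs.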
386. other genes on the pathway. GeneSpring examines all the genes on your currently selected gene list and finds all genes whose minimum similarity (correlation) with genes on list A is higher than their maximum similarity with genes on list B. These genes are made into a separate list for you to examine. You can place a gene from this list on the pathway; see Adding a Gene to a Pathway on page 6-17. If your pathway geometry is complex, this procedure is not very useful, since it relies on screen distance only, not pathway structure or connectivity. To find new genes on a pathway: 1. Right-click near a group of genes displayed on your pathway. 2. Choose the option Find Genes Which Could Fit Here. The New Gene List window appears. 3. Enter a name and destination folder and click Save. The new gene list is saved in your Gene Lists folder. Pathway Commands: Right-click your pathway in the navigator for the following options: Display Pathway: Displays the selected pathway in the genome browser. Make Gene List: Allows you to save a list of all the genes on the selected pathway. Attachments: Allows you to add a text or picture attachment to your pathway. Inspect: Displays a listing of details such as pathway history and genome. Publish to GeNet: Uploads your information and the pathway picture to GeNet; see Publishing Data to GeNet on page 9-11. Rename Pathway: Allows you to rename your pathway. Delete Pathway
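The selection rule in the first paragraph can be sketched as below. This is a hedged illustration, not GeneSpring's code: it assumes Pearson correlation as the similarity measure, and because the text defining lists A and B is cut off above, it treats them simply as two lists of gene names supplied by the caller. The gene names and profiles in the example are made up.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def genes_that_could_fit(
    candidates: List[str],
    list_a: List[str],
    list_b: List[str],
    profiles: Dict[str, Sequence[float]],
) -> List[str]:
    """Keep candidates whose minimum similarity to list A exceeds their maximum similarity to list B."""
    hits = []
    for gene in candidates:
        min_to_a = min(pearson(profiles[gene], profiles[a]) for a in list_a)
        max_to_b = max(pearson(profiles[gene], profiles[b]) for b in list_b)
        if min_to_a > max_to_b:
            hits.append(gene)
    return hits


profiles = {
    "a1": [1, 2, 3, 4], "a2": [1, 2, 3, 5], "b1": [4, 3, 2, 1],
    "g1": [2, 4, 6, 8], "g2": [4, 3, 2, 0],
}
print(genes_that_could_fit(["g1", "g2"], ["a1", "a2"], ["b1"], profiles))
```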
387. ou can also right-click the genome browser in Graph view and select Options > Experiment Interpretation. 2. From the Mode pull-down menu, choose a data display mode for the vertical axis. You have the following choices: Ratio (signal/control), Log of ratio, and Fold Change. The mode you choose is used in such statistical procedures as Statistical Group Comparison, k-means Clustering, Self-Organizing Maps, and Principal Components Analysis. See below for details on these modes. Choose the lower and upper bounds of the vertical axis in the fields provided. 3. Depending on your instrumentation, you may have flags indicating the degree to which your data are reliable. If you have flags, choose from the Use Measurements Flagged pull-down menu to limit data based on these flags. 4. (Optional) To use the cross-gene error model, check the Use Cross-Gene Error Model box. Cross-gene error models, if used, are assigned the same number of degrees of freedom as the direct variability estimates for that gene. 5. Choose a mode for each parameter: Continuous, Element, Non-continuous, Color Code, or Do Not Display. Note that if you choose Color Code, you must also select Colorbar > Color by Parameter. See below for details on these modes. 6. To make any additional desired changes in your experiment before you continue, click any of the buttons in the Experiment Properties panel. Your opti
388. ows or hides the navigator panel. Hide All: Hides everything in the window except the genome browser. Show All: Shows all elements. Hide All in All Windows: Hides everything in all windows except the genome browser. Show All in All Windows: Shows all elements in all windows. 5 Normalizing Data. Experiment Normalizations: Experiment normalizations are used to standardize your microarray data to enable differentiation between real biological variations in gene expression levels and variations due to the measurement process. Normalizing also scales your data so that you can compare relative gene expression levels. GeneSpring assumes the data you have entered are raw data and must be normalized. If your data have been pre-normalized around a median other than 1, they may not be interpreted accurately during analysis. If your data are pre-normalized this way, see Normalize to a Constant Value on page 5-12. There are several ways to normalize your data in GeneSpring. Typically, you will want to do either one per-chip normalization together with one per-gene normalization, or one per-spot normalization with one per-chip normalization. There are important exceptions to this, which are discussed below under the relevant normalization. Most normalizations can be applied in any order, and different samples in
389. ox that appears. To replace all instances of an entry, choose Replace, and then uncheck the Replace in selected cells only box before clicking OK. Extract Sub-Values: This feature automates parameter assignment. To use it, create file names based on your parameter values, e.g., Rlr001a.txt, where Rlr0 is an experiment, 01 is your sample number, and a is the region designator. When you use the Extract Sub-Values feature, file names are broken down into sub-values. GeneSpring is programmed to first look for alternating constant fields and variable fields, and to make parameters out of the variable fields. Next, it divides the variable fields into groups consisting of uninterrupted stretches of either numbers, letters, or non-alphanumeric characters, and makes parameters out of each of these groups. Fill Down: To replace entries using the top selected cell, click on the cell you want to use as the replacement and then, holding down the Shift key, click on the cells underneath whose values you would like replaced with the original cell. Then click Fill Down. Fill Sequence Down: This allows you to fill down as described above, but automatically continues a simple numeric or alphabetic sequence. Set Value Order: To change the order of your parameters as they are displayed along the X axis in the main GeneSpring window, select an entire column or part of a column and clic
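A simplified approximation of the Extract Sub-Values splitting, written in Python, is shown below. It is not GeneSpring's algorithm: it first splits every file name into runs of digits, letters, or other characters, then keeps the run positions whose values differ between files, and it assumes all file names split into the same number of runs. The file names are hypothetical.

```python
import re
from typing import List

# One run of digits, one run of letters, or one run of any other characters.
RUN = re.compile(r"[0-9]+|[A-Za-z]+|[^0-9A-Za-z]+")


def extract_sub_values(filenames: List[str]) -> List[List[str]]:
    """Split each file name into runs and keep only the runs that vary between files."""
    tokenized = [RUN.findall(name) for name in filenames]
    if len({len(tokens) for tokens in tokenized}) != 1:
        raise ValueError("file names do not split into the same number of runs")
    # Positions whose run differs between at least two files become parameters.
    variable = [
        pos for pos, column in enumerate(zip(*tokenized)) if len(set(column)) > 1
    ]
    return [[tokens[pos] for pos in variable] for tokens in tokenized]


names = ["exp12_s01a.txt", "exp12_s02b.txt", "exp13_s01a.txt"]
for name, values in zip(names, extract_sub_values(names)):
    print(name, values)
```

Constant runs such as `exp`, `_s`, and `.txt` are dropped; each remaining position would become one parameter column.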
390. p is available only for two-color experiments. This step must be the first normalization step that changes the units of signal and control. For example, a log transform can come before this step, but a per-chip normalization cannot. This is because the Signal and Control must be in the same units. Per-Spot Normalization: Per-spot normalizations are commonly used for two-color experiments. The formula for this normalization is: (signal strength of gene A in sample X) / (control channel value for gene A in sample X). Divide by control channel: This option divides the measured intensity of each gene by the value of its Control channel. This is recommended for two-color experiments if you do not use intensity-dependent normalization. If the Control channel value is very low, a cutoff value is used instead. By default this value is 10.0. To change this value, click in the Cutoff text box and enter the desired cutoff value. Note: The cutoff value cannot be lower than 0.000001. This normalization works as follows: if Control > Cutoff, the result is Signal/Control, whether Signal is above or below the cutoff; if Control < Cutoff and Signal > Cutoff, the result is Signal/Cutoff; if both Control and Signal are below the cutoff, the result is No Data. Reserve Control Channel: This option was previously known as Use Control Channel for Trust. This option tells GeneSpring to use the control channel to determine the saturation of the color of your genes. This is recommended when
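The cutoff rules for Divide by control channel can be written as a small function. This Python sketch follows the four cases listed above; the manual does not say what happens when a value equals the cutoff exactly, so the sketch assumes such values count as above the cutoff, and it returns `None` to stand for No Data.

```python
from typing import Optional


def per_spot_normalize(
    signal: float, control: float, cutoff: float = 10.0
) -> Optional[float]:
    """Divide-by-control-channel normalization; None stands for No Data."""
    if cutoff < 0.000001:
        raise ValueError("the cutoff cannot be lower than 0.000001")
    if control >= cutoff:      # usable control: Signal / Control
        return signal / control
    if signal >= cutoff:       # low control, usable signal: Signal / Cutoff
        return signal / cutoff
    return None                # both below the cutoff: No Data


print(per_spot_normalize(200.0, 100.0))
print(per_spot_normalize(50.0, 2.0))
print(per_spot_normalize(5.0, 2.0))
```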
391. parameters in an experiment, then histograms are shown instead of line graphs. A typical example of a continuous parameter is time or drug concentration. Continuous parameters can optionally be made logarithmic for display purposes. Non-continuous: A possibly numerical parameter for which drawing lines between points does not make sense, but which you still wish to graph along the horizontal axis. Typical examples of such parameters are drug type, strain of the organism under study, or tissue type. GeneSpring will typically display smaller graphs side by side in the genome browser. This may also be referred to as discrete. Replicate: Not interpreted by GeneSpring; instead, it is considered a tracking identifier. Sub-experiments that have all parameters other than the Replicate parameter the same are considered repeats. These are visually represented on graphs by taking the median of the data values and plotting error bars. Typical examples of such parameters are database identifiers and individual organism names. Pop-up Menu: A list of options that appears from a sub-menu or by right-clicking (Option-click for Macintosh). Replicate: Can be multiple spots on the same array representing the same gene (also referred to as a copy), the same sample in more than one array, or a biological replicate, that is, equivalent samples taken from more than one organism. A parameter defined as a replicate is graphically a hidden var
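The repeat rule for the Replicate parameter can be illustrated with a short Python sketch. It shows only the grouping and the median; how GeneSpring computes the error bars is not described here, so they are left out. The parameter name `Replicate` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Params = Dict[str, object]
Key = Tuple[Tuple[str, object], ...]


def combine_repeats(
    samples: List[Tuple[Params, float]], replicate_param: str = "Replicate"
) -> Dict[Key, float]:
    """Group samples that agree on every parameter except the replicate one; return medians."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for params, value in samples:
        # Parameter names are unique, so sorting never compares the values.
        key = tuple(sorted((k, v) for k, v in params.items() if k != replicate_param))
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


samples = [
    ({"Time": 0, "Replicate": 1}, 1.0),
    ({"Time": 0, "Replicate": 2}, 3.0),
    ({"Time": 0, "Replicate": 3}, 2.0),
    ({"Time": 10, "Replicate": 1}, 5.0),
]
print(combine_repeats(samples))
```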
392. pearman Confidence, Distance. For detailed information on these measures of similarity, see Similarity Definitions on page 7-4. Do automatic annotation: Specifies whether to annotate the nodes of the tree with the names of the gene lists that have similar members. Using this option can add considerable time to the tree-building process but is usually worthwhile. This feature becomes even more valuable once you have created a simplified ontology for the genome, as the ontological classifications can be used to label tree branches. For details on creating a simplified ontology, see Building a Simplified Ontology on page 6-31. Only annotate with standard lists: Specifies whether the annotations on the nodes are done with all gene lists or only the gene lists marked as standard. This is set in the Gene List Inspector; see Displaying a Gene List on page 4-3 for more information. Discard genes with no data in half the starting conditions: Discard any genes with no data in at least half the conditions in the selected experiment. Merge similar branches: Merge branches with similar results. For information on the Separation Ratio and Minimum Distance settings, see Advanced Tree Options on page 7-8. Saving Gene Tree Clustering Results: When the gene tree clustering operation has completed, the Name New Gene Tree window appears. [Figure: The Name New Gene Tree window.]
393. periment. To edit normalizations, click Edit Normalizations. The Experiment Normalizations window appears. See Experiment Normalizations on page 5-2 for details on this window. Click OK to save your changes. To view a more detailed description of a particular normalization, select its name in the list on the Normalizations tab and click View Text Description. A dialog appears with a description of the selected normalization. You can copy the text in this dialog to the clipboard by clicking Copy to Clipboard, and then paste the text into a text editor. The Colorbar Tab: Use this tab to edit the default range of expression in your experiment's coloration scheme. You can also specify whether or not to show trust on the colorbar by clicking the radio button next to your preferred choice. Click OK to save your changes, or Cancel to exit without saving. Use this tab to view the default range of expression in your experiment's coloration scheme. You can also see whether or not trust is shown on the colorbar by clicking the radio button next to your preferred choice. For more information about the range of expression and how it affects the coloration of your experiment, see Changing the Colorbar Range on page 4-32. The Associated Files Tab: This screen lists any files that may be associated with the experiment, including data files, array images, sample images, etc. From this screen you can do the following: Add File To
394. ples. Condition Tree: A dendrogram used to show the relationships between the expression levels of conditions. Control: An experiment data set that provides a comparison or contrast to experimental results. Control Strength: The quantity the raw value is divided by to get the normalized value; see also expression strength. Data Objects: Any downloadable or uploadable items in GeneSpring, such as genomes, gene lists, classifications, etc. Dendrogram: A diagram showing hierarchical relationships based on similarity between elements, for example, similarity of gene expression levels. Experiment: A group of conditions associated together under one name. This generally means they were all performed using a particular set of parameters. Experimental Parameter: A variable used to describe the condition or conditions during an experiment. A set of parameter values defines a single experimental parameter. When the word parameter is used alone, it usually refers to an experimental parameter. Experiment Interpretation: Tells GeneSpring how to treat and display your experiment parameters and how normalized values should be treated. Experiment Specification Area: The area under the genome browser that indicates which, if any, sub-experiment is being displayed, e.g., a particular time point in a time series experiment. Expression: Production of mRNA through transcription of a DNA gene sequence. Expression Level: The amount of
395. port an organism-specific pathway, the GENE file is parsed, and for each KEGG accession number the gene names are stored, including the accession number itself if it is not fully numeric. All EC numbers are also stored. GeneSpring first tries to match the first gene name with the genes in the current genome. It is compared against systematic and common names. If no match is found, GeneSpring attempts to match the second name, and so on. As soon as matching genes are found (often it is one gene, but sometimes there are more), GeneSpring stops any further processing in order to decrease the chance of false matches. If no matches are found, GeneSpring tries to match EC numbers. Because several genes may correspond to the same EC number, in this case GeneSpring searches for matches to all EC numbers in the GENE file. This reduces the possibility of a false match. Importing a Pathway: Before you can import a pathway, you must download it from the above FTP site. To import a pathway: 1. Select File > New Pathway > Import KEGG Pathway. The Import KEGG Pathways window appears. 2. Navigate to the directory of pathways you downloaded from KEGG. 3. If desired, check the box to group pathways alphabetically into subfolders of nine pathways each. This is useful if you want to split a window by a group of pathways. 4. If desired, check the box to create gene lists from the imported pathways. If
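The matching order described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the importer's code: the lookup tables `name_index` and `ec_index` are hypothetical, the accession is assumed to be tried after the listed gene names, and "searches for matches to all EC numbers" is read as requiring a gene to match every EC number listed for the entry, which is one way such a search could reduce false matches.

```python
from typing import Dict, List, Sequence, Set


def candidate_names(accession: str, gene_names: Sequence[str]) -> List[str]:
    """Names to try for one KEGG entry; the accession is kept only if not fully numeric.

    Assumption: the accession is tried after the listed gene names.
    """
    names = list(gene_names)
    if not accession.isdigit():
        names.append(accession)
    return names


def match_kegg_entry(
    names: Sequence[str],
    ec_numbers: Sequence[str],
    name_index: Dict[str, Set[str]],
    ec_index: Dict[str, Set[str]],
) -> List[str]:
    """Return systematic names in the current genome matching one KEGG GENE entry.

    name_index maps systematic and common names to systematic names;
    ec_index maps EC numbers to systematic names (both hypothetical tables).
    """
    for name in names:             # try names in order...
        hits = name_index.get(name)
        if hits:                   # ...and stop at the first one that matches
            return sorted(hits)
    if not ec_numbers:
        return []
    # Fallback (assumption): a gene must match every EC number on the entry.
    common = set(ec_index.get(ec_numbers[0], set()))
    for ec in ec_numbers[1:]:
        common &= ec_index.get(ec, set())
    return sorted(common)


name_index = {"geneA": {"SYS001"}, "SYS001": {"SYS001"}, "geneB": {"SYS002"}}
ec_index = {"1.1.1.1": {"SYS003", "SYS004"}, "2.2.2.2": {"SYS004"}}
print(match_kegg_entry(candidate_names("K0001", ["unknown", "geneA"]), [], name_index, ec_index))
print(match_kegg_entry(["unknown"], ["1.1.1.1", "2.2.2.2"], name_index, ec_index))
```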
396. portant dialects of SQL are ANSI/ISO SQL, IBM DB2, SQL Server, Oracle, Ingres, and ODBC. SQL uses statements to get work done. Examples of some of these statements are SELECT, INSERT, DELETE, UPDATE, DECLARE, OPEN, CLOSE, CREATE, PREPARE, and DESCRIBE. SQL Call Level Interfaces: When a Call Level Interface (CLI) is used, a program requests database services by calling special SQL interface routines rather than embedding SQL statements directly into the program. There are two distinct types of CLIs. First, each DBMS vendor provides its own unique API for its database. The vendor-specific API is usually the most efficient way to access the database, but each vendor's API is unique. As a result, if you decide to write programs that use a vendor API, you lock yourself into using that vendor's DBMS. However, your programs are as efficient as possible. The second type of CLI is a standard, or open, API, which is supported by more than one database vendor. Several open database APIs are available, one of which is ODBC. ODBC is a standard CLI for accessing SQL databases from Windows. The Genetic Analysis Technology Consortium: The Genetic Analysis Technology Consortium (GATC) was formed in an attempt to standardize the rapidly growing field of array-based genetic analysis. The consortium was created to provide a unified technology platform to design, process
397. ppears. On this screen, specify whether you have a second table of genes. This is generally used to add genetic elements to a GenBank- or EMBL-defined organism. In this case, the supplementary table of genes probably contains alleles, centromeres, or genes from strains differing slightly from the sequenced strain. If you do not have a separate table of genes, leave No selected. If you have a separate table of genes, select Yes. You are prompted to enter a file name and select a file format. Enter the complete filename and path, or click the Browse button to select a file. Then select the appropriate format from the Select a file format menu. For a description of the four format options, see Data Format on page 2-10. When you are done, click Next to proceed to the next screen. The Links to Web Databases screen appears. From this screen you can create a link to a web page or other online resource with relevance to your genome. 9. Click the Links to Web Databases button to view a table of commonly used links. You can copy these links and paste them into the table of links. This list is also available from http www silicongenetics com cgi TNgen cgi GeneSpring GSnotes Notes have links. If you do not want to include links to web databases, click Next to proceed to the next screen. If you want to include links to web databases, select Yes. In the Enter number of links box, type the number of web databases to link to. When you enter a nu
398. ps based on this analysis. The Build Simplified Ontology constructor builds over 300 biologically meaningful gene lists that can be compared, merged, or browsed. These lists can be further used to annotate clusters and cross-reference new gene lists. Note: You cannot rename these gene lists, but you can update them. To Build a Simplified Gene Ontology List: 1. Select Annotations > Build Simplified Ontology. 2. Enter a name for the new simplified ontology folder, or leave the default name to overwrite the existing simplified ontology list. 3. Click OK. The new Simplified Ontology list appears in the Gene Lists folder. To Make Gene Lists From Properties: To create lists based on annotations, see Making Lists from Properties on page 6-12. Building Homology Tables: To build a homology table, see The Homology Tool on page 6-24. Statistical Analysis (ANOVA): Statistical Analysis (ANOVA) is a filter tool that statistically compares mean expression levels between two or more groups of samples. The object is to find the set of genes for which the specified comparison shows statistically significant differences in the mean normalized expression levels, as interpreted according to your current interpretation mode (logarithm, ratio, or fold change), across all the groups. This comparison is performed for each gene, and the genes wit
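For readers who want to see what comparing mean expression levels between groups computes for a single gene, here is a textbook equal-variance one-way ANOVA in Python. It is a sketch for orientation only: GeneSpring's own test options may differ, and the p-value would come from the upper tail of an F distribution with the returned degrees of freedom (for instance via `scipy.stats.f.sf`, if SciPy is available), which the sketch does not compute. The group values are hypothetical.

```python
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Classical equal-variance one-way ANOVA for one gene.

    groups holds the values of each group, already expressed in the chosen
    interpretation mode (for example, log of ratio). Returns the F statistic
    and its numerator and denominator degrees of freedom.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))
```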
399. [Figure 7-14: The Principal Components Analysis Results window, with a list of principal components and the Options, Change Colors, Save Scores, Save Profiles, Close, and Help buttons. Callouts: to save expression profiles whose values are the component scores for each condition, click Save Scores; to save gene lists whose associated values comprise the profiles of the principal components, click Save Profiles.] In the PCA results window, double-click a condition to view the Gene Inspector window, which shows the eigenvalue and explained variability in the upper left panel. This screen contains the following buttons: Split/Unsplit Window: Toggles between the default view and splitting the graph by component. Show Bar/Line Graph: Toggles between the bar and line graph views. Change Colors: Allows you to change the colors used to display the components. Save Scores: Saves expression profiles showing the component scores. Save Profiles: Saves gene lists to profile the PCA components. A second window appears that displays a condition scatter plot with the first three components on the axes. For information on using this view, see Condition Scatter Plot on page 4-68. Interpreting your PCA Results: The principal components of a data set are the eigenvectors obtained from an eigenvector eige
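To make eigenvalue and explained variability concrete, here is a minimal pure-Python sketch that finds the first principal component of a small data matrix by power iteration on its covariance matrix. It is illustrative only: GeneSpring's PCA may center, scale, or orient the data differently (for example, in which of genes or conditions it treats as observations), and a real analysis would use a linear-algebra library. The example data are made up.

```python
import math
from typing import List, Sequence, Tuple


def covariance_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance of the columns of rows (each row is one observation)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(
    rows: Sequence[Sequence[float]], iterations: int = 200
) -> Tuple[List[float], float, float]:
    """Leading eigenvector, its eigenvalue, and the fraction of total variance it explains."""
    cov = covariance_matrix(rows)
    p = len(cov)
    # Arbitrary start vector; power iteration fails only if it happens to be
    # orthogonal to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("degenerate data or unlucky start vector")
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue, eigenvalue / total_variance


data = [[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]]
print(first_principal_component(data))
```

The explained fraction is the eigenvalue divided by the total variance (the trace of the covariance matrix).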
400. r: Displays the following: In Tree view, the name of the selected gene tree and condition tree. In Array Layout view, the name of the array layout. In Pathway view, the name of the pathway. Experiment(s) Plotted on Axis: Displays the following: The name of the experiment and interpretation being displayed on the Y axis. This is available in the following views: Graph, Graph by Genes, Bar Graph, Scatter Plot, and 3D Scatter Plot. The name of the experiment and interpretation, or gene list and type of associated values, being displayed on the X axis. This is available in the Scatter Plot and 3D Scatter Plot views. The name of the experiment and interpretation, or gene list and type of associated values, being displayed on the Z axis. This is available in the Scatter Plot and 3D Scatter Plot views. Split Window Information: Displays the name of the Classification or Gene List folder used to split the window. Coloring Information: Displays different information depending on the coloring scheme selected. Color by Expression: The name of the experiment, interpretation, and condition used for coloring. For experiments with continuous numeric parameters, the condition may actually be an interpolation between two measured conditions. In Scatter Plot and 3D Scatter Plot views, the parameter value is also displayed, since it affects where the genes are graphed. Color by Significance: The name of the experiment, interpr
401. r and click Choose Gene List. 2. Use the double-ended slider, or enter minimum and maximum restriction values in the fields provided. If a gene list has no associated numbers, you cannot select it for filtering in this view. For example, this filter cannot be applied to the all genes or all genomic elements lists, because they have no associated values. References: Benjamini, Y. and Hochberg, Y. (1995). Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society B 57: 289-300. Dudoit, S., Yang, Y. H., Callow, M. J. and Speed, T. P. (2000). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Department of Statistics Technical Report 578, University of California, Berkeley. http stat ftp berkeley edu tech reports index html. Holm, S. (1979). A Simple Sequentially Rejective Multiple Test Procedure. Scandinavian Journal of Statistics 6: 65-70. Miller, R. G. (1981). Simultaneous Statistical Inference, Second Edition. New York: Springer-Verlag. Westfall, P. H. and Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. New York: John Wiley & Sons, Inc. Advanced Filtering: From the Advanced Filtering window, you can combine basic filters and analysis filters for more complex filtering operations. All
402. r data are reliable. Whether or not you have flags depends on your instrumentation and what you have entered into your master gene table. See Measurement Flags on page 5-19. The T-test P-value: In cases where there is replicate data, a one-sample Student's t-test is calculated to test whether the mean normalized expression level for the gene is statistically different from 1.0. The t statistic is calculated as $t = \frac{\bar{X} - 1}{S/\sqrt{n}}$, where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the sample average of the n normalized expression levels $X_1, X_2, \ldots, X_n$, and $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$ is the sample standard deviation of the replicates. The value of t is compared with a table of Student's t distribution with n - 1 degrees of freedom to yield the significance level (p-value) for a two-sided test that the mean gene intensity differs significantly from 1.0. The Browser Display: Inspecting a gene shows you the gene's expression over the experimental parameter (time, in minutes, in this example). The browser image reflects the experiment interpretation in the main browser window. The only view option available in the Gene Inspector window is the Graph view. Right-click on the browser to use error bars in the browser display or to create a resizable picture of the browser. Right-click and select Options to change the vertical axis range, show or hide many of the browser elements, and switch your view from norm
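The t statistic above translates directly into a few lines of Python. This sketch computes only t and its degrees of freedom; the two-sided p-value would then be read from Student's t distribution (for example, `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available). The replicate values in the example are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t(values: Sequence[float], mu: float = 1.0) -> Tuple[float, int]:
    """t statistic for H0: mean == mu, and its degrees of freedom (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    s = stdev(values)              # sample standard deviation (divisor n - 1)
    if s == 0:
        raise ValueError("replicates are identical; t is undefined")
    t = (mean(values) - mu) / (s / math.sqrt(n))
    return t, n - 1


t_value, df = one_sample_t([1.2, 1.4, 1.6])   # hypothetical normalized replicates
print(round(t_value, 4), df)
```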
403. r each gene repeated in a data file before any other normalizations are applied to the raw values. Frequently, samples are repeated with exactly the same parameters but are reported in different data files. If this is the case, the fact that the samples are repeats is represented via a parameter. The same normalization is employed when dealing with an experimental parameter considered to be a repeat, but in that case the averaging takes place after the raw data for each gene has been normalized. See Experiment Parameters on page 3-29 for more information about repeats reported in separate data files. Mathematical Illustration of the Dealing with Repeated Measurements in a Single Data File Method: [Table: raw data for genes YMR199W through YMR214W, with the four repeats of YMR199W marked by arrows.] GeneSpring averages all of the measurements of YMR199W to get a signal strength of 1286. GeneSpring notices the maximum control strength for YMR199W in this sample is 1496 and the minimum is 1117. These values are the end points of YMR199W's error bar, which GeneSpring plots when you choose to display error bars in either the graph or the scatter plot displays. Measurement Flags: Measurement flags are ma
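The averaging and error-bar end points in this illustration can be sketched as follows. This is a minimal Python sketch with made-up values, not the data from the table above; it takes one measured value per spot and reports their mean together with their minimum and maximum, which the text uses as the error-bar end points.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def collapse_repeats(
    rows: Iterable[Tuple[str, float]]
) -> Dict[str, Tuple[float, float, float]]:
    """Map each gene to (mean, minimum, maximum) of its repeated measurements."""
    by_gene: Dict[str, List[float]] = defaultdict(list)
    for gene, value in rows:
        by_gene[gene].append(value)
    return {
        gene: (sum(vals) / len(vals), min(vals), max(vals))
        for gene, vals in by_gene.items()
    }


rows = [("YMR199W", 100.0), ("YMR200W", 50.0), ("YMR199W", 120.0), ("YMR199W", 110.0)]
print(collapse_repeats(rows))
```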
404. r experiment. Experiment parameters: These are variables that can incorporate many sample values. Generally speaking, when the term parameter is used, it means an experimental parameter. As an example, experiment parameters could be: Drug Concentration, Strain of Yeast, Infection, Replicate Number. Parameter values: The values of the parameters in the previous list could be: Drug Concentration (in ppm): 0, 10, 20, 30, 40; Strain of Yeast: A or B; Infection: Healthy or Infected; Replicate Number: 1 or 2. Parameters Displayed in the Navigator: [Figure 3-19: Data objects in the navigator, showing an experiment, its Default Interpretation, stage parameter values (Embryonic, Adult; Embryonic day 11 through Postnatal day 7), several replicates, and individual samples under All Samples.] Sample: The data generated from a biological object placed onto an array or set of arrays. Sample data is visible in the GeneSpring navigator under the All Samples icon. Condition: A unique combination of parameters as applied to your sample. Each condition may be a single sample or a group of replicate samples combined based upon the p
405. r instead by assuming that the amount of variability is a function of the control strength within all the measurements for a single experimental condition. The advantage of making this assumption is that the number of measurements used to estimate the global error is equal to the total number of genes on any given chip. In addition, measurement precision information supplied by the scanner software, or independently by the user, can be loaded into GeneSpring via the Signal Precision column type in the column editor. The value given in this column is interpreted as the standard deviation of the raw measured value. The sample-to-sample variability includes the effect of both types of variation, and the statistical separation of these effects is called variance components analysis. The GeneSpring Cross-Gene Error Model performs this variance components analysis and uses the estimates of these two components of variation to accurately estimate standard errors and compare mean expression levels between experimental conditions. Separate estimates of two different kinds of random variation are used to estimate the variability in gene expression measurements: Measurement variation: This comprises the lowest level of variation, corresponding to the variation of the measurement of a gene on a single chip around the true value that would be achieved by a perfect measurement of the expression level of the gene for that sample. Sample to
406. ram when it was created. Making Lists from Classifications: You can generate gene lists from any classification. For example, if you have a 5-cluster k-means classification, you can view which genes are in each cluster by making a gene list from the k-means classification. To make a gene list from a classification: 1. Right-click a classification in the Classifications folder in the navigator. 2. Select Make Gene Lists. GeneSpring creates a gene list folder for the classification, containing one list for each cluster. This folder appears in the Gene Lists folder in the navigator. Making Lists from Selected Genes: This command allows you to make lists from genes you select graphically. There are two ways to select a set of genes. If genes are grouped together in the browser, you can select a set the same way you select an area to enlarge: 1. Hold down the Shift key, click on a region, and drag a rectangle across the desired area to select. 2. Release the mouse button before releasing the Shift key. Selected genes appear in white. Alternately, you can select multiple genes by clicking over their representative lines or rectangles while holding down the Shift key. 3. Once you have selected all the desired genes, right-click in the genome browser and select Make List from Selected Genes from the pop-up menu. A New Gene List window appears. 4. Name your list and click Save. For mor
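Conceptually, Make Gene Lists inverts a classification: each cluster becomes one list. The Python sketch below shows only that data transformation; the gene-to-cluster dictionary and the cluster names are hypothetical and say nothing about how GeneSpring stores classifications.

```python
from collections import defaultdict
from typing import Dict, List


def lists_from_classification(classification: Dict[str, str]) -> Dict[str, List[str]]:
    """Turn a gene -> cluster mapping into one sorted gene list per cluster."""
    lists: Dict[str, List[str]] = defaultdict(list)
    for gene, cluster in classification.items():
        lists[cluster].append(gene)
    return {cluster: sorted(genes) for cluster, genes in lists.items()}


kmeans = {"YMR199W": "Set 1", "YMR200W": "Set 2", "YMR201C": "Set 1"}
print(lists_from_classification(kmeans))
```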
407. [Figure 8-5: The Choose Type of Output window, listing the output types Gene List, Gene List With Numbers, Gene Name, Experiment Data, Experiment Data with Confidence, Classification, Tree, Genome, and Temporary Genome, with a Label for Gene List With Numbers field and Cancel and Show Example buttons.] 2. Specify the type of output GeneSpring will receive from the external program. Available options are: Gene List: A tab-delimited list of systematic names, one per line. Gene List With Numbers: A tab-delimited list of systematic names and associated numbers, one pair per line. Gene Name: A single systematic name. Experiment Data: Normalized experiment data, one line per gene, one column per experiment, with header lines for the experiment name and each parameter. Experiment Data with Confidence: Normalized experiment data, one line per gene, two columns per experiment (one for normalized data and one for control values), with header lines for the experiment name and each parameter. Classification: A tab-delimited list of systematic names and the name of the associated classification group, one pair per line. Tree: A hierarchical tree in XML format. Genome: An XML representation of the genome, which will be automatically saved to disk during import to GeneSpring. Temporary Genome: Identical to the Genome format, except that it is not saved to disk. Click
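For an external program written in Python, the two pair-per-line formats above (Gene List With Numbers and Classification) could be produced and checked with helpers like these. This is a sketch based only on the one-line descriptions in the list; header lines, quoting, and how the program hands its output back to GeneSpring are not covered, and the function names and sample values are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def format_pairs(pairs: Sequence[Tuple[str, object]]) -> str:
    """Write one 'systematic name<TAB>value' pair per line."""
    return "".join(f"{name}\t{value}\n" for name, value in pairs)


def parse_pairs(text: str, convert: Callable[[str], T]) -> List[Tuple[str, T]]:
    """Read 'name<TAB>value' lines, skipping blank lines; convert turns the value text into T."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t", 1)
        pairs.append((name.strip(), convert(value.strip())))
    return pairs


gene_list_with_numbers = format_pairs([("YMR199W", 2.5), ("YMR200W", 0.4)])
classification = format_pairs([("YMR199W", "Set 1"), ("YMR200W", "Set 2")])
print(parse_pairs(gene_list_with_numbers, float))
print(parse_pairs(classification, str))
```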
408. rd Correlation

Pearson Correlation

The Pearson correlation is very similar to the Standard correlation, except that it measures the angle between the expression vectors for genes A and B around the mean of the expression vectors, that is, the mean of the expression values constituting the profiles for Gene A and Gene B. Generally the mean of the expression vectors is positive, since expression values are based on concentrations of mRNA. The Pearson correlation gives you more negative correlations than the Standard correlation, so you find more genes that behave opposite to each other. This is because of where the baseline is placed. With the baseline at zero, almost all gene values are above it. With the baseline at 1, a fair number of values read below it. For data normalized to an overall level of 1, as with all normalizations that GeneSpring performs, the Pearson correlation gives almost the same correlations as the Standard correlation when both are performed on the logarithms of the genes' expression values.

To compute a Pearson correlation:
1. Calculate the mean of all elements in vector a.
2. Subtract that value from each element in a to make a vector A.
3. Do the same to make a vector B from b.

$$\text{Pearson Correlation} = \frac{A \cdot B}{|A|\,|B|}$$

Or, in summation notation:

$$r = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}}$$

Figure 1-2: Summation Notation for the Pearson Correlation

Spearman
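The Pearson steps above can be sketched in plain Python. This is an illustration, not GeneSpring's implementation. It assumes complete profiles with no missing values and raises an error for a constant profile, where the correlation is undefined.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation of two expression profiles, following the steps above."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    # Steps 1-2: center a on its mean to get vector A.
    mean_a = sum(a) / len(a)
    A = [x - mean_a for x in a]
    # Step 3: do the same for b to get vector B.
    mean_b = sum(b) / len(b)
    B = [y - mean_b for y in b]
    dot = sum(x * y for x, y in zip(A, B))
    norms = math.sqrt(sum(x * x for x in A)) * math.sqrt(sum(y * y for y in B))
    if norms == 0:
        raise ValueError("a constant profile has no defined Pearson correlation")
    # (A . B) / (|A| |B|)
    return dot / norms


print(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # close to 1.0
print(pearson_correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # close to -1.0
```

To mirror the comparison with the Standard correlation on log-transformed data, apply `math.log` to each value before calling the function.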
409. re W BMS MS the test statistic. The approximate p-value is calculated by looking up W in the upper-tail probability of an F distribution with d and d degrees of freedom. Note that d will not, in general, be an integer.

Nonparametric Analysis

For the nonparametric analysis:
• Replace each X by R_X, its rank out of all of the X values for the gene.
• Perform the same analysis as for the parametric test with equal variances. P-values are approximate but asymptotically accurate.

Multiple Testing Corrections

If you rely on the nominal p-value when testing the statistical significance of group comparisons for many genes, a substantial number of genes pass the filter by chance alone. For example, suppose you test 10,000 genes for reliable changes between groups at significance level 0.05, and the tests are independent. You would expect to misidentify about 500 genes (10,000 × 0.05) as significant even when there is no real difference in gene expression. Even if this approach identifies 1,000 genes showing significant behavior, about half of the genes on the list may be there by chance, which lessens the value of the list. Multiple testing corrections adjust the individual p-value to account for this effect.

Suppose the p-value cutoff is α and the number of genes being tested is N. The first three procedures (Bonferroni, Holm, and Westfall and Young) control the family-wise error rate (FWER), which is the overal
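As a rough illustration of the first two FWER procedures named above, here is a Python sketch using the textbook definitions of Bonferroni and Holm adjusted p-values. It is not GeneSpring's code. Westfall and Young is permutation-based and is not shown.

```python
def bonferroni(pvalues):
    """Bonferroni: multiply each p-value by the number of tests N, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def holm(pvalues):
    """Holm step-down: the i-th smallest p-value (i = 0, 1, ...) is multiplied by N - i,
    then a running maximum keeps the adjusted values monotone."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (n - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted


# Expected number of false positives among 10,000 independent null tests at 0.05:
print(10_000 * 0.05)  # 500.0

raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(holm(raw))
```

A gene passes the corrected filter when its adjusted p-value is at most α. Holm is never more conservative than Bonferroni, because only the smallest p-value receives the full factor N.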
410. re examined, and the new sample is assigned to the class showing the largest relative proportion among the neighbors, after adjusting for the proportion of each class in the training set.

Decision Threshold

P-values are computed for testing the likelihood of seeing at least the observed number of neighborhood members from each class, based on the proportion of that class in the whole training set. The class with the smallest p-value is given as the predicted class. The column labeled P-value ratio is the ratio of the p-value for the best class to that of the second-best class. The predictor makes a prediction if this ratio is less than the P-value Cutoff specified on the initial panel, and does not make a prediction if the ratio is above this cutoff. Setting the p-value cutoff to 1 forces the algorithm to always make a prediction, but may result in more prediction errors.

References for the Predictor

Cover, T. M., and Hart, P. E. (1967). Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory, IT-13, 21-27.
Duda, R. O., and Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley, New York.
Golub, T. R., et al. (1999). Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, v286, pp. 531-537.

Array: A set of spots on a chip, typically expressed as a s
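The following Python sketch illustrates the decision threshold described above. The text does not name the statistical test. This sketch assumes a hypergeometric upper-tail test, treating the k neighbors as drawn without replacement from the training set. All function names are hypothetical. A ratio equal to the cutoff is treated as passing, so a cutoff of 1 always yields a prediction.

```python
from math import comb


def upper_tail_hypergeom(observed, population, class_size, draws):
    """P(X >= observed) when `draws` items are taken without replacement from a
    population of `population` items, `class_size` of which belong to the class."""
    total = comb(population, draws)
    upper = min(class_size, draws)
    return sum(
        comb(class_size, j) * comb(population - class_size, draws - j)
        for j in range(observed, upper + 1)
    ) / total


def predict_class(neighbor_labels, training_labels, p_value_cutoff):
    """Return (predicted class or None, p-value ratio, per-class p-values)."""
    n_train, k = len(training_labels), len(neighbor_labels)
    classes = sorted(set(training_labels))
    if len(classes) < 2:
        raise ValueError("need at least two classes in the training set")
    p = {
        c: upper_tail_hypergeom(neighbor_labels.count(c), n_train, training_labels.count(c), k)
        for c in classes
    }
    best, second = sorted(classes, key=lambda c: p[c])[:2]
    ratio = p[best] / p[second]  # best class / second-best class
    # Predict only when the ratio is within the cutoff; a cutoff of 1 always predicts.
    return (best if ratio <= p_value_cutoff else None), ratio, p


training = ["ALL"] * 5 + ["AML"] * 5
neighbors = ["ALL", "ALL", "ALL"]
label, ratio, p = predict_class(neighbors, training, p_value_cutoff=0.5)
print(label, ratio)
```

In this example all three neighbors are ALL. The ALL p-value is 1/12 and the AML p-value is 1, so the ratio of about 0.083 passes a cutoff of 0.5 but would fail a stricter cutoff of 0.05.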
411. read and analyze DNA arrays. The goal of the GATC is to make microarrays broadly available and to provide a technology platform that allows investigators to use components from multiple vendors.

Databases and GeneSpring

Experimental data are not always stored on the researcher's desktop in simple text files. Sometimes the data are stored in a relational database. GeneSpring can save and load all types of data to and from an SQL database through ODBC. Experimental data can be loaded from a database simply by telling GeneSpring which table(s) contain the data and which columns contain the experimental index. You load the data using the Experiment Wizard almost exactly as you would if they were text files (see Entering your Database into GeneSpring on page A-23). The only difference is that you enter experiment identifiers instead of file names, and SQL table columns instead of tab-delimited column headers.

Parameters describe what the database knows about each sample. Different databases store parameters in different ways, so parameters must be retrieved by explicit SQL statements. Silicon Genetics can provide these statements for GATC and can help write them for individual databases. This needs to be done only once. Afterwards, the customer simply chooses the database and GeneSpring retrieves data from it. Normalization and other options can also be set for a database.
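The following sketch illustrates the idea of loading data by experiment identifier and column name instead of by file name and header. It uses Python's built-in `sqlite3` as a stand-in. GeneSpring itself connects through ODBC/JDBC. The table `ExperimentData` and its columns are hypothetical.

```python
import sqlite3


def load_experiment(conn, experiment_id, table="ExperimentData",
                    id_column="experiment_id", gene_column="gene", value_column="value"):
    """Return {gene: value} for one experiment identifier.

    Table and column names cannot be bound as SQL parameters, so they are
    interpolated here; in real code they should come from trusted configuration.
    """
    query = (f'SELECT "{gene_column}", "{value_column}" FROM "{table}" '
             f'WHERE "{id_column}" = ? ORDER BY "{gene_column}"')
    return dict(conn.execute(query, (experiment_id,)).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExperimentData (experiment_id TEXT, gene TEXT, value REAL)")
conn.executemany(
    "INSERT INTO ExperimentData VALUES (?, ?, ?)",
    [("exp1", "YAL001C", 1.2), ("exp1", "YAL002W", 0.8), ("exp2", "YAL001C", 2.0)],
)
print(load_experiment(conn, "exp1"))  # {'YAL001C': 1.2, 'YAL002W': 0.8}
```

The experiment identifier is passed as a bound parameter (`?`), which keeps user-supplied identifiers out of the SQL text itself.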
412. reen

The value created between two values a_i and a_{i+1} is $\max\left(\arctan\left(a_{i+1}/a_i\right) - \pi/4,\; 0\right)$.
3. Do the same to make a vector B from b.

$$\text{Upregulated correlation} = \frac{A \cdot B}{|A|\,|B|}$$

Number of Samples Required to do Analyses

For a gene to be included in any of the analyses listed below, it must have been assigned an expression value in at least half of the total number of conditions in the experiment. Each gene must also have the minimum number of measurements listed in the chart below.

Measure                          k-means   SOM   Trees   Find Similar
Standard Correlation             2         N/A   2       2
Distance                         1         2     1       1
Smooth Correlation               2         N/A   2       2
Change Correlation               3         N/A   3       3
Upregulated Correlation          3         N/A   3       3
Pearson Correlation              3         N/A   3       3
Spearman Correlation             3         N/A   3       3
Spearman Confidence              5         N/A   5       5
Two-Sided Spearman Confidence    5         N/A   5       5

For details on each of these clustering techniques, see Clustering and Characterizing Data. For Find Similar, see The Find Similar Command on page 6

Appendix C: Technical Details for the Predictor

Gene Selection

In order to select genes for use in the predictor, all genes are examined individually and ranked on their power to discriminate each class from all others, using the information on that gene alone. For each gene and each class all
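The two inclusion rules above can be checked with a short Python sketch. The dictionary copies the chart's minimum-measurement counts, with `None` standing for N/A. "At least half of the conditions" is interpreted here as `2 * present >= total`. A missing expression value is represented as `None`, which is an assumption of this sketch.

```python
# Minimum measurements from the chart above; None marks an N/A cell.
ANALYSES = ("k-means", "SOM", "Trees", "Find Similar")
MIN_MEASUREMENTS = {
    "Standard Correlation": (2, None, 2, 2),
    "Distance": (1, 2, 1, 1),
    "Smooth Correlation": (2, None, 2, 2),
    "Change Correlation": (3, None, 3, 3),
    "Upregulated Correlation": (3, None, 3, 3),
    "Pearson Correlation": (3, None, 3, 3),
    "Spearman Correlation": (3, None, 3, 3),
    "Spearman Confidence": (5, None, 5, 5),
    "Two-Sided Spearman Confidence": (5, None, 5, 5),
}


def gene_is_included(values, measure, analysis):
    """values: one entry per condition, None where no expression value was assigned."""
    required = MIN_MEASUREMENTS[measure][ANALYSES.index(analysis)]
    if required is None:
        raise ValueError(f"{measure} is not available for {analysis}")
    present = sum(v is not None for v in values)
    # Rule 1: a value in at least half of all conditions.
    # Rule 2: at least the minimum number of measurements for this measure and analysis.
    return 2 * present >= len(values) and present >= required


print(gene_is_included([1.0, 2.0, 3.0, None], "Pearson Correlation", "Trees"))  # True
print(gene_is_included([1.0, None, 2.0, None], "Pearson Correlation", "Trees"))  # False
```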
413. rences for Hierarchical Clustering

Everitt, Brian S. Cluster Analysis, 3rd Ed. Arnold, London, 1993, pp. 62-65.
Eisen, Michael B., et al. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, V95, pp. 14863-14868, December 1998.

k-Means Clustering

k-means clustering divides genes into groups based on their expression patterns. The goal is to produce groups of genes with a high degree of similarity within each group and a low degree of similarity between groups. Unlike self-organizing maps, k-means clustering is not designed to show the relationship between clusters. Instead, k-means clusters are constructed so that the average behavior in each group is distinct from that of any of the other groups. For example, in a time series experiment you could use k-means clustering to identify unique classes of genes that are upregulated or downregulated in a time-dependent manner.

GeneSpring's k-means clustering algorithm divides genes into a user-defined number (k) of equal-sized groups based on their order in the selected gene list. It then creates centroids in expression space at the average location of each group of genes. With each iteration, genes are reassigned to the group with the closest centroid. After all of the genes have been reassigned, the location of the centroids is recalculated and the process is repeated until the maximum number of i
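The following Python sketch shows the procedure described above: equal-sized initial groups taken in gene-list order, centroids at the group averages, and repeated reassignment to the nearest centroid. It assumes Euclidean distance, although GeneSpring lets you choose the similarity measure. It also stops early once assignments no longer change, which the text does not state. A group that empties out is simply skipped.

```python
import math


def kmeans_from_list_order(profiles, k, max_iterations=100):
    """k-means seeded by splitting the gene list, in order, into k near-equal groups.

    profiles: list of expression vectors (one per gene, same length).
    Returns one cluster index per gene.
    """
    n = len(profiles)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of genes")
    # Initial groups: consecutive, (nearly) equal-sized blocks of the gene list.
    assignment = [i * k // n for i in range(n)]
    for _ in range(max_iterations):
        # Centroid = average location of each group's genes (None if a group empties).
        centroids = []
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            centroids.append([sum(col) / len(members) for col in zip(*members)] if members else None)
        live = [c for c in range(k) if centroids[c] is not None]
        # Reassign every gene to the group with the closest centroid.
        new_assignment = [min(live, key=lambda c: math.dist(p, centroids[c])) for p in profiles]
        if new_assignment == assignment:
            break  # converged before reaching max_iterations
        assignment = new_assignment
    return assignment


profiles = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
print(kmeans_from_list_order(profiles, k=2))  # [0, 1, 0, 1, 0, 1]
```

Because the initial groups depend on list order, sorting the gene list differently can change the final clusters. That is one reason to run k-means more than once.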
414. res Genes gene list one or sure Minimum sion profiles are correlated to a speci more experiment correlation fied gene over the conditions of an interpretations Maximum cor interpretation one numeric relation value Find Similar One gene one Similarity mea Outputs a list of genes whose expres Genes with gene list one or sure Minimum sion profiles are correlated to a speci fied gene over the conditions of an interpretation pool ues one or more tions usage conditions Find Similar One gene list Apply weights Outputs a list of samples correlated to Samples one target sam Weighting a specified sample ple one sample coefficient

Name | Input | Knobs | Description
Gene Correlation | Two genes, one experiment | Correlation | Compares the two genes and determines their similarity with respect to the selected experiment. Outputs a p-value.
Gene List Similarity p-Value | 2 gene lists | None | Calculates the similarity between two gene lists, using the All Genes list as the universe. Outputs a number representing the probability that the intersection between the two lists could be due to chance. Note that this script applies no multiple testing correction.
Gene List Similarity p-Value Specified Un | 3 gene lists | None | Calculates the similarity between two gene lists using the specified gene
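The text does not say how the Gene List Similarity p-Value is computed. A common choice for "probability that the intersection could be due to chance" is the hypergeometric upper tail, and that assumption is used in the Python sketch below. Restricting both lists to the universe is also an assumption. As with the script, no multiple testing correction is applied.

```python
from math import comb


def gene_list_similarity_p_value(list_a, list_b, universe):
    """Probability of an intersection at least as large as the one observed, if
    list_b were drawn at random from the universe (hypergeometric upper tail)."""
    universe = set(universe)
    a = set(list_a) & universe
    b = set(list_b) & universe
    n_universe, n_a, n_b = len(universe), len(a), len(b)
    overlap = len(a & b)
    upper = min(n_a, n_b)
    return sum(
        comb(n_a, j) * comb(n_universe - n_a, n_b - j)
        for j in range(overlap, upper + 1)
    ) / comb(n_universe, n_b)


universe = [f"g{i}" for i in range(10)]
p = gene_list_similarity_p_value(universe[:5], universe[:3], universe)
print(p)  # 1/12: all 3 genes of the second list fall inside the first list
```

Using the All Genes list as the universe corresponds to passing every gene in the genome. The Specified Universe variant corresponds to passing a smaller third list.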
415. retations: Check this to use the Cross-Gene Error Model by default in experiment interpretations.
• Font Name and Font Size: Specify the style and point size of the default display font.
• Reset Font: Click this button to reset the display font to its default value.
• Language: Allows you to select from the available language choices for the GeneSpring interface. If your computer is set for a specific language, use the same setting here.
• Your Name, Your Group Name, Your Email: Specify the name, group name, and email address values contained in the HTML files that go into your data directories.
• Entrez mirror: Enter a web address

2 Creating Genomes

In the context of GeneSpring, a genome contains information about all the genes in your chip or microarray setup. Note that a GeneSpring genome does not correspond exactly to the biological definition of a genome. A genome in GeneSpring is composed of discrete genes, as opposed to the full nucleotide sequence. This means that a GeneSpring genome can contain two genes representing alternatively spliced variants of a single gene, whereas a true genome would include the DNA sequence for only one.

Setting up a genome is usually the first step in the analysis workflow. There are two ways to set up a genome:
• Request one from Silicon Genetics technical support.
• Use the Genome Installation Wizard, described in The New Genome Installation Wizard on page 2-2.
416. rganism you are installing in place of E. coli. This URL may contain many file formats. Make certain to download the file with the suffix .gbk. An EMBL file may be used in place of a GenBank file.

Adding Extra Genes to a Genome Defined by a GenBank or EMBL File

You can use a GenBank or EMBL file to describe a genome and add extra genes. This is typically done to represent a strain slightly different from the sequenced strain. To do this, you must create a separate master gene table containing all of the extra genes to add. Format this table using one of the four table-of-genes formats discussed in Data Format on page 2-10. This file is parsed during genome creation but is not used again afterward. Contact Silicon Genetics technical support at 1-866-SIG-SOFT for help with this process.

If you are using an original .gbk file, you can simply go to their web site and update the entire file. Make sure you save it with the same name and in the same place as your current .gbk file.

Updating GenBank Information

After loading your data into GeneSpring, you may want to update your annotations. For information on this procedure, see Updating Annotations with GeneSpider on page 6-27.

Sequence Data

GeneSpring loads sequence data from a GenBank or EMBL file automatically. If you have sequence data that is not in a GenBank or EMBL file, place it in a separate file using the .seq format. The Silicon Genetics .seq format is similar
417. riteria

Regulatory Sequences

Open Scripts > Basic Scripts > QC & Analysis > Regulatory Sequence Search

Name | Input | Knobs | Description
Find Genes with Specific Regulatory Sequence | One sequence | From Base, To Base, Maximum errors | Outputs a list of the genes that contain the input regulatory sequence.
Find Regulatory Sequences | One gene list | From Base, To Base, Minimum Length, Maximum Length, Minimum Errors, Maximum Errors, Minimum Interior N's, Relative Genomic P-value Cutoff | Outputs a list of regulatory sequences upstream of the genes in the input list.

Statistical Analysis (ANOVA)

Open Scripts > Basic Scripts > QC & Analysis > Statistical Analysis

Name | Input | Knobs | Description
1-way ANOVA | One gene list, one experiment interpretation | Groups specification, Test type, P-value cutoff, Multiple testing correction | See 1-Way ANOVA on page 6-34 for details.
1-way ANOVA with Post Hoc Tests | One gene list, one experiment interpretation | Groups specification, Test type, P-value cutoff, Multiple testing correction, Post hoc tests | See 1-Way ANOVA on page 6-34 for details.
ment interpretation | specification, 2nd parameter, Test type, P-value cu
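The following Python sketch shows what "Maximum errors" means for the Find Genes with Specific Regulatory Sequence block: it finds every window whose mismatch count stays within the limit. Several assumptions apply. `N` in the sequence is treated as matching any base. `start` and `end` are plain 0-based indices into the supplied upstream string, so GeneSpring's From Base and To Base conventions are not modelled. The function names are hypothetical.

```python
def find_sequence(sequence, motif, max_errors, start=0, end=None):
    """Return (position, mismatches) for every window of sequence[start:end] that
    matches motif with at most max_errors mismatches. 'N' in the motif matches any base."""
    region = sequence.upper()[start:end]
    motif = motif.upper()
    hits = []
    for i in range(len(region) - len(motif) + 1):
        errors = 0
        for base, expected in zip(region[i:i + len(motif)], motif):
            if expected != "N" and base != expected:
                errors += 1
                if errors > max_errors:
                    break  # too many mismatches; abandon this window early
        if errors <= max_errors:
            hits.append((start + i, errors))
    return hits


def genes_with_sequence(upstream_by_gene, motif, max_errors, start=0, end=None):
    """Genes whose upstream sequence contains the motif within the error budget."""
    return sorted(gene for gene, seq in upstream_by_gene.items()
                  if find_sequence(seq, motif, max_errors, start, end))


print(find_sequence("ACGTTGCAACGT", "ACGT", 0))  # [(0, 0), (8, 0)]
```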
418. rkers in your data set, and data can be assigned one of four flags: Passed (or OK), Marginal, Absent (or Failed), and Unknown.

Flags assigned by you when the experiment is entered into GeneSpring:
• Good Data: Data are present and reliable. Marked with a P (passed) or O (OK).
• Marginal Data: Data are present but of unknown or dubious quality. Marked with an M (marginal).
• Absent Data: No data are available, but there should have been. Marked with an A (absent) or F (failed).

Flags assigned by GeneSpring:
• Unavailable Data: If there is no flag in the column, GeneSpring assigns that measurement a U.

Only measurements at the highest available flag level are combined and treated as replicates. The order of flag precedence is P > M > U > A. If one or more Ps are present, only Ps are used. If no Ps are present and one or more Ms are present, only Ms are used, and so on. Summary statistics are collected over these cases and stored with the corresponding flag. All other flag data are discarded for the gene. This is done when the experiment is loaded into GeneSpring and is not affected in any way by later choices about which codes are used or displayed. The only way to avoid this is not to declare a flag column during data loading, in which case the flags are not available for other uses. For information about measurement fla
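The flag-precedence rule above can be sketched in Python as follows. The text does not list which summary statistics are stored. Mean, minimum and maximum are assumed here, because the manual elsewhere says GeneSpring averages repeated measurements and keeps their minimum and maximum. Mapping O to P, F to A, and an empty flag to U follows the flag descriptions. Unknown letters are rejected.

```python
from statistics import mean

# Letters accepted for each flag level; an empty or missing flag counts as U.
FLAG_LEVEL = {"P": "P", "O": "P", "M": "M", "A": "A", "F": "A", "U": "U", "": "U"}
PRECEDENCE = ("P", "M", "U", "A")  # highest to lowest


def combine_replicates(measurements):
    """measurements: list of (value, flag) pairs for one gene in one sample."""
    if not measurements:
        raise ValueError("no measurements")
    levels = []
    for value, flag in measurements:
        key = (flag or "").strip().upper()
        if key not in FLAG_LEVEL:
            raise ValueError(f"unknown flag {flag!r}")
        levels.append((value, FLAG_LEVEL[key]))
    for level in PRECEDENCE:
        kept = [value for value, lvl in levels if lvl == level]
        if kept:
            # Only the highest available level is kept; everything else is discarded.
            return {"flag": level, "mean": mean(kept), "min": min(kept), "max": max(kept)}


print(combine_replicates([(1.0, "M"), (2.0, "P"), (4.0, "O"), (9.0, "A")]))
```

In the example, the P and O measurements (2.0 and 4.0) outrank the M and A measurements, so only they contribute to the stored summary.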
419. rmation on this particular feature, see Regulatory Sequences on page 6-18.

Physical Position Display Options

In the Physical Position view, the following display options are available:
• Features: The available options for this view are listed below.
• Coloring: See Color on page 4-30.
• Legend: See Legend on page 4-28.

The Features panel of the display options window contains a column of check boxes that allow you to toggle certain items in the genome browser on or off:
• Show Chromosome Label: Displays the word Chromosome next to the chromosome names or numbers.
• Show Chromosome Label on Side: Displays the word Chromosome vertically beside the chromosome names or numbers.
• Show Base Pair Label: Displays the words Base Pair next to the axis representing the sequence location.
• Show ORF direction: Places genes above or below the chromosomes, depending on the direction in which they are transcribed. Genes on the top of the line are transcribed from left to right. Leaving this option unchecked places all of the genes on top of the chromosome lines.
• Show Just One Strand of Bases: Displays only the bases on the Watson strand when the genome browser is zoomed in far enough to display them.
• Show unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own s
420. s

Memory Use for Experiment Loading

In GeneSpring 6.0, experiments are loaded into the disk cache as well as into system memory (RAM). This requires some additional time when an experiment is first created, but once the experiment is loaded it can be reloaded in a fraction of the time. This change was made to accommodate the loading and creation of very large experiments, especially on systems with limited memory.

If you want to free some hard disk space, or think that your cached data folder may be corrupted, delete the cache folder GeneSpring/data/cache, where GeneSpring is the GeneSpring home directory on your machine. This forces GeneSpring to recreate the experimental data, which may solve the problem.

Loading an Experiment

1. Select File > Import Data or type Ctrl+O. On Windows and Unix systems, you can also drag and drop files into the main GeneSpring window directly from your desktop. This method is not supported on Macintosh.
2. Choose the data file or folder to load. All files in a folder must have exactly the same format. The Define File Format window appears.

Genome: Select the genome (the set of genes on the array) for this data. If your genome does not appear on the list, you can create a new one by selecting Create a New Genome.
421. s played directory. On Mac OS X this menu is not displayed. To select a genome, click the browse button.

Database

Use the pull-down menu to specify how GeneSpring assigns parameters for a series of numeric values in your database. You must also specify the fully qualified class name of the driver in the JDBC driver field.

Color

From this tab you can change the colors GeneSpring uses to represent different types of data and other screen elements. A variety of default color schemes are available. The brightness of a color depends on the trust associated with it. For more information, see Trust on page 4-30.

Over- and under-expression color refers to the coloring of genes as shown in the genome browser and color bar. To change the definitions of overexpressed (upregulated) and underexpressed (downregulated) genes, right-click over the color bar in the main genome browser. See Changing the Colorbar Range on page 4-32 for more details on this topic.

The colors you choose are blended to create a continuous spectrum from High to Normal to Low expression values. There are two sections on this tab: Standard Colors and Group Colors.

Standard Colors

These ar
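The following Python sketch shows one way High, Normal and Low colors could be blended into a continuous spectrum, with trust controlling brightness. Everything specific in it is an assumption and does not describe GeneSpring's implementation: the RGB defaults, the log-scale positions, the cutoffs of 0.5 and 2.0, clamping beyond those cutoffs, and modelling trust as a brightness multiplier.

```python
import math

# Hypothetical default colors (RGB); GeneSpring's actual defaults may differ.
LOW_COLOR = (0, 0, 255)
NORMAL_COLOR = (255, 255, 0)
HIGH_COLOR = (255, 0, 0)


def blend(c1, c2, t):
    """Linear interpolation between two RGB colors, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def expression_color(value, low=0.5, normal=1.0, high=2.0, trust=1.0):
    """Map a normalized expression value onto a Low-Normal-High color spectrum.

    Positions are measured on a log scale and clamped at low/high; trust in [0, 1]
    scales brightness (an assumed model of 'brightness depends on trust').
    """
    if value <= 0:
        raise ValueError("expression values must be positive for a log scale")
    v, n = math.log(value), math.log(normal)
    if value >= normal:
        t = min(1.0, (v - n) / (math.log(high) - n))
        color = blend(NORMAL_COLOR, HIGH_COLOR, t)
    else:
        t = min(1.0, (n - v) / (n - math.log(low)))
        color = blend(NORMAL_COLOR, LOW_COLOR, t)
    return tuple(round(channel * trust) for channel in color)


print(expression_color(1.0), expression_color(2.0), expression_color(0.25))
```

Blending on a log scale makes a twofold increase and a twofold decrease sit the same distance from Normal, which suits ratio-style normalized data.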
422. s

Figure 6-15: The Homology Tool

3. From the pull-down menu in the Column containing GenBank Accession No. column, select the appropriate column for the selected genome.
4. Select a second genome from the mini-navigator on the left side of the screen and click Add. The genome is added to the lower table on the right side of the screen.
5. From the pull-down menu next to the newly added genome, select the name of the column containing that genome's GenBank Accession Number.
6. If desired, add more genomes using the same procedure.
7. Click Start. This process takes about as long to run as the GeneSpider. If the initially selected genome does not have GenBank Accession Numbers, an error message appears. If a selected genome is not in HomoloGene, you receive an error message after the Homology Tool has finished running.
8. When prompted, specify whether or not to save the UniGene Cluster IDs. The resulting homology tables are saved in the originally selected genome's Data/Homology Tables folder.

Annotation Tools
423. s 3. The name of one gene. Example: L20294

Experiment Data. Input: Yes. Output: Yes. Num: 4. Experimental data, one line per gene and one column per experiment, with a header line for each parameter. Example:
Tup1 deletion experiment
time (minutes)  1  2
YPR1  0.88  1.09
YGR1  1.81  1.63
gg  0.52  1.18

Hyperlinks and sequence are optional.

Format | Input | Output | Num | Description | Example

Experiment data with confidence. Input: Yes. Output: Yes. Num: 5. As above, except that there are two columns per experiment. The first column has normalized data, the second has confidence values. Example:
Tup1 deletion experiment
time (minutes)  1  control  2  control
YPR1  8.43  9.70
YGR1  1.73  3.7
YNLA  5.49  8.49

Classification. Input: Yes. Output: Yes. Num: 6. One gene per line, followed by a tab, followed by the name of the classification. Example:
146  set3
158  unclassified
159  set5
170  set3
171  set3
181  set1

Tree. Input: Yes. Output: Yes. Num: 7. Hierarchical tree. Example:
<TREE DISTANCE="0.6" TITLE="a"> YMR199W YPL256C
<TREE DISTANCE="0.1" TITLE="b"> YAL001C YAL002W </TREE>
<TREE DISTANCE="0.2" TITLE="c"> YAL019W YAL017W </TREE>
</TREE>

Genome. Input: Yes. Output: Yes. Num: 8. XML description of genome. Example:
<GENOME CIRCULAR="false">
<NAME>Rat</NAME>
<HYPERLINKS>
GenBank http://www.ncbi.nlm.nih.gov
PubMed http://www.ncbi.nlm.nih.gov:80
</HYPERLINKS>
<MAPPED_FILE>
GAD65  GAD2  4.1.1.15  glutamic acid decarboxylase pre
GAD67  GAD67  4.1.1.15  glut
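The following Python sketch reads the Tree format using the standard `xml.etree.ElementTree` module. It assumes well-formed XML with quoted attribute values, as in the example above. Gene names are whitespace-separated text inside each `TREE` element. The sketch only stores `DISTANCE` as a number and does not interpret it. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_tree(xml_text):
    """Parse the Tree output format into nested dicts."""
    return _parse_node(ET.fromstring(xml_text))


def _parse_node(elem):
    genes = (elem.text or "").split()  # gene names directly inside this node
    children = []
    for child in elem:
        if child.tag == "TREE":
            children.append(_parse_node(child))
        genes.extend((child.tail or "").split())  # names that follow a nested node
    distance = elem.get("DISTANCE")
    return {
        "title": elem.get("TITLE"),
        "distance": float(distance) if distance is not None else None,
        "genes": genes,
        "children": children,
    }


def leaf_genes(node):
    """All gene names in a node's own list, followed by those of its subtrees."""
    names = list(node["genes"])
    for child in node["children"]:
        names.extend(leaf_genes(child))
    return names


example = """<TREE DISTANCE="0.6" TITLE="a"> YMR199W YPL256C
  <TREE DISTANCE="0.1" TITLE="b"> YAL001C YAL002W </TREE>
  <TREE DISTANCE="0.2" TITLE="c"> YAL019W YAL017W </TREE>
</TREE>"""
tree = parse_tree(example)
print(leaf_genes(tree))
```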
424. s allowed

Element | Contents | Attributes
<ExternalDatabaseConfiguration> | <GeneralConfiguration>, <Database> | n/a
<GeneralConfiguration> | <LoadClass>, <ProcessedDataListFile> | n/a
<Database> | <PhysicalDatabase>, <TechnologyType>, <Header>, <GenomeNames>, <GetSampleIDs>, <GetSampleAttributes>, <GetFile>, <GetRawData> | name, icon
<LoadClass> | plain text | n/a
<ProcessedDataListFile> | plain text | n/a
<PhysicalDatabase> | <UserName>, <Password>, <URL>, <Prefetch> | name
<TechnologyType> | n/a | name
<Header> | <Author>, <Research_Group>, <Organization> | n/a
<GenomeNames> | <GenomeMappingSpec> | n/a
<UserName> | plain text | n/a
<Password> | plain text | n/a
<URL> | plain text | n/a
<Prefetch> | plain text | n/a
<Author> | plain text | n/a
<Research_Group> | plain text | n/a
<Organization> | plain text | n/a
<GetSampleIDs> | <DatabaseQuery>, <DataDirectory>, <FileNameMask>, <IDFromFileName>, <JavaQuery> | location
<GetSampleAttributes> | <DatabaseQuery>, <JavaQuery> | cacheable, numeric
<GetFile> | <DatabaseQuery>, <JavaQuery> | type, loc
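The following Python sketch uses the standard `xml.etree.ElementTree` module to build an empty configuration skeleton whose nesting follows the element table above. The `name` value on `<Database>` is a hypothetical placeholder. Leaf values, the children of the query elements, and which elements are required are not covered here.

```python
import xml.etree.ElementTree as ET

# Element nesting from the table above. Leaf text values are left empty here.
SKELETON = {
    "GeneralConfiguration": {"LoadClass": None, "ProcessedDataListFile": None},
    "Database": {
        "PhysicalDatabase": {"UserName": None, "Password": None, "URL": None, "Prefetch": None},
        "TechnologyType": None,
        "Header": {"Author": None, "Research_Group": None, "Organization": None},
        "GenomeNames": {"GenomeMappingSpec": None},
        "GetSampleIDs": None,
        "GetSampleAttributes": None,
        "GetFile": None,
        "GetRawData": None,
    },
}


def build(parent, spec):
    """Recursively add child elements described by a nested dict."""
    for tag, children in spec.items():
        element = ET.SubElement(parent, tag)
        if children:
            build(element, children)
    return parent


root = build(ET.Element("ExternalDatabaseConfiguration"), SKELETON)
root.find("Database").set("name", "ExampleDatabase")  # hypothetical value
print(ET.tostring(root, encoding="unicode"))
```

Generating the skeleton from one nested dictionary keeps the element hierarchy in a single place, so it can be compared line by line against the table.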
425. s in the Gene List list input list(s)

Name | Input | Knobs | Description
Gene List Difference | Two gene lists | None | Outputs a list of the genes that are in the first gene list but not the second.
Gene List Intersection | Two gene lists | None | Outputs a list of the genes that are in both input lists.
Gene List Inversion | Two gene lists | None | Outputs a list of the genes that are in either input list.
Gene List Union | Two gene lists | None | Outputs a list of the genes that are in either input list.
In all Gene Lists | One gene list group | None | Outputs a list of the genes in all of the input lists.
In at Least One | One gene list group | None | Outputs a list of the genes in at least one of the input lists.
Merge Gene List Group | At least one gene list group | Percentage, Comparison | Outputs a list of the genes in the specified proportion of the input lists.
In a Number of Gene Lists | Array of gene lists | Number of gene lists from the array the gene must be in, Comparison | Outputs a list of genes that appear in at least (or another comparison) a specified number of gene lists.
In a Percentage of Gene Lists | Array of gene lists | Percentage, Comparison | Outputs a list of genes that appear in a specified proportion of the gene lists.
Sort Gene List | One gene list | None | Outputs the gene list sorted in descending order based on its associated numbers.

GeNet Downloading Default Directory

Note that you will need to log in to GeNet before using a script t
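The following Python set operations approximate several of the gene list building blocks in the table above. They return alphabetically sorted lists, and GeneSpring's output order may differ. The set of comparison operators is an assumption. Gene List Inversion is left out, because its description in the table matches Union and its intended behavior is unclear.

```python
import operator
from collections import Counter

# Comparison operators for the "Comparison" knob (the exact set GeneSpring offers is assumed).
COMPARISONS = {">=": operator.ge, ">": operator.gt, "<=": operator.le,
               "<": operator.lt, "==": operator.eq}


def difference(a, b):
    """Genes in the first list but not the second."""
    return sorted(set(a) - set(b))


def intersection(a, b):
    """Genes in both lists."""
    return sorted(set(a) & set(b))


def union(a, b):
    """Genes in either list."""
    return sorted(set(a) | set(b))


def in_all(lists):
    """Genes present in every list of a group."""
    return sorted(set.intersection(*map(set, lists))) if lists else []


def in_at_least_one(lists):
    """Genes present in at least one list of a group."""
    return sorted(set().union(*lists))


def _list_counts(lists):
    # Count each gene at most once per list.
    return Counter(gene for lst in lists for gene in set(lst))


def in_number_of_lists(lists, number, comparison=">="):
    """Genes whose list count satisfies `count <comparison> number`.
    Genes absent from every list are never reported."""
    test = COMPARISONS[comparison]
    return sorted(g for g, c in _list_counts(lists).items() if test(c, number))


def in_percentage_of_lists(lists, percentage, comparison=">="):
    """Like in_number_of_lists, but against the percentage of lists containing the gene."""
    if not lists:
        return []
    test = COMPARISONS[comparison]
    return sorted(g for g, c in _list_counts(lists).items()
                  if test(100.0 * c / len(lists), percentage))


def sort_gene_list(pairs):
    """Sort (gene, number) pairs in descending order of the associated number."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


A, B, C = ["g1", "g2", "g3"], ["g2", "g3", "g4"], ["g3", "g5"]
print(in_number_of_lists([A, B, C], 2))  # ['g2', 'g3']
```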
426. s normalization steps, which may or may not be equivalent to the raw measurements. GeneSpring does not allow you to perform both this normalization and a normalization to sample(s), as they address the same issue.

Median Polishing

Median polishing means that each chip is normalized to its median and each gene is normalized to its median. These normalizations are repeated until the medians converge, up to a maximum of five iterations. This limit prevents endless looping if the normalization coefficients do not converge.

If measurements are limited by flag values, the percentile is calculated using only the genes that pass the flag restriction. To limit measurements by flag values, check the Use only measurements flagged box and select the appropriate option from the pull-down menu. The available options are:
• Present Only
• Present or Marginal
• Anything but Absent

If measurements are limited by a cutoff, the percentile is calculated from all measurements above the cutoff. This cutoff can be in either raw or partially normalized units. The Raw Signal option means that the cutoff is applied to the raw measurements in the original data file. These measurements are back-calculated from the previous normalization steps, so rounding errors may be introduced. Partially normalized means that the cutoff is applied to the gene values resulting from the previous normalization steps, which
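The following Python sketch shows the median polishing loop described above: normalize each chip to its median, then each gene to its median, for at most five passes. It assumes ratio-style normalization, dividing by the median, to match GeneSpring's other per-chip and per-gene normalizations. Tukey's classical median polish is additive on a log scale instead. Flag and cutoff restrictions are not modelled, and all values must be positive.

```python
from statistics import median


def median_polish(matrix, max_iterations=5, tolerance=1e-9):
    """Multiplicative median polishing of a genes x chips matrix of positive values.

    Each pass divides every chip (column) by its median, then every gene (row)
    by its median, stopping early once all medians are within `tolerance` of 1.
    """
    data = [[float(v) for v in row] for row in matrix]
    for _ in range(max_iterations):
        chip_medians = [median(column) for column in zip(*data)]
        data = [[v / chip_medians[j] for j, v in enumerate(row)] for row in data]
        gene_medians = [median(row) for row in data]
        data = [[v / m for v in row] for row, m in zip(data, gene_medians)]
        if all(abs(m - 1.0) <= tolerance for m in chip_medians + gene_medians):
            break  # medians have converged
    return data


print(median_polish([[1, 2, 4], [2, 4, 8], [3, 6, 12]]))  # every value becomes 1.0
```

The `max_iterations=5` default mirrors the five-iteration limit in the text. It guarantees termination even when the medians oscillate instead of converging.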
427. s representing multiple conditions, so that all conditions in the selected interpretation can be viewed simultaneously. Using this feature disables the condition slider at the bottom of the genome browser.
• Show unclassified Group When Splitting the Window: When the window is split, this option displays the genes that were not put into any classification in their own section of the genome browser.

Graph View

The Graph view allows you to visualize one experiment or a set of experiments by plotting the relative expression of each gene against experimental parameters such as time or drug concentration. Each gene is represented as a line. To choose the Graph view, select View > Graph.

Note: Genes with no data cannot be displayed in this view.

The Graph option consists of two views: the continuous graph view, and the histogram view, which appears if the experiment being displayed contains any non-continuous parameters.
428. s yeast

ArrayLayouts

Examples of layout files for arrays

Here is an example for Pat Brown's yeast layout. The following is from a file Pat.layout:
Name  Pat Brown's Yeast Layout
Icon  XXX.gif
VerticalSubArrays  2
HorizontalSubArrays  2
HorizontalPerSubArray  40
VerticalPerSubArray  40
VerticalDuplication  1
HorizontalDuplication  1
CommonArrayType
DataFileName  PatLocationList.txt

Following are the first few lines of the file PatLocationList.txt:
YHR007C  1  13  1
YBR218C  2  13  1
YAL051W  1  14  1
YAL053W  2  14  1
YAL054C  1  15  1
YAL055W  2  15  1
YAL056W  1  16  1

Here is an example for a CLONTECH array, from a file Clontech.layout:
Name  Clontech 588
Icon  XXX.gif
VerticalSubArrays  2
HorizontalSubArrays  3
HorizontalPerSubArray  14
VerticalPerSubArray  14
VerticalDuplication  1
HorizontalDuplication  2
CommonArrayType  Clontech

Making an array can be a complicated process. Contact Silicon Genetics Technical Support at 1-866-SIG-SOFT or support@silicongenetics.com for more information on this topic.

Renaming and Deleting Genomes

GeneSpring saves information about each genome in the data subdirectory of the GeneSpring folder. For example, on a Windows system this might be C:\Program Fi
429. sample, GeneSpring takes the average of the measurements. In addition, GeneSpring saves the minimum and maximum values and displays them in the Gene Inspector. See Dealing with Repeated Measurements on page 5-18 for a mathematical explanation of this process.

Remembered Formats

While you cannot edit remembered formats, you can share them. If you need to change a remembered format, you must build a new one. To share remembered format files, use your favorite browser or file management program to copy the file from
YourLocalDrive\Program Files\SiliconGenetics\GeneSpring\data\Experiment Formats\name.expformat
The above path must be typed all on one line. You can then paste the file into a shared drive.

The Sample Manager

The Sample Manager is an important part of the experiment creation process, but it can also be used on its own. From the main GeneSpring window, select Experiments > Sample Manager.

Tabs
430. true, the SQL query is passed the sourceName specified in the current <GenomeMappingSpec> tag. Use the db attribute to specify the database to query. Accepted values for useGenomeName are true and false.

The data retrieved by this option varies depending on which tag contains it, as follows:
• <GetSampleIDs>: A list of sample identifiers.
• <GetSampleAttributes>: Three columns, or a multiple of three columns, in which case each set of three is considered independently. The three columns are sample attribute value, sample attribute name, and sample attribute units. Each row represents one attribute. If there is more than one set of three columns, each set of three in a row represents an attribute.
• <GetFile>: If the location attribute is set to database, returns two or three columns: the data, the filename, and the MIME type. If present, the MIME type overrides the mimeType specified in <GetFile>. Each row in the result represents a file to be loaded.

Contents: SQL command
Attributes: useGenomeName, db
Usage: <DatabaseQuery useGenomeName="true" db="dbname">select ID from Experiments where Experiments.chipType</DatabaseQuery>
Notes: Required if the location attribute value in <GetSampleIDs> is database.

<DataDirectory>

If samples are contained in flat files rather than a database
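The following Python sketch shows how rows returned for `<GetSampleAttributes>` could be split into attributes, using the column order stated above: value, name, units, repeated in groups of three. The function name is hypothetical. Query results are assumed to arrive as tuples, as a DB-API cursor's `fetchall()` would return them.

```python
def parse_sample_attributes(rows):
    """Turn <GetSampleAttributes> result rows into attribute dicts.

    Each row holds one or more (value, name, units) triples, in that column order.
    """
    attributes = []
    for row in rows:
        if not row or len(row) % 3 != 0:
            raise ValueError(f"row must have a multiple of three columns: {row!r}")
        for i in range(0, len(row), 3):
            value, name, units = row[i:i + 3]
            attributes.append({"name": name, "value": value, "units": units})
    return attributes


rows = [
    ("30", "time", "minutes"),
    ("0.5", "dose", "mM", "heat", "treatment", None),
]
result = parse_sample_attributes(rows)
for attribute in result:
    print(attribute)
```

The second row carries two independent triples, so it yields two attributes. A row whose width is not a multiple of three is rejected rather than silently truncated.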
431. ser is the primitive Filter Genes with Associated Numbers.

Figure 8-4: The Block section for building blocks

The Icon Legend

The ScriptEditor uses many icons to help you keep track of the available objects. To view information on these icons, select Help > Icon Legend. You may want to leave the Icon Legend open and visible on your desktop until you are accustomed to working with the icons.

Figure 8-5: The Icon Legend (Building Blocks: Script, Unfinished Script, Script Primitive, External Program Primitive. Knobs: Number, Choice, Text. Inputs and Outputs: Boolean, Number, Gene, Drawn Gene, Gene List, Gene Tree, Sample, Experiment, Condition, Condition Tree, Experiment Interpretation.)

The Properties Panel

The Properties panel allows you to view many aspects of a script. To view script properti
432. [Figure 4-22: The Bar Graph view, showing the Yeast cell cycle time series no 90 min, Default Interpretation, colored by time; y-axis: Normalized Intensity (log scale)]
The figure above shows a Yeast cell cycle time series in Bar Graph view.
Bar Graph View Display Options
In bar graph view, the following display options are available:
• Vertical Axis: see The Vertical Axis on page 4-27.
• Features: the options for this view are Lines to Graph (described below), Coloring (see Color on page 4-30), and Legend (see Legend on page 4-28).
Lines to Graph
GeneSpring can draw grid lines to help distinguish groups of data points. To modify the use of lines:
1. Select View > Display Options, or right-click anywhere in the genome browser and select Display Options.
2. Click the Lines to Graph tab. To see a grid inside the plot area, you can have lines drawn at the major and minor tick intervals of each axis. Check any
433. simple search 4-4; simplified ontology 6-31; smooth correlation 6-10, B-6; SOM: Euclidean distance 7-11, minimum number B-8; Spearman confidence 6-11; Spearman correlation 6-11; split windows 4-25, classification 4-35; spreadsheet view 4-67; SQL A-2; standard correlation 6-10, B-3; starting GeneSpring 1-3; system requirements 1-2
T
text color 1-20; tick spacing 4-28; tree view 4-52: labels 4-55, magnifying 4-55, printing 4-55, viewing gene names in 4-56, viewing nodes 4-55, viewing parameters in 4-56, viewing subtrees 4-52; trees: comparing genes in nodes 4-52, minimum distance 7-4, minimum number B-8, R values 7-4, separation ratio 7-8; trust 4-30; t-test 4-12; two color experiments 4-30; two-sided Spearman confidence 6-11
U
under-expressed color, changing 1-19; Update annotations 6-27; updating annotations 6-27; updating master gene table 6-27; upload to GeNet 9-12; uploading to GeNet 9-11; upregulated color, changing 1-19; upregulated correlation 6-11
V
version notes 1-4; vertical axis 4-27; view gene details 4-10; views: 3D scatter plot 4-49, array layout 4-60, bar graph 4-39, blocks 4-36, compare genes to genes 4-64, graph 4-37, graph by genes 4-65, ordered list 4-58, pathway 4-62, physical position 4-41, scatter plot 4-45, spreadsheet 4-67, trees 4-52
W
web databases 2-4, special character 2-4; web links 4-13; wizards: new genome 2-2
434. sion Change List and Filter on Noise scripts to produce a single gene list that passes both filters but does not have any genes on the input gene list.
[Figure 8-3: The main ScriptEditor window]
The ScriptEditor Block for Building Blocks
The Block section displays the following information on the currently selected item:
• Name: The name of the selected building block.
• Notes: The function of the selected building block.
• Basic: This box can display a variety of information about the selected building block. The example in Figure 8-4 shows information about the knobs associated with the selected building block. These items differ for each building block, and you may see several types of input fields, including pull-down menus and text boxes.
You can get context-sensitive help on any script building block by right-clicking it and selecting Help. Double-clicking a building block brings up an inspector window for that building block.
In Figure 8-4, the selected item in the brow
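The combined script described in these notes is set algebra on gene lists: keep the genes that pass both filters (AND), then remove every gene that was on the input list (NOT). A minimal Python sketch, assuming each gene list is represented as a set of gene names; the function name and the example gene lists are hypothetical.

```python
def combine_filters(two_fold: set[str], noise_ok: set[str], input_list: set[str]) -> set[str]:
    """Genes passing BOTH filters (AND) that are NOT on the input gene list."""
    return (two_fold & noise_ok) - input_list


# Hypothetical example lists.
two_fold = {"L20294", "M89777", "X95403"}
noise_ok = {"M89777", "X95403", "M63630"}
input_list = {"X95403"}
print(sorted(combine_filters(two_fold, noise_ok, input_list)))  # ['M89777']
```

Using sets makes the result independent of list order and of duplicate entries, which matches how the script is described: membership is all that matters.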
435. sion profile as an expression profile, which you can use to make lists. For information on making lists from expression profiles, see Creating Expression Profiles on page 6-15.
Lists Containing Your Gene
In the bottom center of the Gene Inspector window is a navigator for the lists containing your gene. Select a list to view the Inspect List window. For information about this window, see The Gene List Inspector on page 4-20.
Searching Internet Databases
In the Windows version of GeneSpring, you can set up the Gene Inspector window to search public databases. See The New Genome Installation Wizard on page 2-2. To configure a web browser on a Macintosh, go to Edit > Preferences > Browser and enter the appropriate path.
Notes Section
In the upper left corner of the Gene Inspector window, under the Gene Identification Section, is an area where you can make notes. To save these notes, click Save Notes.
The Sample Inspector
The Sample Inspector allows you to view and edit detailed data on a particular sample. It can be accessed by clicking the Inspect button in the Sample Manager, or by right-clicking a sample in the navigator and selecting Inspect.
Figure 4-7: The upper half of
436. [Figure 4-2: The Advanced Search window]
2. Check or uncheck the fields to search. For more information about the contents of these fields, see Annotation Options on page 9-6.
3. Enter a search term in the Search For field. To combine multiple terms, separate them with the words and, or, or and not:
• AND: matches any gene containing both terms, even if they do not appear in the same field.
• OR: matches any gene containing either term, even if they do not appear in the same field.
• AND NOT: matches any gene containing the first term but not the second.
• Enter an asterisk (*) in this field to match any gene with an entry in any of the fields you specified in step 2.
Restrict your search to specific regions within the genome using the Map Location fields. You can specify the chromosome number, or search for a particular sequence for any organism that has mapping information. For organisms that are not completely sequenced, you can search for cytogenetic band markers. For organisms that are completely sequenced, you can restrict your search to regions between specified bases. Only the fields appropriate for the given genome appear. Any gene that falls even partially within the specified region is identified by the search. You can r
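The matching rules above can be sketched as follows. This is a minimal Python illustration, assuming a gene is a dictionary of field name to text, that matching is case-insensitive substring search, and that a query holds at most two terms; GeneSpring's actual matching (case sensitivity, whole-word matching, longer queries) may differ. The region check encodes the stated rule that a gene falling even partially inside the region is returned, with coordinates assumed inclusive. All names are hypothetical.

```python
def _searchable_text(gene: dict[str, str], fields: list[str]) -> str:
    # Join all checked fields so AND/OR terms may be found in different fields.
    return " ".join(gene.get(field, "") for field in fields).lower()


def matches(gene: dict[str, str], fields: list[str], query: str) -> bool:
    """Evaluate 'x and y', 'x or y', 'x and not y', a single term, or '*'."""
    q = query.strip().lower()
    if q == "*":
        # Any gene with an entry in any of the checked fields.
        return any(gene.get(field, "").strip() for field in fields)
    text = _searchable_text(gene, fields)
    if " and not " in q:  # test before " and ", which it contains
        first, second = q.split(" and not ", 1)
        return first in text and second not in text
    if " and " in q:
        first, second = q.split(" and ", 1)
        return first in text and second in text
    if " or " in q:
        first, second = q.split(" or ", 1)
        return first in text or second in text
    return q in text


def overlaps(gene_start: int, gene_end: int, region_start: int, region_end: int) -> bool:
    """True if the gene lies even partially inside the region (inclusive coordinates)."""
    return gene_start <= region_end and gene_end >= region_start


gene = {"Common": "CLN1", "Description": "G1 cyclin"}
fields = ["Common", "Description"]
print(matches(gene, fields, "cln1 and cyclin"))      # True
print(matches(gene, fields, "cln1 and not cyclin"))  # False
print(overlaps(100, 200, 150, 300))                  # True
```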
437. specify the column to which the gene identifier should be matched.
4. If the column header row chosen is incorrect, use the First Line of Data field to adjust the number of rows. If GeneSpring did not identify any column header row, you must first check the Has Column Titles box.
5. Select the column or columns in which to search by checking the Search box at the top of the desired column(s).
6. Restrict column values by choosing a value from the Column values must be pull-down menu and entering a restriction value in the field provided. The available choices are:
• Less than
• Greater than
• Equal to
• Not equal to
• Contain
For example, if you load an Affymetrix file, you can use the pull-down menu to select the Abs call column and select all entries equal to M. This produces a list of just the marginal data.
7. Specify the number of columns the desired value must appear in: select an option from the Value must appear in pull-down menu and enter a value in the field provided. The number to the right of this box indicates the number of columns that have been selected.
Filter on Gene List Numbers
GeneSpring can filter genes according to the numbers associated with them in a gene list. When you make a new list based on a filter or similarity metric, the value used as a filter is associated with the genes on the new list. Some examples of associated numbers are correlation coeffic
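Steps 5 through 7 combine into one rule per row: count the searched columns whose value meets the restriction, and keep the gene if that count reaches the number entered in step 7. A minimal Python sketch, assuming each row is a dictionary keyed by column title, that Value must appear in means "at least N columns", and that Equal to, Not equal to, and Contain compare text while Less than and Greater than compare numbers; these semantics and all names are assumptions, not documented GeneSpring behavior.

```python
from typing import Callable

# The five "Column values must be" choices (semantics assumed for this sketch).
COMPARISONS: dict[str, Callable[[str, str], bool]] = {
    "less than": lambda cell, ref: float(cell) < float(ref),
    "greater than": lambda cell, ref: float(cell) > float(ref),
    "equal to": lambda cell, ref: cell == ref,
    "not equal to": lambda cell, ref: cell != ref,
    "contain": lambda cell, ref: ref in cell,
}


def row_passes(row: dict[str, str], search_columns: list[str],
               comparison: str, reference: str, min_columns: int) -> bool:
    """True if the restriction holds in at least `min_columns` of the searched columns."""
    test = COMPARISONS[comparison]
    hits = sum(1 for column in search_columns if test(row[column], reference))
    return hits >= min_columns


# Hypothetical Affymetrix-style rows with two Abs call columns.
rows = {
    "gene_a": {"Abs call 1": "M", "Abs call 2": "P"},
    "gene_b": {"Abs call 1": "P", "Abs call 2": "P"},
    "gene_c": {"Abs call 1": "M", "Abs call 2": "M"},
}
columns = ["Abs call 1", "Abs call 2"]
marginal = [g for g, row in rows.items() if row_passes(row, columns, "equal to", "M", 1)]
print(marginal)  # ['gene_a', 'gene_c']
```

Raising the minimum to 2 keeps only genes marked M in every searched column, which is how step 7 narrows the list.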
438. Figure 7-8: A 3x2 SOM of the Yeast cell cycle time series no 90 min experiment
If you have selected many panels, you may want to hide the horizontal and vertical labels for easier viewing: right-click the genome browser and select an option from the Options submenu. You can also increase your viewing space by selecting View > Visible > Hide All.
If you use a SOM to produce a classification, you can get details about the classification from the Classification Inspector. For information about the Classification Inspector, see The Classification Inspector on page 4-22.
To recreate your SOM graph, click the SOM classification or folder of gene lists in the navigator and select Split Window > Both.
SOM References
Kohonen, T. (1990). The Self-Organizing Map. Proc. IEEE 78(9), 1464-1480.
Kohonen, T. (2000). Self-Organizing Maps, Third Edition. Springer-Verlag, Berlin.
Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E., Golub, T. (1999). Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96, 2907-2912.
QT Clustering
QT clustering looks for clusters of genes such that each gene in the cluster is within a specified distance (based on a user-defined distance metric) of every other gene in the
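The QT membership rule stated above (every gene in a cluster lies within the chosen distance of every other gene) can be checked directly. A minimal Python sketch, assuming Euclidean distance as the user-defined metric and expression profiles as equal-length lists of floats; it checks the rule only and is not GeneSpring's QT search procedure. The function name is hypothetical.

```python
import math
from itertools import combinations


def satisfies_qt(cluster: dict[str, list[float]], max_distance: float) -> bool:
    """True if every pair of genes in the cluster is within max_distance (Euclidean)."""
    return all(
        math.dist(cluster[g], cluster[h]) <= max_distance
        for g, h in combinations(cluster, 2)
    )


cluster = {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.0]}
print(satisfies_qt(cluster, 1.5))  # True: the largest pairwise distance is sqrt(2)
print(satisfies_qt(cluster, 1.2))  # False
```

Because the condition is pairwise, the tightest pair does not matter; the cluster's largest internal distance (its diameter) must stay under the threshold.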
439. stering 7-8
L
legend 4-28; license key, obtaining 1-3; linked windows 4-25; list inspector 4-20; Lists from: annotations 6-12, Regulatory Sequences 6-23, Venn Diagram 6-13; load sequence command 4-43; loading data 1-8; loading experiments 3-3: new experiment checklist 3-7, the Define File Format window 3-3, the Merge Files window 3-5, the Required Sample Attributes window 3-6, the Select Corresponding Files window 3-4, the Select Files window 3-3
M
macintosh tips 1-13; magnification 4-71; managing samples 3-23; master gene table 2-10, 6-27; mathematical notation B-2; measurement flags 5-19, Abs Call 3-40; memory 1-3; merge files 3-5; minimum distance 7-4; missing expression values B-2; MySampleAttributes.xml file 3-35
N
navigator 1-13, 4-71; negative control strengths 5-20; new parameters 3-33, 3-36; no color 4-34; nodes 7-11; non-continuous elements 3-31; normalization types 5-6; normalizations 5-2: add step 5-3, affine background correction 5-13, Affymetrix data 5-17, all samples to specific samples 5-13, applying default 5-4, data transformation 5-6, divide by specific samples 5-13, divide signal by control channel 5-7, dye incorporation 5-8, dye swap 5-7, edit step 5-3, gene to itself 5-15, intensity dependent 5-8, median polishing 5-15, negative control 5-7, negative control strengths 5-20, per chip 5-10, per gene 5-13, per spot 5-10, per chip 5-10, per spot 5-7, pre-normalized data 5-12, real-time PCR transform 5-6, region 5-12, 5-17, remove step
440. t NCBI, as summarized in the following table:
Annotation      LocusLink label
Common          Official Gene Symbol or Interim Gene Symbol, plus any Alternate Symbols
Map Position    Cytogenetic or Chromosome
EC number       EC number
Product         Product
Phenotype       Phenotype
UniGene
Update Annotations from UniGene reads the HTML source from queries to the UniGene database at NCBI. The common name and description are taken from the line immediately following the UniGene cluster ID and the species name; if only one item is shown, it is returned as the description. The map annotation is taken from the Cytogenetic Position under the MAPPING INFORMATION heading; if Cytogenetic Position is not given, the Chromosome number is used.
NCBI has the following requirements for automated access:
• Run retrieval scripts on weekends, or between 9 PM and 5 AM ET on weekdays, for any series of more than 100 requests.
• Refer to the NCBI disclaimer and copyright notice at http://www.ncbi.nlm.nih.gov/About/disclaimer.html.
See http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html for more information.
Building a Simplified Ontology
The Build Simplified Ontology tool hierarchically groups genes into meaningful biological categories (gene lists) based on the Gene Ontology Consortium classifications (GO). To form these groups, GeneSpring's ontology tool parses all of the annotations in the genome. It then assigns each gene to one or more ontology grou
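The NCBI scheduling rule quoted above can be encoded as a check to run before starting a large batch of requests. A minimal Python sketch, assuming the caller supplies the current time already converted to US Eastern time and reading the window as 21:00 (inclusive) to 05:00 (exclusive); the function name is hypothetical, and NCBI's current usage policy may differ from the one quoted in this manual.

```python
from datetime import datetime


def large_batch_allowed(now_et: datetime, n_requests: int) -> bool:
    """Quoted rule: series of >100 requests only on weekends or 9 PM-5 AM ET on weekdays."""
    if n_requests <= 100:
        return True  # the rule applies only to series of more than 100 requests
    if now_et.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return True
    return now_et.hour >= 21 or now_et.hour < 5


print(large_batch_allowed(datetime(2024, 1, 3, 12, 0), 500))  # False: Wednesday noon
print(large_batch_allowed(datetime(2024, 1, 3, 22, 0), 500))  # True: Wednesday 10 PM
print(large_batch_allowed(datetime(2024, 1, 6, 12, 0), 500))  # True: Saturday
```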
441. [Figure 8-2: The Browser, with labels for an input socket, knobs, a building block label, an output socket, and the outputs]
To view an existing script or any of the building blocks:
1. Select an item in the navigator.
2. Drag the item to the browser area.
3. Click to view the selected item.
ScriptEditor Notes
The This Script section displays the following information about the currently selected item:
• Name: The name of the selected script.
• Label: A label for the script, distinct from the script name. You can enter a brief description of the script to make it easier to identify. The Label is displayed instead of the Script Name when the script is used as a block inside another script.
• Notes: You can alter these notes in any way you wish and add a nearly unlimited amount of text. These notes appear in the script and in the Properties box when the script is visible in GeneSpring.
If no item is selected, the section displays empty boxes, as it would for a new script. See Figure 8-1 for an example.
[Screenshot: the ScriptEditor window for the script 2 Fold Expression Change AND Filter on Noise NOT Input]
Notes: This script combines the 2 Fold Expres
442. tact Silicon Genetics technical support on the Web.
Silicon Genetics on the Web
Select this option to browse the Silicon Genetics website. This site contains a wealth of information, including manuals and information on workshops designed to help you use GeneSpring more effectively.
GeNet Database
Select this option to browse the GeNet Web page. From here you can download a demo version of GeNet and upload or download additional information. See the GeNet User Manual for more information.
Support and Training Resources
Select this option to view the Silicon Genetics training page, where you can take advantage of Silicon Genetics' many training options.
System Monitor
Select this option to view the Java system monitor, which tracks free memory and shows the processes running on your computer.
Test Database Connectivity
If you are using a database with GeneSpring, select this option to verify that GeneSpring and the database are communicating with each other.
Show Helpful Hints
Select this option to display a new helpful hint each time you start GeneSpring.
About
Select this option to view information about GeneSpring, such as the version number and demo expiration date. If you contact Technical Support, they will ask you for the version of GeneSpring you are using.
GeneSpring Basics
GeneSpring is a powerful analysis tool. Like a
443. taining a License Key
If you have installed a demo copy of GeneSpring, your license key will expire within one month of the initial installation. Once you have purchased a full GeneSpring license, Silicon Genetics will send you a license key. Save this license key file in the Silicon Genetics/GeneSpring/Data folder. If you kept the default installation settings, look in C:\Program Files on a Windows machine, or in the Applications folder on a Mac. You will get a warning message 30 days before the key expires. If your license has expired or is about to expire, contact Silicon Genetics at 1-866-SIG-SOFT (744-7638).
Setting Memory Usage Options
Once GeneSpring is installed, make sure the default memory setting in the GeneSpring preferences is half of your computer's available memory (or more, if you have a lot of RAM). To do this, select Edit > Preferences, choose System from the pull-down menu, and enter the amount of memory in the Desired Memory Use field.
Configuring Virtual Memory
At least 150 MB of virtual memory is required for optimal GeneSpring performance. To ensure that large files are not interfering with software performance, you may need to move some large files to a different hard disk. If you continue to experience slow performance, check memory usage by selecting Help > System Monitor before invoking any functions. Make a record of the Total Mem
444. taining raw reference data. This is typically present in two-color experiments.
Contents: plain text
Attributes: n/a
Usage: <ReferenceColumn>32</ReferenceColumn>
Notes: Optional
<SignalBackgroundColumn>
Specifies the column containing the background signal to be subtracted from the main signal before further processing.
Contents: plain text
Attributes: n/a
Usage: <SignalBackgroundColumn>1</SignalBackgroundColumn>
Notes: Optional; rarely used
<ReferenceBackgroundColumn>
Specifies the column containing the background signal to be subtracted from the reference signal before further processing.
Contents: plain text
Attributes: n/a
Usage: <ReferenceBackgroundColumn>1</ReferenceBackgroundColumn>
Notes: Optional; rarely used
<ExperimentWorkedColumn>
Specifies the column containing a flag or flags indicating the success of the measurement.
Contents: plain text
Attributes: n/a
Usage: <ExperimentWorkedColumn>33</ExperimentWorkedColumn>
Notes: Optional; see also <ExperimentWorkedDesignation>, <ExperimentAbsentDesignation>, <ExperimentMarginalDesignation>
<ExperimentWorkedDesignation>
Specifies the flag in the column specified by <ExperimentWorkedColumn> that indicates that the measurement worked well.
Contents: plain text
Attributes: n/a
Usa
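As a rough illustration of how these column tags relate to one another, the sketch below reads one tab-delimited data row, subtracts each background column from its signal, and checks the worked flag. It is a minimal Python sketch under stated assumptions: column numbers are treated as 0-based indexes into the split row, the flag value "P" stands in for whatever <ExperimentWorkedDesignation> specifies, and the ColumnSpec and read_row names are hypothetical. GeneSpring's own column numbering and later processing steps are not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnSpec:
    # Hypothetical mirror of the XML tags; column numbers assumed to be 0-based indexes.
    signal: int
    reference: Optional[int] = None             # <ReferenceColumn>
    signal_background: Optional[int] = None     # <SignalBackgroundColumn>
    reference_background: Optional[int] = None  # <ReferenceBackgroundColumn>
    worked: Optional[int] = None                # <ExperimentWorkedColumn>
    worked_flag: str = "P"                      # stand-in for <ExperimentWorkedDesignation>


def read_row(line: str, spec: ColumnSpec) -> dict:
    fields = line.rstrip("\n").split("\t")

    def corrected(column: int, background: Optional[int]) -> float:
        value = float(fields[column])
        if background is not None:
            value -= float(fields[background])  # subtract background before further processing
        return value

    result = {"signal": corrected(spec.signal, spec.signal_background)}
    if spec.reference is not None:
        result["reference"] = corrected(spec.reference, spec.reference_background)
    if spec.worked is not None:
        result["worked"] = fields[spec.worked] == spec.worked_flag
    return result


spec = ColumnSpec(signal=1, reference=2, signal_background=3, reference_background=4, worked=5)
print(read_row("gene1\t500\t300\t50\t30\tP", spec))
```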
445. terations has been reached.
[Figure 7-5: A k-means cluster display in a split window (5-cluster k-means for the Yeast cell cycle time series, one panel per set plus Unclassified; y-axis: Normalized Intensity (log scale); x-axis: time (minutes))]
K-means Clustering Options
The following options are available:
• Number of Clusters: The number of clusters to make.
• Number of Iterations: The maximum number of times that each centroid is recalculated after genes are reassigned to the groups with the most similar centroids.
• Similarity Measure: The available options are Standard Correlation, Smooth Correlation, Change Correlation, Upregulated Correlation, Pearson Correlation, Spearman Correlation, Spearman Confidence, Two-sided Spearman Confidence, and Distance. For more information on measures of similarity, see Similarity Definitions on page 7-4.
• Start from Current Classification: Group genes using the selected classification as a starting point. This option is available only if you have selected a classification. This option disables
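The iteration described under Number of Iterations can be sketched as: assign each gene to the centroid it is most similar to, recompute each centroid as the mean of its genes, and stop when no gene changes cluster or the iteration limit is reached. A minimal Python sketch, assuming the uncentered correlation (sum of A_i B_i over the square root of the product of the sums of squares) as the similarity measure, standing in for Standard Correlation, and seeding the centroids with the first k genes; GeneSpring's initialization and tie-breaking are not described in this section and may differ. Function names are hypothetical.

```python
import math


def standard_correlation(a: list[float], b: list[float]) -> float:
    """Uncentered correlation: sum(a*b) / sqrt(sum(a^2) * sum(b^2))."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def kmeans(profiles: dict[str, list[float]], k: int, max_iterations: int = 100) -> dict[str, int]:
    names = list(profiles)
    k = min(k, len(names))
    centroids = [list(profiles[n]) for n in names[:k]]  # assumed: first k genes seed the clusters
    assignment: dict[str, int] = {}
    for _ in range(max_iterations):
        # Reassign every gene to the centroid it is most similar to.
        new_assignment = {
            n: max(range(k), key=lambda c: standard_correlation(profiles[n], centroids[c]))
            for n in names
        }
        if new_assignment == assignment:
            break  # no gene changed cluster, so the centroids are stable
        assignment = new_assignment
        # Recalculate each centroid as the mean profile of its genes.
        for c in range(k):
            members = [profiles[n] for n in names if assignment[n] == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [-1.0, -2.0, -3.0],
    "g3": [2.0, 4.0, 6.5],
    "g4": [-2.0, -4.0, -5.0],
}
print(kmeans(profiles, k=2))
```

A correlation-based measure groups genes by the shape of their profiles rather than their magnitude, which is why g1 and g3 share a cluster despite different scales.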
446. ternal programs included with GeneSpring, the button in the lower right portion of the screen is labeled Program Details. Click this to view additional details about the program; you cannot make any changes from the View Additional Program Details window. If the program can be edited, the button reads Edit Program. Click Edit Program to view the Edit Program window, which is identical to the Create New External Program window. See The New External Program Window on page 8-35 for more information.
Data Formats for the External Program Interface
You can choose what data is sent and received, and in what format, according to the table below. All formats can optionally be terminated with a special termination character (ASCII 255). The formats are generally text files. You can also send or receive multiple data objects; for example, your program might receive both the currently selected gene list and the currently selected experiment, or it might send a new genome to GeneSpring followed by an experiment for that genome.
Format | Input | Output | Num | Description | Example
No data | Yes | Yes | 0 | No data; typically used for one-way communication |
Gene List | Yes | Yes | 1 | List of gene names, one per line | L20294, M89777, X95403, M63630 (one per line)
Gene List with numbers | Yes | Yes | 2 | List of gene names, one per line, with each gene followed by a tab and an associated number | L20294<tab>1, M89777<tab>3, X95403<tab>3.566, M63630<tab>0 (one per line)
Gene Name | Yes | Ye
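A minimal Python sketch of reading and writing format 2 (Gene List with numbers): one gene name per line, a tab, then the associated number, with an optional ASCII 255 termination character. It assumes the terminator, if present, marks the end of the data and that the stream has already been decoded to text (for example as latin-1, which maps byte 255 to chr(255)); the function names are hypothetical, not part of GeneSpring's interface.

```python
TERMINATOR = chr(255)  # optional end-of-data character (ASCII 255)


def parse_gene_list_with_numbers(text: str) -> list[tuple[str, float]]:
    """Parse format 2: one gene name per line, a tab, then the associated number."""
    text = text.split(TERMINATOR, 1)[0]  # ignore anything after the terminator
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        name, number = line.split("\t", 1)
        pairs.append((name, float(number)))
    return pairs


def format_gene_list_with_numbers(pairs: list[tuple[str, float]], terminate: bool = False) -> str:
    body = "".join(f"{name}\t{number:g}\n" for name, number in pairs)
    return body + (TERMINATOR if terminate else "")


sample = "L20294\t1\nM89777\t3\nX95403\t3.566\nM63630\t0\n" + TERMINATOR
pairs = parse_gene_list_with_numbers(sample)
print(pairs)
print(format_gene_list_with_numbers(pairs, terminate=True) == sample)  # True
```

The `:g` format keeps six significant digits, which round-trips the example values; use a wider precision if your associated numbers need more.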
447. th Gene Lists begins from the first nucleotide. Zoom in on the Ordered List view or open the Gene List Inspector to view these numbers.
Extend Promoter: Adds a new, longer, and potentially better promoter in the Find Potential Regulatory Sequences window.
The Conjectured Regulatory Sequence window contains the following sections:
• Details: Provides a general description of the common sequence motif being inspected. The details in this box are the same numbers listed in the right-hand columns of the Results box in the Find Potential Regulatory Sequences window.
• Offset Bases: The middle third of the Conjectured Regulatory Sequence window contains statistics on the bases to either side of the motif. The first column contains the offset from the observed sequence, the next four columns contain the percentage of genes with each base at that position, and the last column contains a suggested extension to the motif.
• ORF: The bottom third of the Conjectured Regulatory Sequence window contains the sequence information for the motif being inspected, as it occurs in the nucleotide sequence near or in each gene where it is found. There are three columns of data:
  - ORF: Indicates the gene upstream of which the common sequence motif (shown in bold, centered in the column) is found.
  - Distance: Displays how many bases upstream of its associated ORF (listed in the first column) the oligomer lies. This number is the difference between the b
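The Offset Bases statistics can be illustrated with a short calculation: for each offset to either side of the motif, compute the percentage of genes that carry A, C, G, or T at that position. A minimal Python sketch, assuming each gene contributes one occurrence given as (sequence, motif start index), that offset -1 is the base just before the motif and +1 the base just after it, and that a base is suggested as an extension when it reaches a hypothetical 75% threshold; the manual does not state GeneSpring's actual extension rule, and the function name is hypothetical.

```python
from collections import Counter


def offset_base_percentages(occurrences: list[tuple[str, int]], motif_len: int,
                            flank: int = 3, threshold: float = 75.0):
    """Per-offset base percentages around a motif.

    occurrences: (sequence, start) pairs, one per gene, where the motif begins at `start`.
    Offset -1 is the base just before the motif; +1 is the base just after it.
    """
    if not occurrences:
        return []
    rows = []
    for offset in list(range(-flank, 0)) + list(range(1, flank + 1)):
        counts = Counter()
        for seq, start in occurrences:
            pos = start + offset if offset < 0 else start + motif_len - 1 + offset
            if 0 <= pos < len(seq):
                counts[seq[pos]] += 1
        # Percentages are of all genes; a sequence that ends early counts toward no base.
        pct = {base: 100.0 * counts[base] / len(occurrences) for base in "ACGT"}
        best = max(pct, key=pct.get)
        suggestion = best if pct[best] >= threshold else ""  # hypothetical extension rule
        rows.append((offset, pct, suggestion))
    return rows


occurrences = [("AACGTGT", 2), ("TACGTGA", 2), ("CACGTGT", 2)]  # motif CGTG at index 2
for offset, pct, suggestion in offset_base_percentages(occurrences, motif_len=4, flank=2):
    print(offset, {b: round(p, 1) for b, p in pct.items()}, suggestion or "-")
```

In this toy input, every gene has an A immediately before the motif, so offset -1 suggests extending the motif to ACGTG.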
448. th no data in half the starting conditions: Discard any genes with no data in at least half the conditions in the selected experiment.
Note: A good way to estimate the optimum number of rows and columns is to predict how many distinct classes of genes are affected by the conditions in your experiment. With small data sets, the algorithm may generate a number of empty nodes; to avoid this, try using a smaller grid.
SOM Results
When the SOM operation is complete, the Choose Classification Name window appears.
[Figure 7-7: The Choose Classification Name window, showing a 6x5 SOM for Yeast cell cycle time series no 90 min, Default Interpretation, with parameters Rows 6, Columns 5, Iterations 60000, Radius 6.0]
To save your results:
1. Enter a name in the Name field at the top of the screen. Names may not exceed 80 characters.
To save the results
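The discard option above amounts to a simple filter applied before clustering. A minimal Python sketch, assuming missing measurements are represented as None; the function name and the expression values are hypothetical (the gene names are taken from the yeast example only for flavor).

```python
from typing import Optional


def discard_sparse_genes(profiles: dict[str, list[Optional[float]]]) -> dict[str, list[Optional[float]]]:
    """Drop genes with no data (None) in at least half of the conditions."""
    kept = {}
    for gene, values in profiles.items():
        missing = sum(v is None for v in values)
        if 2 * missing < len(values):  # keep only genes missing data in fewer than half
            kept[gene] = values
    return kept


profiles = {
    "YBR073W": [1.2, None, 0.8, 1.1],   # 1 of 4 missing: kept
    "YBR088C": [None, None, 1.4, 0.9],  # 2 of 4 missing: discarded
    "YDL003W": [0.7, 1.3, 1.0, 1.6],    # complete: kept
}
print(sorted(discard_sparse_genes(profiles)))  # ['YBR073W', 'YDL003W']
```

Comparing `2 * missing` with the number of conditions avoids rounding questions for odd condition counts: with 5 conditions, a gene missing 3 values is dropped and one missing 2 is kept.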
449. than one sample.
The Gene Trees Folder
Any gene trees created in GeneSpring are kept in the Gene Trees folder. Gene trees are dendrograms that show relationships between the expression levels of genes over a series of conditions.
The Condition Trees Folder
Condition trees are like gene trees, except that instead of showing the relationships between genes, they show the relationships between the expression levels of samples. Condition trees are kept in the Condition Trees folder.
The Classifications Folder
The Classifications folder contains genes that have been grouped into divisions defined by k-means or SOM clustering.
The Pathways Folder
Pathways are images of regulatory or metabolic pathways that can be imported into GeneSpring. Genes are overlaid on these images, allowing you to observe their changing expression levels across experimental conditions. A feature called Find Genes Which Could Fit Here can be used to predict new pathway elements.
The Array Layouts Folder
The Array Layouts folder contains information about the arrangement of the spots on your array. These can be used to recreate an image of your arrays to check for regional abnormalities.
The Expression Profiles Folder
Expression profiles are lines representing gene profiles that you draw in the genome browser. You can then search for genes matching that profile. Any express
450. the Classification view, in which genes are displayed according to predefined categories. However, you can also view displayed genes as a graph, a scatter plot, a bar graph, an ordered list, and so on. Note that some views, such as Tree, Pathway, and Array Layout, require some preparation, such as creating a tree or adding a pathway or array layout image. For details on views, see Chapter 4, Viewing Data.
Zooming in
To zoom in on a region or gene, click an area and drag your cursor diagonally. An expanding rectangle appears. Release the mouse, and GeneSpring zooms in on the region enclosed by this rectangle.
Zooming out
To zoom out, click Zoom Out, or right-click (Control-click on a Mac) and choose Zoom Out to go back one level or Zoom Fully Out to zoom out as far as possible.
Moving around the screen
You can move around a zoomed-in screen by using Page Up, Page Down, and the arrow keys.
Selecting a gene
Click once on a single gene to select it.
Selecting multiple genes
Hold down the Shift key and drag to select multiple genes, or hold down the Shift key and click individual genes to select them one by one.
Finding a specific gene
Select Edit > Find Gene (or press Ctrl+F). Enter the gene name or keyword and click OK. GeneSpring selects and zooms in on the gene.
Inspecting genes
You can view detailed information about a gene by double-clicking it to bring up the Gene Inspe
451. the Sample Inspector Window
The Sample Inspector screen has two sections. The upper section contains basic information about a sample. Weblinks buttons appear in the upper right portion of the window only if there are web links associated with the genome containing the sample. These link to LIMS-type databases. For more information on weblinks in genomes, see page 4 in the Genome Installation Wizard section.
The lower section contains four tabbed menus from which you can view or edit a variety of information about the sample being inspected. Click a tab to view the available options.
[Figure 4-8: The Attributes and Parameters tab (tabs: Attributes and Parameters, Similar Samples, Associated Files, Graph), showing the Sample Attributes list and the Experimental Parameters list]
There are two lists visible on this screen: the Sample Attributes list and the Experimental Parameters list. In the Sample Attributes list, you can view the attributes of the sample being inspected, and add, remove, or edit any of these attributes. The Experimental Parameters list displays parameters assigned to this sample in any experiment. Since a sample may be part of several experiments, you cannot edit exper
452. the external program runs. GeneSpring recognizes the standard output generated by the external program and displays it in the genome browser.
To run an external program, double-click its name in the GeneSpring navigator, or right-click it and select Run.
In earlier versions of GeneSpring, it was necessary to manually create a programdef file for each external program. In GeneSpring 5.1, this process is automated through the GeneSpring interface.
The New External Program Window
[Figure 8-3: The New External Program window]
From this window, you can specify the inputs, outputs, program type, and other information about an external program. Any program capable of receiving standard input can be run directly from GeneSpring. To install a new external program, you must provide the following information:
• Name: The name of the program as it will appear in the GeneSpring navigator. Choose a descriptive name that will be easy to remember.
• Folder: The folder in which to save the external program definition. To save the definition in an existing folder, navigate to that folder in the direct
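Because GeneSpring exchanges data with external programs through standard input and standard output, an external program can be as small as a filter that reads one data object and writes another. A minimal Python sketch of such a program, assuming GeneSpring sends a Gene List (one gene name per line, optionally followed by the ASCII 255 terminator) and expects a Gene List back; sorting and de-duplicating is placeholder processing, and the process and main names are hypothetical.

```python
from typing import BinaryIO, TextIO

TERMINATOR = chr(255)  # optional end-of-data character


def process(text: str) -> str:
    """Placeholder processing: return the incoming Gene List sorted and de-duplicated."""
    body = text.split(TERMINATOR, 1)[0]
    genes = {line.strip() for line in body.splitlines() if line.strip()}
    return "".join(f"{gene}\n" for gene in sorted(genes))


def main(in_stream: BinaryIO, out_stream: TextIO) -> None:
    # latin-1 maps byte 255 to chr(255), so the terminator survives decoding.
    data = in_stream.read().decode("latin-1")
    out_stream.write(process(data))
```

To use it as an external program, end the script with a call such as main(sys.stdin.buffer, sys.stdout) (after importing sys) and register it in the New External Program window with Gene List as both the input and the output format.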
453. the following codes just after the parentheses:
• S: The data is interpreted as a non-continuous element, also known as a discrete element. See Non Continuous Element on page 3-31 for details.
• C: The data is colored by the different parametric values, assigned automatically by GeneSpring. In Figure 3-12, each column would get a different color for time values 0 through 160. See Color Code on page 3-31 for details.
• R: The data is interpreted as a replicate (not shown). See Hidden Elements on page 3-31 for details.
You can enter all parameters with the default (no code after the parentheses) and change the interpretation later from within GeneSpring. See Experiment Interpretations on page 3-39.
For example, for the parameter tissue type, a non-continuous, non-numeric parameter, the first column might look like this: tissue type()S
If you have no parameters, enter arbitrary but meaningful names so that you can distinguish each sample from those in other columns.
Data
There can be only one gene per line. The name of the gene must be in the first column. The following columns are data points for each sample.
[Figure: layout for copying and pasting experiments, with rows for the experiment name, parameter names (with units), and parameter values above the normalized data, plus a multiple-disease example]
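A parameter column title packs the parameter name, its units in parentheses, and an optional interpretation code into one cell. A minimal Python sketch of splitting such a title, assuming the layout name(units)CODE with at most one trailing code letter from S, C, or R, and treating a missing code as the default interpretation; the exact syntax GeneSpring accepts may be looser or stricter, and the function name is hypothetical.

```python
import re

# Assumed header layout: name, optional spaces, (units), then an optional code S, C, or R.
HEADER = re.compile(r"(?P<name>.*?)\s*\((?P<units>[^)]*)\)\s*(?P<code>[SCR]?)")


def parse_parameter_header(cell: str) -> tuple[str, str, str]:
    """Split a column title such as 'tissue type()S' into (name, units, code)."""
    match = HEADER.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a parameter header: {cell!r}")
    return match["name"], match["units"], match["code"]


print(parse_parameter_header("tissue type()S"))  # ('tissue type', '', 'S')
print(parse_parameter_header("time(minutes)C"))  # ('time', 'minutes', 'C')
print(parse_parameter_header("dose(mg)"))        # ('dose', 'mg', '')
```

An empty code string corresponds to entering the parameter with the default interpretation and changing it later inside GeneSpring.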
454. the gene in question in experiment 3 to the gene named in the title bar (also from experiment 3), c is the weight associated with experiment 3, and so on. Experiments 1, 2, 3, etc. represent all of the experiments selected in the white Correlations box. If X is between the minimum and maximum correlations specified in the Clustering window, the gene in question passes.
Similarity Definitions
Similarity definitions are used in several clustering types. The equations used to determine the nine types of correlations are described in detail in Appendix B, Equations for Correlations and other Similarity Measures. The default correlation is the Standard Correlation, where A_i and B_i are the values of genes a and b in the i-th of n conditions:
$$\text{Standard correlation} = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}\,\sum_{i=1}^{n} B_i^{2}}}$$
Minimum Distance and Separation Ratios
To make a tree, GeneSpring calculates the correlation of each gene with every other gene in the set. It then takes the highest correlation and pairs those two genes, averaging their expression profiles. GeneSpring then compares this new composite gene with all of the other unpaired genes. This is repeated until all of the genes have been paired.
At this point, the minimum distance and the separation ratio come into play. Both affect the branching behavior of the tree. The minimum distance determines how far down the tree discrete branches are depicted. A value smaller than 0.001 has very little effect because most genes are not correl
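The pairing procedure described above is a form of agglomerative (hierarchical) clustering. A minimal Python sketch, assuming standard correlation as the similarity and that a composite node's profile is the plain average of its two children's profiles (the text says the profiles are averaged but not whether the average is weighted by the number of genes in each branch); the minimum distance and separation ratio, which only affect how branches are drawn, are not modeled. Function names are hypothetical.

```python
import math
from itertools import combinations


def standard_correlation(a, b):
    """Uncentered correlation: sum(A_i * B_i) / sqrt(sum(A_i^2) * sum(B_i^2))."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def build_tree(profiles: dict[str, list[float]]):
    """Repeatedly pair the two most-correlated nodes, averaging their profiles."""
    nodes = [(name, list(values)) for name, values in profiles.items()]  # (subtree, profile)
    while len(nodes) > 1:
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda ij: standard_correlation(nodes[ij[0]][1], nodes[ij[1]][1]),
        )
        (left, left_profile), (right, right_profile) = nodes[i], nodes[j]
        composite = [(x + y) / 2 for x, y in zip(left_profile, right_profile)]
        nodes = [node for k, node in enumerate(nodes) if k not in (i, j)]
        nodes.append(((left, right), composite))
    return nodes[0][0] if nodes else None


tree = build_tree({"a": [1.0, 2.0, 3.0], "b": [1.0, 2.0, 3.1], "c": [-1.0, 0.0, 1.0]})
print(tree)  # ('c', ('a', 'b'))
```

The result is a nested tuple: each pair is one branch point, so a and b join first and c joins the composite last.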
455. this setting specifies the directory in which sample files are located. If a directory is not specified or does not begin with an absolute path, the baseDirectory attribute of the <GenomeMappingSpec> tag is used.
Contents: plain text
Attributes: n/a
Usage: <DataDirectory>/usr/share/affy</DataDirectory>
Notes: Optional
<FileNameMask>
FileNameMask is applied to all files in DataDirectory to filter the file names. If the DataDirectory is not specified or does not begin with an absolute path, the baseDirectory attribute of the current <GenomeMappingSpec> section is used. If baseDirectory does not begin with an absolute path, the current user directory is used.
Contents: plain text
Attributes: n/a
Usage: <FileNameMask>AffyChipID chip</FileNameMask>
Notes: Required for retrieving data from flat files in a directory.
<IDFromFileName>
This allows you to generate sample IDs directly from file names. If sample IDs are generated using <RegexpMatch>, only one genome should be specified in the <GenomeNames> of the <GenomeMappingSpec> tags. The result of the <RegexpMatch> on the file names provides the sample IDs. If you are using <DatabaseQuery> instead, the file names are passed as arguments to the specified SQL query.
Contents: plain text
Attributes: n/a
Usage: <IDFromFileName> <IDFro
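The fallback rules for locating sample files, and the idea of deriving sample IDs from file names, can be sketched as follows. This is a minimal Python illustration under assumptions the manual does not confirm: a relative DataDirectory or baseDirectory is joined onto the next fallback rather than replaced by it, the file name mask is treated as a Python regular expression matched against the whole name, and the sample ID is the first capture group of a hypothetical ID pattern. Paths use POSIX semantics so the example behaves the same on every platform; all names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def resolve_data_directory(data_directory: Optional[str], base_directory: Optional[str],
                           user_directory: str) -> PurePosixPath:
    """Fallback order: DataDirectory, then baseDirectory, then the current user directory.

    Assumption: a relative path is resolved against the next fallback rather than replaced by it.
    """
    root = PurePosixPath(user_directory)
    if base_directory:
        base = PurePosixPath(base_directory)
        root = base if base.is_absolute() else root / base
    if data_directory:
        data = PurePosixPath(data_directory)
        return data if data.is_absolute() else root / data
    return root


def sample_ids(file_names: list[str], mask: str, id_pattern: str) -> list[str]:
    """Keep files matching the mask, then take the first regex group as the sample ID."""
    ids = []
    for name in file_names:
        if re.fullmatch(mask, name):
            match = re.search(id_pattern, name)
            if match:
                ids.append(match.group(1))
    return ids


print(resolve_data_directory("/usr/share/affy", "/data", "/home/user"))  # /usr/share/affy
print(resolve_data_directory("affy", "genomes", "/home/user"))           # /home/user/genomes/affy
print(sample_ids(["Chip01.chip", "notes.txt", "Chip02.chip"], r"Chip\d+\.chip", r"Chip(\d+)"))
```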
456. ticular example sets up an interface to the SAS(TM) procedure FASTCLUS to do gene clustering. You will need to create two text files with a text editor such as Microsoft Notepad: runsas.bat and fastclus.sas. Each is described below.
1. Create a batch file called runsas.bat. This batch file takes the standard input from GeneSpring, stores it in a file, executes SAS(TM), and passes the results back to GeneSpring via standard output. The program cat.exe simply copies standard input into standard output. If you do not have something equivalent on your system, cat.exe can be downloaded from the Silicon Genetics website. Place the following text in the batch file:
echo o set infile 2 set outfile cat exe 2 set SASROOT C PROGRA 1 SASINS 1 SAS V8 S SASROOTS SAS 1 sas nologo config SASROOT SASV8 CFG cat exe 3 del 1 1st 1 1og 2 3
2. Create a text file called fastclus.sas. This file runs PROC FASTCLUS, specifying 5 clusters. In PROC IMPORT, the datarow=3 option skips the first two lines of the exported data, which contain the dataset name and one parameter. If you have more than one parameter, adjust the datarow value accordingly. PROC EXPORT puts a header line on the returned data set listing the variable names; GeneSpring displays an error message and skips this line unless you have a gene named
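If no `cat.exe` is available, a few lines of Python can stand in for it, assuming Python is installed on the machine that runs the batch file. The file name `pycat.py` and the `-` argument convention are illustrative choices, not part of GeneSpring or SAS.

```python
import io
import shutil
import sys


def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy everything from src to dst, as cat does with no file arguments."""
    shutil.copyfileobj(src, dst, chunk_size)


def copy_bytes(data):
    """In-memory helper: pass bytes through copy_stream and return the copy."""
    dst = io.BytesIO()
    copy_stream(io.BytesIO(data), dst)
    return dst.getvalue()


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Binary streams, so Windows line endings pass through unchanged.
    copy_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Invoked as `python pycat.py -`, the script copies standard input to standard output; without the `-` argument it does nothing, so importing it has no side effects.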
457. Start with Pre-Normalized Values
This option is provided for backward compatibility and allows you to maintain normalizations from a previous experiment. It can be applied only to samples that were created by GeneSpring's Merge/Split window or uploaded before GeNet 3.0. If you select this option when none of the data in your experiment have been previously normalized, a message alerting you to this is displayed and the OK button is disabled.
Data Transformation
SAGE Transform
This method is recommended only for SAGE data. It fills in zeros for all genes not mentioned in your data file.
Real-Time PCR Transform
In this method, doubling measurements are converted into measurements of mRNA concentration using the equation 2
Subtract background based on negative controls
In this method, the median value of the gene list is subtracted from the raw values for each gene. The gene list used can be typed in or loaded from a file. To type in a gene list, enter one gene on each line of the text box provided. Right-click in this box to use the Copy and Paste options. To load a gene list from a file, click Load From File and select the gene list from the browse window that appears. If genes are already listed when you click Load From File, the genes from the list you select are added to the existing list; genes you have already entered are not overwritten.
Note: This t
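Two of the transforms above are simple enough to sketch. The functions below are hypothetical illustrations operating on one sample stored as a gene-to-value dictionary; they assume the background median is taken separately for each sample and that control genes missing from a sample are ignored.

```python
from statistics import median


def sage_fill(sample, all_genes):
    """SAGE transform: genes missing from the data file get a value of zero."""
    return {gene: sample.get(gene, 0.0) for gene in all_genes}


def subtract_negative_control_background(sample, control_genes):
    """Subtract the median raw value of the control genes from every gene.

    sample: dict gene -> raw value for one sample.
    control_genes: names from the negative-control gene list; names that are
    absent from the sample are skipped.
    """
    controls = [sample[g] for g in control_genes if g in sample]
    if not controls:
        raise ValueError("none of the control genes are present in this sample")
    background = median(controls)
    return {gene: value - background for gene, value in sample.items()}
```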
458. tion from the Browse menu and click Save. This does not remove the file from your list; it simply places a copy of the file in a new location.
Delete File: To remove an associated file, select it in the list and click Delete.
View File: Select a file name in the list and click View File to view the contents of the file in an external program. If the file type is known, the appropriate program to display the file is selected automatically.
View Data File Format: Click to display the column assignments for the selected file as it was loaded into GeneSpring.
The Graph Tab
This view is available only if an experiment containing the current sample is selected. It displays a graph of the raw sample data.
The Experiment Inspector
Just as you can inspect a gene with the Gene Inspector window, you can inspect experiments with the Experiment Inspector window.
Accessing the Experiment Inspector
1. Right-click the name of any experiment in the navigator.
2. Select Inspect from the pop-up menu.
Figure 4-10 The Experiment Inspector window
The upper section of the Experiment I
459. to the FASTA format, although there are some differences. The .seq format consists of one line of identifiers followed by lines of sequence. The identifier line consists of the greater-than sign (>) followed by the chromosome identifier, a space, and an optional description. An example is given here:
>CHR1 This is the description of Chromosome 1
GCTGACGGACTTTCTAGCGGTCTAGCAACTGAGCGGCGCGCGGGCATCGTA
CAGCAGCGAGCTACTATCTACGCGCGGCGGATATAAAACTACAAAAAAAAA
Chromosomes in GeneSpring are given a number (1, 2, 3, etc.), and the number should be part of the chromosome identifier. The chromosome identifier can optionally contain the letters CHR, but they are not required. The number used for the chromosome in the .seq format has to correspond to the number used in the Map position in the Master Table of Genes.
The .seq format is not the same as the FASTA format. There is an example of the FASTA format at http://www.ncbi.nlm.nih.gov/BLAST/fasta.html.
A severely abridged example of the yeast .seq file might look like this:
>CHR1 Chromosome I data
CCACACCACACCCACACACCCACACACCACCACCACACCACACCCACACACACA
GTGGGTGTGGTGTGGTGTGTGGGTGTGGTGTGGGTGTGGTGTGTGTGGG
>CHR2 Complete DNA sequence of yeast chromosome II
AAATAGCCCTCATGTACGTCTCCTCCAAGCCCTGTTGTCTCTTACCCGGA
AGAATAGGGTACTGTTAGGATTGTGTTAGGGTGTGGGTGTGGTGTGTGTGGG
TGTGGTGTGTGGGTGTGT
>CHR3 LOCUS SCCHRIII 3153
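A reader for the .seq layout described above might look like the following sketch. It assumes identifiers are either a bare number or `CHR` followed by a number (matched case-insensitively here), and it returns sequences keyed by chromosome number so they can be matched against the Map position column. `parse_seq` is a hypothetical name.

```python
import re

# Identifier: optional "CHR" prefix followed by the chromosome number.
_ID_PATTERN = re.compile(r"^(?:CHR)?(\d+)$", re.IGNORECASE)


def parse_seq(text):
    """Parse .seq text into {chromosome number: (description, sequence)}."""
    chromosomes = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            identifier, _, description = line[1:].partition(" ")
            match = _ID_PATTERN.match(identifier)
            if match is None:
                raise ValueError(f"no chromosome number in identifier {identifier!r}")
            current = int(match.group(1))
            chromosomes[current] = [description, []]
        elif current is None:
            raise ValueError("sequence data before the first identifier line")
        else:
            chromosomes[current][1].append(line)
    return {num: (desc, "".join(parts)) for num, (desc, parts) in chromosomes.items()}
```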
460. toff, Multiple testing correction
Name | Input | Knobs | Description
2-way ANOVA Restriction | One gene list, one experiment interpretation | Groups specification 1st parameter, Groups specification 2nd parameter, Test type, P-value cutoff, Multiple testing correction | See 2-Way ANOVA on page 6-43 for details.
2-way ANOVA Specific Result | One gene list, one experiment interpretation | Groups specification 1st parameter, Groups | See 2-Way ANOVA on page 6-43 for details.
Standardize Inputs
Open Scripts > Basic Scripts > Standardize Inputs.
Name | Input | Knobs | Description
Use Specific Experiment | None | Experiment name | Specify a single local experiment by name. This is useful as an input to another block.
Use Specific Experiment Folder | None | Subfolders, Experiment folder name | Specify a single local experiment folder by name. This is useful as an input to another block.
Use Specific Gene List | None | Gene list name | Specify a single local gene list by name. This is useful as an input to another block.
Use Specific Gene List Folder | None | Subfolders, Gene list folder name | Specify a single local gene list folder by name. This is useful as an input to another block.
Scripts to External Programs
External program building blocks are points of contact for other programs. External program building bloc
461. tter Plot view
In the scatter plot above, each symbol represents a gene. The vertical position of each gene represents its expression level in the current condition, and the horizontal position represents its control strength, in this case the median expression level of this gene across all conditions. Genes that fall above the diagonal are overexpressed, and genes that fall below the diagonal are underexpressed, compared to their median expression level over the course of the experiment.
Viewing a Scatter Plot
To view a scatter plot, select View > Scatter Plot. The scatter plot view is the most flexible in its ability to customize the way that data are displayed.
Scatter Plot Display Options
The following display options are available for this view:
• Vertical Axis: The available options are listed below.
• Horizontal Axis: The available options are listed below.
• Features: The available options are listed below.
• Lines to Graph: The available options are listed below.
• Coloring: The available options are listed below.
• Error Bars: See Error Bars on page 4-28.
• Legend: See Legend on page 4-28.
Vertical/Horizontal Axes
The most critical option to set is the type of data that is displayed on the two axes. To modify the function as well as the appearance of the axes:
1. Select View > Display Options, or right-click anywhere in the genome browser a
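The above/below-diagonal reading of the scatter plot can be reproduced numerically. The sketch assumes the control strength is the median of the gene's normalized values across all conditions, as in the example above; `classify_against_median` and its `tolerance` parameter are hypothetical.

```python
from statistics import median


def classify_against_median(expression, condition_index, tolerance=0.0):
    """Compare each gene's value in one condition with its median over all conditions.

    expression: dict gene -> list of normalized values, one per condition.
    Returns dict gene -> "above", "below" or "on" (the diagonal).
    """
    result = {}
    for gene, values in expression.items():
        control = median(values)  # horizontal axis: control strength
        current = values[condition_index]  # vertical axis: current condition
        if current > control + tolerance:
            result[gene] = "above"
        elif current < control - tolerance:
            result[gene] = "below"
        else:
            result[gene] = "on"
    return result
```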
462. ttus norvegicus)
• Pig (Sus scrofa)
• Clawed frog (Xenopus laevis)
• Thale cress (Arabidopsis thaliana)
• Barley (Hordeum vulgare)
• Rice (Oryza sativa)
• Wheat (Triticum aestivum)
• Maize (Zea mays)
The following is an example of how GeneSpring builds a homology table between Rat GenBank accession numbers and Mouse GenBank accession numbers:
Rat GenBank Accession Number
↓ GeneSpring looks up the corresponding Rat LocusLink ID and/or UniGene Cluster ID.
↓ HomoloGene's table of homologues translates Rat LocusLink IDs and/or UniGene Cluster IDs into Mouse LocusLink IDs and/or UniGene Cluster IDs.
GeneSpring looks up the corresponding LocusLink ID and/or UniGene Cluster ID.
Mouse GenBank Accession Number
Below is an example of how GeneSpring builds a homology table between Affy Mu U74 GenBank accession numbers and Incyte Mouse GenBank accession numbers:
Affy Mu U74 GenBank Accession Number
↓ GeneSpring looks up the corresponding Mouse LocusLink ID and/or UniGene Cluster ID.
↓ Do they match?
GeneSpring looks up the corresponding Mouse LocusLink ID and/or UniGene Cluster ID.
To use the Homology tool:
1. Select a genome from the GeneSpring navigator.
2. Select Annotations > Build Homology Tables. The Build Homology Tables window appears. The selected genome is displayed in the upper table on the right side of the screen.
Build Homology Table
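The lookup chain in these examples amounts to joining three tables on shared cluster IDs. The sketch below assumes the accession-to-LocusLink/UniGene mappings and the HomoloGene table have already been loaded into dictionaries; the IDs in the test are made-up placeholders, and `build_homology_table` is a hypothetical helper. For two platforms from the same species, as in the Affymetrix/Incyte example, the homologue table is simply the identity mapping.

```python
def build_homology_table(source_to_cluster, homologs, target_to_cluster):
    """Map source accessions to target accessions through shared cluster IDs.

    source_to_cluster: source accession -> source-species cluster ID
    homologs: source-species cluster ID -> target-species cluster ID
    target_to_cluster: target accession -> target-species cluster ID
    Returns source accession -> sorted list of matching target accessions.
    """
    # Invert the target lookup so each cluster ID lists its accessions.
    cluster_to_targets = {}
    for accession, cluster in target_to_cluster.items():
        cluster_to_targets.setdefault(cluster, []).append(accession)
    table = {}
    for accession, cluster in source_to_cluster.items():
        matches = cluster_to_targets.get(homologs.get(cluster), [])
        if matches:
            table[accession] = sorted(matches)
    return table
```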
463. tween major ticks in the Minor Ticks per Major Tick field. Note that the number of visible tick marks between major ticks is one less than the number you enter.
6. Click Apply.
Error Bars
You have the option of using error bars in the Graph and Scatter Plot views. To turn the error bars on, right-click in the genome browser, select Display Options, and click the Error Bars tab. The error bars are visible in the Gene Inspector as well as in the main GeneSpring window. You can choose one of the following three kinds of error bars:
• Standard Error
• Standard Deviation
• Minimum/Maximum Value of Each Gene
In Figure 4-15, the example represents a k-means clustering colored by expression values. Note the list name and number of genes shown beneath each small screen. In this instance, the names are set numbers from the original k-means clustering.
Legend
You can specify what information to display in most views using the Legend tab on the Display Options screen.
1. Select View > Display Options.
2. Click the Legend tab.
3. Check the Show Legend box to display the information you specify. Uncheck the box to show no text information.
4. Check or uncheck these options as desired.
5. Click Apply. Your changes are applied to the display in the main GeneSpring window.
Which options are available depends on the current view.
Legend Options
Selected Object in Navigato
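The three error-bar types correspond to standard summary statistics of a gene's replicate values. The helper below is a sketch, not GeneSpring's code: it assumes the bars are centred on the mean and that the sample (n - 1) standard deviation is used; `error_bar` is a hypothetical name.

```python
import math


def error_bar(values, kind):
    """Return (lower, upper) limits of an error bar for one gene's values.

    kind is "standard error", "standard deviation" or "min/max". The bar is
    centred on the mean and uses the sample (n - 1) standard deviation; both
    choices are assumptions.
    """
    if kind == "min/max":
        return min(values), max(values)
    if kind not in ("standard error", "standard deviation"):
        raise ValueError(f"unknown error bar kind: {kind!r}")
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, mean
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    half_width = sd if kind == "standard deviation" else sd / math.sqrt(n)
    return mean - half_width, mean + half_width
```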
464. ublishing: Default Directory
Note that you will need to log in to GeNet before using a script to autoload data to GeNet.
Open Scripts > Basic Scripts > GeNet Publishing > Default Directory.
Name | Input | Knobs | Description
Send Classification to GeNet | One classification | None | Publishes a classification to your default directory in GeNet. No output to GeneSpring.
Send Experiment to GeNet | One experiment interpretation | None | Publishes an experiment interpretation to your default directory in GeNet. No output to GeneSpring.
Send Condition Tree to GeNet | One condition tree | None | Publishes a condition tree to your default directory in GeNet. No output to GeneSpring.
Send Gene List to GeNet | One gene list | None | Publishes a gene list to your default directory in GeNet. No output to GeneSpring.
Send Gene Tree to GeNet | One gene tree | None | Publishes a gene tree to your default directory in GeNet. No output to GeneSpring.
GeNet Publishing: Specific Directory
Note that you will need to log in to GeNet before using a script to autoload data to GeNet.
Open Scripts > Basic Scripts > GeNet Publishing > Specific Directory.
Name | Input | Knobs | Description
Send Classification to Directory in GeNet | One classification | Directory | Publishes a classification to a chosen directory in GeNet
465. ues box. Enter values for High Control Strength, Medium Control Strength, and Low Control Strength (optional).
By default, trust is shown on the colorbar. To disable this, select the Do not show trust on colorbar radio button.
To save these settings as the default for this experiment, check the Save As Experiment Default box. If you leave this box unchecked, your changes affect the display options only in your current session.
8. Click OK.
Changing the Colorbar Range
1. Right-click the colorbar and select Set Coloring from the pop-up menu.
2. Select Color by Expression from the pull-down menu.
3. Reset the values determining the intensity of the colors used by the genome browser.
4. Click OK.
There are six categories you can change:
High Expression: High expression refers to the normalized expression of your genes; it is the vertical axis of the colorbar. The default for this is 6.0.
Normal Expression: For most normalization procedures, the data are normalized to 1.0. The default for this is 1.0.
Low Expression: For most normalization procedures, the data do not have negative numbers. The default for this is 0.0.
For example, you could change the usual range of an experiment to high 10, normal 5, and low 2, resulting in a very different color scheme once you click OK. There is no Edit > Undo (Ctrl+Z) function for this type of change. To return to your prev
466. uld have an equal number of replicates; this is called a balanced design. GeneSpring can also perform 2-way ANOVA for proportional designs, but you will not be able to perform 2-way ANOVA for unbalanced designs that are not proportional.
A 2-way ANOVA tests for the effect of each parameter, as well as the interaction between them, simultaneously. For each gene, 3 p-values are produced: one for each parameter and one for the interaction term. Genes for which any of the p-values is less than the specified cutoff are returned. From the results window, several options for creating gene lists are available.
Figure 1-6 2-Way ANOVA options
The following options are available:
First Parameter to Test: Choose a parameter.
Second Parameter to Test: Choose the second parameter to compare.
Test Type: Specify whether to perform a parametric test (assuming equal variances) or a non-parametric test.
P-value Cutoff: The default is 0.05.
Multiple Testing Correction: Available options are Bonferroni, Bonferroni Step-Down (Holm), Westfall & Young Per
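The selection rule (a gene is returned if any of its three p-values passes the cutoff) can be sketched together with the Benjamini and Hochberg correction named in the options. Applying the correction separately to each of the three p-value columns is an assumption made for this illustration; GeneSpring may pool them differently. Both function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def two_way_anova_hits(gene_p_values, cutoff=0.05, correct=True):
    """Genes for which any of the three 2-way ANOVA p-values is below cutoff.

    gene_p_values: dict gene -> (p_first_parameter, p_second_parameter, p_interaction).
    With correct=True, each of the three p-value columns is adjusted separately.
    """
    genes = list(gene_p_values)
    columns = [list(col) for col in zip(*(gene_p_values[g] for g in genes))]
    if correct:
        columns = [benjamini_hochberg(col) for col in columns]
    return [g for k, g in enumerate(genes) if any(col[k] < cutoff for col in columns)]
```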
467. ull-down menu at the top of the screen.
Choose a pair of samples and their control samples by checking the boxes next to the desired samples or typing them into the table below. Copy and Paste functions also work in this table. Use commas or semicolons to separate samples, or dashes to indicate a range of samples. To select all samples in a column, click Check All. To unselect all samples in a column, click Clear All. To define a new pair, use the Add Row button. To remove a pair, use the Delete Row button.
You can limit measurements based on a specified cutoff or by flag values. If measurements are limited by flag values, the percentile is calculated using only the genes that pass the flag restriction. To limit measurements by flag values, check the Use only measurements flagged box and select the appropriate option from the pull-down menu. The available options are:
• Present Only
• Present or Marginal
• Anything but Absent
If measurements are limited by a cutoff, the percentile is calculated from all measurements above the cutoff. This cutoff can be in either raw or partially normalized units. The Raw Signal option means that the cutoff is applied to the raw measurements in the original data file. These measurements are back-calculated based on the previous normalization steps, so rounding errors may be introduced in this process.
Partially normalized means that the cutoff is applied
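The effect of the flag and cutoff restrictions on the percentile can be illustrated as below. The flag codes `P`, `M`, and `A` (Present, Marginal, Absent) and the linear-interpolation percentile are assumptions; GeneSpring's exact percentile definition may differ, and the function names are hypothetical.

```python
import math


def passes_flag(flag, rule):
    """Flag restriction; "P", "M", "A" are assumed Present/Marginal/Absent codes."""
    if rule == "Present Only":
        return flag == "P"
    if rule == "Present or Marginal":
        return flag in ("P", "M")
    if rule == "Anything but Absent":
        return flag != "A"
    raise ValueError(f"unknown flag rule: {rule!r}")


def percentile(values, q):
    """Linear-interpolation percentile of values, with q between 0 and 100."""
    data = sorted(values)
    if not data:
        raise ValueError("no measurements left after the restriction")
    pos = (len(data) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def restricted_percentile(measurements, q, flag_rule=None, cutoff=None):
    """Percentile of one sample using only measurements that pass the limits.

    measurements: list of (value, flag) pairs.
    """
    kept = [
        value
        for value, flag in measurements
        if (flag_rule is None or passes_flag(flag, flag_rule))
        and (cutoff is None or value > cutoff)
    ]
    return percentile(kept, q)
```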
468. umber of occurrences in the list came about by chance. Only nucleotide motifs with P-values below the specified probability cutoff (in this case 0.05, or 5%) are shown.
• Random Rate: The intrinsic probability, which is the percentage of genes upstream of which you would expect this specific nucleotide combination to appear if the nucleotide sequence were strictly random. (It is not, of course, but this is a good value to compare the observed probability to.)
• Observed Other Genes: The observed probability of this sequence motif appearing upstream of genes other than those in the list under inspection. If the option Relative to sequence upstream of other genes is selected, this becomes the probability of the observed sequence occurring relative to the genes not in the list (that is, relative to the all genes list). If the option Relative to whole genomic sequence is selected, this becomes the probability of one or more occurrences of the sequence based on its rate of occurrence in the entire genome. The formula used to calculate this is
$P = 1 - \left(1 - \frac{k}{b}\right)^{n}$
where k is the number of occurrences in the whole sequence, b is the total number of bases, and n is the length of the upstream region being searched.
• Expected: The number of genes in the searched gene list upstream of which you would expect this oligomer to occur. The number in the Expected column is derived using the larger of the intrinsic probability and the observed probability.
• Single P: This col
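The genome-rate formula and the Expected column can be checked with a few lines of code. Multiplying the list size by the larger probability to obtain the Expected count is an assumption based on the description above; the function names are hypothetical.

```python
def prob_one_or_more(k, b, n):
    """P(at least one occurrence in n bases) = 1 - (1 - k/b) ** n."""
    if b <= 0 or not 0 <= k <= b:
        raise ValueError("need b > 0 and 0 <= k <= b")
    return 1.0 - (1.0 - k / b) ** n


def expected_count(list_size, intrinsic_p, observed_p):
    """Expected genes in the list with the motif, using the larger probability."""
    return list_size * max(intrinsic_p, observed_p)
```

For example, a motif seen once every 4 bases (k/b = 1/4) has a 1 - 0.75^2 = 0.4375 chance of appearing at least once in a 2-base region.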
469. umn displays the Single P-value for the motif. This is the chance that this particular sequence would be found if only one test were performed.
• Tests: The number of tests run to generate these motifs appears in the last column. This is the number of oligomers tested that were the same length as the sequence motif found.
Using the Conjectured Regulatory Sequence Window
The Conjectured Regulatory Sequence window displays the common nucleotide sequence, showing the 10 bases that precede and follow it in the area near or in each gene where the oligomer is found. It also gives a brief description of the statistics listed in the Results box of the Find Potential Regulatory Sequences window, and allows you to modify the observed motif by removing an item, extending the promoter, or making a new gene list.
Double-click one of the sequence motifs in the Results box of the Find Potential Regulatory Sequences window to view the Conjectured Regulatory Sequence window. Its List Details section reads, for example:
The sequence AAAAAAAA was observed upstream of 1,472 out of the 6,127 genes in the gene list called PCA component 1. Upstream means from 10 to 500 bases upstream of the gene. Only exact matches were counted. This was compared to the frequency (17.241%) of that sequence upstream of other ORFs in the genome Yeast. If the distribution of bases were random, you would
470. ure 8-2 GeneSpring's Change Information window
Using the Remote Server
If you are using GeNet, GeNet is enabled with remote execution servers, and at least one remote server is installed and configured, you can execute a script on a remote computer. The results are returned to GeneSpring when the script completes. Using a remote execution server is highly recommended when you are performing time-consuming tasks such as clustering very large data sets, because it takes the computational burden off your local computer.
To send a script to a remote execution server, select the Compute on a GeNet Server radio button in the Run Script window and click Start. Your script is sent to an available execution server and is either executed immediately or placed in a queue.
To view the status of a script that may be waiting in the queue, select Tools > Check Remote Execution Queue. A window appears similar to the one in Figure 8-3.
(Remote Execution Queue window: a job list with Job Name, Genome, Time Submitted, Time Started, and Time Finished columns; Pause, Resume, Delete, View Results, View Script, and Refresh buttons; and a Notify me when results are available check box.)
471. urrent normalized values from the pull-down menu.
3. Enter the cutoff figure in the text box. The default value is 10.0.
You can choose to apply additional background correction in this step. To apply background correction, check the appropriate box in the Background Correction section of the screen. You have the following options:
• Never apply extra background correction.
• Always apply extra background correction: Prior to taking the specified percentile, the bottom tenth percentile is used as a background correction and subtracted from all genes.
• If needed, apply extra background correction: For samples in which the bottom tenth percentile is less than the negative of the specified percentile, the tenth percentile is used as a background correction and subtracted from all genes before the specified percentile is taken.
Note: Global Per-Chip normalization is not recommended in any experiment where more than 50% of the genes on the chip are likely to be affected similarly by the experimental conditions. For example, if a chip containing only known growth factors were used to study differential expression in malignant and benign tumors, you might expect a majority of the genes to be differentially expressed. In this case, applying a per-chip normalization would mask the changes in expression.
Normalize to Positive Control Genes
Some chips come with positive controls, mRNA from another
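The three background-correction options can be sketched as follows. This assumes that per-chip normalization divides every measurement by the chosen percentile of that chip, and uses a linear-interpolation percentile; neither detail is confirmed by this section, and `per_chip_normalize` is a hypothetical name.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile, q between 0 and 100."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def per_chip_normalize(values, q=50.0, background="never"):
    """Divide each value by the chip's q-th percentile, with optional background step.

    background: "never", "always" or "if needed", matching the three options above.
    """
    bottom = percentile(values, 10.0)
    target = percentile(values, q)
    if background == "always" or (background == "if needed" and bottom < -target):
        # Subtract the bottom tenth percentile before taking the specified percentile.
        values = [v - bottom for v in values]
        target = percentile(values, q)
    elif background not in ("never", "if needed"):
        raise ValueError(f"unknown background option: {background!r}")
    if target == 0:
        raise ValueError("the chosen percentile is zero; cannot normalize")
    return [v / target for v in values]
```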
472. ust appear in at least __ out of __ conditions: The number of conditions in the total experiment in which genes must meet the specified requirements. This line can refer to the whole experiment.
Figure 6-16 The Filter on Confidence window
To filter on confidence:
1. Select an experiment or condition from the navigator and click Choose Experiment. You can also select a subset of conditions within an experiment
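The "at least m out of n conditions" rule can be sketched as a simple count. This assumes a gene passes in a condition when its p-value is below the cutoff; the helper names are hypothetical.

```python
def passes_confidence_filter(p_values, cutoff, min_conditions):
    """True if the p-value is below cutoff in at least min_conditions conditions."""
    passing = sum(1 for p in p_values if p < cutoff)
    return passing >= min_conditions


def filter_genes(gene_p_values, cutoff=0.05, min_conditions=1):
    """gene_p_values: dict gene -> list of p-values, one per condition."""
    return [
        gene
        for gene, p_values in gene_p_values.items()
        if passes_confidence_filter(p_values, cutoff, min_conditions)
    ]
```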
473. vector (eigenvector). Calculating scores this way has the advantage of scaling them to be between -1 and 1.
If you uncheck the Report scores as correlations box, the PC scores represent the coordinates of the genes or conditions in the system defined by the first few principal components. In other words, these scores are the values of the principal components for each gene or condition.
PCA on Genes
To run PCA on genes:
1. Select Tools > Principal Components Analysis.
Figure 7-10 Principal Components Analysis screen
2. If it is not already selected, click the PCA on Genes tab.
3. Select a gene list from the navigator and click Set Gene List.
4. Select an experiment from the navigator and click Set Experiment.
5. Check or uncheck the Report scores as correlations box to specify whether to report scores as correlations or as the values of the principal components for each gene. This box is checked by default.
6. Specify whether to run the computation locally or on a GeNet Remote Server.
7. Click Start.
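The difference between the two score types can be illustrated with NumPy. This sketch assumes the conditions are centred before the decomposition and that "scores as correlations" means the Pearson correlation between each gene's profile and each principal component vector; it is not GeneSpring's implementation, and `pca_gene_scores` is a hypothetical name.

```python
import numpy as np


def pca_gene_scores(data, n_components=2, as_correlations=True):
    """PCA on genes: rows are genes, columns are conditions.

    Returns (scores, components). With as_correlations=True each score is the
    Pearson correlation between a gene's profile and a component vector, so it
    lies between -1 and 1; otherwise scores are coordinates on the components.
    """
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)  # centre each condition
    # Rows of vt are the principal component vectors (covariance eigenvectors).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    if not as_correlations:
        return centered @ components.T, components
    scores = np.empty((X.shape[0], n_components))
    for i, profile in enumerate(X):
        for j, component in enumerate(components):
            scores[i, j] = np.corrcoef(profile, component)[0, 1]
    return scores, components
```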
474. vels indexed by i. Factor B has b levels indexed by j. There are thus ab cells of data, each containing at least one value.
Standard Parametric 2-way ANOVA
This test assumes equal variances and equal or proportional replication.
Case 1: equal replication
All cells (groups defined by combinations of the factor levels) have the same number of replicates, say r. Thus there is a total of abr samples. Let $x_{ijk}$ represent the kth replicate sample in level i of factor A and level j of factor B.
Let $A_i = \sum_{j,k} x_{ijk}$ be the sum of all observations in level i of factor A.
Let $B_j = \sum_{i,k} x_{ijk}$ be the sum of all observations in level j of factor B.
Let $AB_{ij} = \sum_{k} x_{ijk}$ be the sum of all observations in level i of factor A and level j of factor B.
Let $C = \frac{\left(\sum_{i,j,k} x_{ijk}\right)^2}{abr}$.
We can now compute the various sums of squares terms:
Total sum of squares: $SS_{total} = \sum_{i,j,k} x_{ijk}^2 - C$
Factor A sum of squares: $SS_A = \frac{\sum_i A_i^2}{br} - C$
Factor B sum of squares: $SS_B = \frac{\sum_j B_j^2}{ar} - C$
Interaction sum of squares: $SS_{AB} = \frac{\sum_{i,j} AB_{ij}^2}{r} - C - SS_A - SS_B$
Error sum of squares (a.k.a. within-group SS): $SS_{error} = SS_{total} - SS_A - SS_B - SS_{AB}$
Now compute the mean sums of squares:
$MSS_A = \frac{SS_A}{a-1}$, $MSS_B = \frac{SS_B}{b-1}$, $MSS_{AB} = \frac{SS_{AB}}{(a-1)(b-1)}$, $MSS_{error} = \frac{SS_{error}}{abr - ab}$
Finally, compute F-ratios:
$F_A = \frac{MSS_A}{MSS_{error}}$
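The equal-replication formulas above translate directly into code. The sketch below follows them term by term and computes F_B and F_AB in the same way as F_A (standard for this design, though the text above is cut off after F_A); converting F-ratios to p-values via the F distribution is left out. `two_way_anova_balanced` is a hypothetical name.

```python
def two_way_anova_balanced(x):
    """Balanced two-way ANOVA using the sums-of-squares formulas above.

    x[i][j] is the list of r replicates in level i of factor A, level j of factor B.
    Returns a dict of sums of squares and F-ratios.
    """
    a, b, r = len(x), len(x[0]), len(x[0][0])
    if any(len(x[i]) != b or len(x[i][j]) != r for i in range(a) for j in range(b)):
        raise ValueError("design is not balanced")
    A = [sum(sum(cell) for cell in x[i]) for i in range(a)]
    B = [sum(sum(x[i][j]) for i in range(a)) for j in range(b)]
    AB = [[sum(x[i][j]) for j in range(b)] for i in range(a)]
    C = sum(A) ** 2 / (a * b * r)
    ss_total = sum(v * v for row in x for cell in row for v in cell) - C
    ss_a = sum(t * t for t in A) / (b * r) - C
    ss_b = sum(t * t for t in B) / (a * r) - C
    ss_ab = sum(t * t for row in AB for t in row) / r - C - ss_a - ss_b
    ss_error = ss_total - ss_a - ss_b - ss_ab
    ms_a = ss_a / (a - 1)
    ms_b = ss_b / (b - 1)
    ms_ab = ss_ab / ((a - 1) * (b - 1))
    ms_error = ss_error / (a * b * r - a * b)
    return {
        "SS_A": ss_a, "SS_B": ss_b, "SS_AB": ss_ab, "SS_error": ss_error,
        "F_A": ms_a / ms_error, "F_B": ms_b / ms_error, "F_AB": ms_ab / ms_error,
    }
```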
475. ver, SOMs illustrate the relationship between groups by arranging them in a two-dimensional map, in addition to dividing genes into groups based on expression patterns. SOMs are useful for visualizing the number of distinct expression patterns in your data and determining which of these patterns are variants of one another. SOMs were invented by Teuvo Kohonen (1991, 2000) and are used to analyze many kinds of data. Applications to gene expression analysis were described by Tamayo et al. (1999).
GeneSpring's self-organizing map algorithm begins by creating a two-dimensional grid of nodes in the space of gene expression. In each iteration, one gene is selected and all of the nodes within a user-defined neighborhood are moved closer to it. This process is repeated with each gene in the selected gene list until the maximum number of iterations has been reached. With each iteration, the neighborhood radius is incrementally reduced and nodes are moved by smaller and smaller amounts to produce convergence. In this way the grid of nodes is stretched and wrapped to best represent the variability of the data while still maintaining similarity between adjacent nodes. After the iteration is complete, genes are assigned to the nearest node, and a display grid of gene expression graphs is generated corresponding to the initial grid of nodes.
As the iteration proceeds, the neighborhood radius decreases smoothly so that points move more independently
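The iteration described above can be sketched in NumPy. The grid size, linear decay schedules, hard (all-or-nothing) neighbourhood, initialisation from random genes, and cycling through genes in list order are all assumptions for illustration; GeneSpring's parameters and update rule may differ, and `train_som` is a hypothetical name.

```python
import numpy as np


def train_som(data, rows=3, cols=2, iterations=1000, alpha0=0.1, seed=0):
    """Minimal self-organizing map (sketch).

    data: array of shape (genes, conditions).
    Returns (nodes, assignments): node profiles in grid order and, for each gene,
    the index of its nearest node.
    """
    X = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Grid coordinates of each node on the two-dimensional map.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise nodes at randomly chosen gene profiles (one of several options).
    nodes = X[rng.integers(0, len(X), size=len(grid))].copy()
    radius0 = max(rows, cols) / 2.0
    for t in range(iterations):
        frac = t / iterations
        radius = radius0 * (1.0 - frac) + 1e-3  # neighbourhood shrinks each iteration
        alpha = alpha0 * (1.0 - frac)  # and nodes move by smaller amounts
        gene = X[t % len(X)]  # cycle through the genes in the list
        winner = np.argmin(((nodes - gene) ** 2).sum(axis=1))
        near = np.sqrt(((grid - grid[winner]) ** 2).sum(axis=1)) <= radius
        nodes[near] += alpha * (gene - nodes[near])
    # After training, assign each gene to its nearest node.
    dists = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    return nodes, dists.argmin(axis=1)
```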
476. w Display Options, or right-click anywhere in the genome browser and select Display Options.
Click the Coloring tab.
Specify whether to color conditions by gene, parameter, or attribute. If you are coloring by parameter or attribute, select the appropriate parameter or attribute from the pull-down menu.
To change the default color, click the Change button and choose the desired color.
Showing/Hiding Window Display Elements
You have the option of showing or hiding many of the elements in the GeneSpring window. To change the visibility of these elements, select View > Visible and choose one of the following options:
• Picture: Shows or hides the optional picture at the bottom right corner of the window.
• Animation Controls: Shows or hides the slider and the Animate check box at the bottom of the window. Hiding this check box does not disable the Animation feature.
• Magnification: Shows or hides the Magnification feature and the Zoom Out button at the bottom of the window. Hiding the Zoom Out button does not disable the Zoom Out menu option.
• Secondary Picture: Shows or hides your secondary picture when you are viewing two gene lists or experiments simultaneously in the genome browser.
• Secondary Animation Controls: Shows or hides the secondary Animation Controls check box and slider when you are viewing two gene lists or experiments simultaneously.
• Navigator: Sh
477. w. The horizontal axis shows the correlation from zero to 1. The vertical axis represents the number of genes. The green lines are your specified maximum and minimum values; if you change these values, the green lines move accordingly.
If desired, enter new values for Phase Offset and Weight. You can select a parameter from the pull-down menu in the phase offset section.
Click OK. You are returned to the Find Similar Genes window. You can now add additional experiments or conditions if you wish. For more information on this portion of the screen, see The Correlations table on page 6-10.
To remove an experiment or condition, click the name of the experiment or condition in the white center box and click Remove. To change the settings, click the experiment's name to select its row and click Edit.
Specify whether to run the search locally or on a GeNet Remote Server.
When you are done, click OK. The New Gene List window appears.
10. Name your gene list and click Save. The list appears in the Gene Lists folder of the main navigator.
The Correlations table
The Correlations table lists the experiments chosen to correlate against the specified gene. The experiments selected may be weighted, making one more important than another. If both experiments chosen are given a weight of 1, they are averaged equally. To modify an experiment's weight, click in the Weight column to t
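Weights combine the per-experiment correlations into the single value X that is compared with the minimum and maximum. Normalizing by the sum of the weights is an assumption, chosen so that two experiments with weight 1 are averaged equally as described above; the function names are hypothetical.

```python
def combined_correlation(correlations, weights):
    """Weighted average of per-experiment correlations."""
    if len(correlations) != len(weights):
        raise ValueError("one weight per experiment is required")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(r * w for r, w in zip(correlations, weights)) / total_weight


def passes_correlation_window(correlations, weights, minimum, maximum):
    """True if the combined correlation lies between the minimum and maximum."""
    return minimum <= combined_correlation(correlations, weights) <= maximum
```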
478. whose mapping data is at least partially available. An illustration of what the Physical Position view looks like for humans is given in Figure 4-24. For organisms that have already been sequenced, the Physical Position view looks more like the yeast view illustrated in Figure 4-23.
Figure 4-23 The Physical Position view. At greater magnification you can see the base pairs.
479. xperiment/Condition Labels: Displays experiment/condition labels if they are available and space permits. This option is available only if an experiment or condition tree is selected.
• Color Branches by Classification: Colors the tree branches based on classification. This option is available only if a gene tree is selected. To color by classification:
a. Check the Color Branches by Classification box.
b. Select a classification from the Display Options minibrowser.
c. Click Set Classification >>.
d. Click Apply.
Unclassified genes are displayed using the background color.
• Color Branches by Experiment Parameter: Colors the tree branches based on experiment parameters. This option is available only if a condition or condition tree is selected. To color by experiment parameters:
a. Check the Color Branches by Experiment Parameter box.
b. Select an experiment from the Display Options minibrowser.
c. Click Set Experiment >>.
d. Select a parameter from the pull-down menu. To display a row of blocks at the bottom of the condition tree indicating their classification, select the Show Coloring Blocks radio button.
e. Click Apply.
Features Tab
The Display Options window also includes a Features tab with the following options for reorganizing your tree:
• Color by all conditions: Divides the genes into sections representing multiple conditions so that all conditions in the selected
480. Text or Numeric from the Data Type pull-down menu.
Specify whether the attribute is Required, Recommended, or Optional from the This Attribute Is pull-down menu.
Click OK. You are returned to the Standard Attributes window.
To edit a standard attribute, double-click it in the Standard Attributes window, or select it and click Edit. The Edit Attribute window appears. This window looks exactly like the Add Attribute window. Make any desired changes and click OK.
To delete a standard attribute, select it in the Standard Attributes window and click Remove.
Experiment Interpretations
The Experiment Interpretation window allows you to determine how an experiment is displayed. You can change the upper and lower bounds of the vertical axis of your graph, the mode used to represent your data, whether to turn on the cross-gene error model, how you want to view each parameter, and which flagged measurements are displayed.
Changing an experiment interpretation is useful for customizing initial display settings. It also matters because statistical analysis techniques in GeneSpring are carried out based on how your data is characterized in the interpretation. Because of this, it can be valuable to set up more than one experiment interpretation, then perform analyses on each one to compare the results of statistical testing on data that has been grouped and characterized in dif
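To see why the interpretation changes statistical results, consider the same measurements summarized under two interpretations. The sketch models only two of the listed settings: the display mode and which flagged measurements are included. The `Interpretation` class, the mode strings, the flag letters, and the use of log base 2 are hypothetical stand-ins and not GeneSpring's internal representation.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Interpretation:
    """Hypothetical stand-in for two experiment interpretation settings."""
    mode: str = "ratio"                            # assumed values: "ratio" or "log of ratio"
    excluded_flags: FrozenSet[str] = frozenset()   # flags whose measurements are left out


def interpret(measurements: List[Tuple[float, str]], interp: Interpretation) -> List[float]:
    """Apply an interpretation to (normalized ratio, flag) pairs."""
    values = []
    for ratio, flag in measurements:
        if flag in interp.excluded_flags:
            continue  # flagged measurement is not included
        values.append(math.log2(ratio) if interp.mode == "log of ratio" else ratio)
    return values


data = [(2.0, "P"), (0.5, "P"), (8.0, "A")]  # hypothetical flags: P = present, A = absent
as_ratio = interpret(data, Interpretation())
as_log = interpret(data, Interpretation(mode="log of ratio", excluded_flags=frozenset({"A"})))
print(mean(as_ratio), mean(as_log))  # 3.5 0.0
```

The same three measurements give a mean of 3.5 under one interpretation and 0.0 under the other. Any test run on these values would be affected in the same way.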
481. external program from the GeneSpring navigator and select Inspect.
Figure 8-8: The External Program Inspector. (Screenshot: the "Load Classification From File" program by Silicon Genetics; type of executable: Java class com.sigenetics.external.examples.GeneSpringExternalFromFile; input: No Data; output: Classification; buttons: OK, Run, Close, Help.)
The top portion of the screen displays the following information: Program Name, Author(s), Research Group, Organization, Identifier, Created, Application, and Directory Location.
You can edit the Program Name, Author, and Research Group fields. To modify these, enter the new information and click OK.
The Program Details panel contains the following information:
• Type of Executable: The executable type, that is, Java class, external program, or URL.
• Command: The command line, if the executable is an external program.
• Input to External Program: The inputs defined for the external program.
• Output from External Program: The outputs defined for the external program.
If a program cannot be edited, such as the ex
482. y Note: Checks the file header to determine which channel is Signal and which is Control.
Name: interpreted as Gene Name
ch1 Intensity or ch2 Intensity: interpreted as Signal
ch1 Background or ch2 Background: interpreted as Signal Background
ch1 Intensity or ch2 Intensity: interpreted as Control Channel
ch1 Background or ch2 Background: interpreted as Control Channel Background
Clontech Atlas Image 2 Color
Gene Code: interpreted as Gene Name
Intensity 2: interpreted as Signal
Background 2: interpreted as Signal Background
Intensity 1: interpreted as Control Channel
Background 1: interpreted as Control Channel Background
Protein gene: interpreted as Description
Column 11: interpreted as GenBankID
Clontech Atlas Image 1 Color
Gene Code: interpreted as Gene Name
Intensity 2: interpreted as Signal
Background 2: interpreted as Signal Background
Protein gene: interpreted as Description
Column 11: interpreted as GenBankID
Creating New Experiments
You can create a new experiment using existing data from your local system, from a GeNet server, or both. GeneSpring provides a variety of filters to make it easy to select the appropriate samples for your experiment. The following sections describe the basic process of creating a new experiment, followed by more detailed information on e
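The Clontech Atlas Image entries above are lookup tables from file columns to GeneSpring fields. The sketch below transcribes them as dictionaries and applies one to a header row. Several details are assumptions: the exact header strings in real files, treating "Column 11" as a literal header name rather than a column position, and the helper `map_columns` itself. This is not how the Column Editor is implemented.

```python
from typing import Dict, List

# Transcribed from the Clontech Atlas Image entries above; real header strings are assumed.
CLONTECH_ATLAS_2_COLOR: Dict[str, str] = {
    "Gene Code": "Gene Name",
    "Intensity 2": "Signal",
    "Background 2": "Signal Background",
    "Intensity 1": "Control Channel",
    "Background 1": "Control Channel Background",
    "Protein gene": "Description",
    "Column 11": "GenBankID",
}

CLONTECH_ATLAS_1_COLOR: Dict[str, str] = {
    "Gene Code": "Gene Name",
    "Intensity 2": "Signal",
    "Background 2": "Signal Background",
    "Protein gene": "Description",
    "Column 11": "GenBankID",
}


def map_columns(header: List[str], mapping: Dict[str, str]) -> Dict[int, str]:
    """Return {column index: GeneSpring field} for recognized columns; others are ignored."""
    return {i: mapping[name] for i, name in enumerate(header) if name in mapping}


header = ["Gene Code", "Intensity 1", "Intensity 2", "Notes"]
print(map_columns(header, CLONTECH_ATLAS_2_COLOR))  # {0: 'Gene Name', 1: 'Control Channel', 2: 'Signal'}
print(map_columns(header, CLONTECH_ATLAS_1_COLOR))  # {0: 'Gene Name', 2: 'Signal'}
```

Under the one-color mapping, "Intensity 1" has no interpretation, so that column is simply not imported.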
483. Figure 4-16: A split window. (Screenshot: Classification 1, gene list "all genes" (112 genes), with Show All Genes, Zoom Out, and Zoom Fully Out buttons.)
In Figure 4-15, the example represents a k-means clustering colored by expression values. Note the list name and number of genes shown in the upper right corner of each small screen. In this instance, the names are set numbers from the original k-means clustering.
To reach the Split window command, right-click any item in the Classifications folder, or any folder of classifications, and move the cursor down to Split window. A small pop-up menu appears. Select one of the options. The main screen of the genome browser splits into several small screens. Notice the number of genes beneath each small screen. In addition, clicking any classification automatically splits the window in both directions.
Note: In the Eisen-like subtree view, the thumbnail of the full tree remains in its usual position, but no marquee is shown to indicate which subtree has been zoomed in on. Each classification shows the same subtree.
To unsplit the screen, select View > Unsplit window, or right-click the original data object and select Split > Neither.
You can also hide the labels appearing in the main window. All of the Hide and Show commands are simple toggle switches: re-select an option to show what has been hidden. You may have to enlarge your screen before you can see all the labe
484. y Sequences. The Find Potential Regulatory Sequences window appears.
2. Click the Find New Sequence tab at the top of the screen.
Figure 6-12: The Find New Sequence tab. (Screenshot: gene list "PCA component 1", 6,127 genes, all with sequence; search from 10 to 500 bases upstream of each gene for oligonucleotides of length 5 to 8; Allow Ns in Regulatory Sequence option; 0 to 0 single-point discrepancies; 0 to 0 Ns in the exact middle; statistics relative to upstream of other genes or relative to the whole genome; probability cutoff 0.05; Do local nucleotide density correction option.)
3. Select a gene list from the navigator and click Set Gene List.
Note: Do not choose the all genes or all genomic elements gene lists. You are already comparing your selected gene list against all other genes in the genome.
4. Enter the number of bases upstream of each gene in the Search Before ORFs section of the window. For example, if you enter From 10 To 100 on a search for ACGCGT, GeneSpring searches for any part of the promoter within the region between 10 and 100 bases upstream. The smaller the range between these numbers, the greater the likelihood that the results are statistically significant. Larger sequences may take longer to search.
You can also search for common sequences within th
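The From/To settings in step 4 define an upstream window, and the discrepancy and N settings control how loosely a sequence may match inside it. The sketch below illustrates both under stated assumptions. It searches the plus strand only, uses 0-based indexing with `orf_start` at the first base of the ORF, and does not model the probability cutoff or the nucleotide density correction. The function names are hypothetical. This shows the general idea only and is not GeneSpring's search.

```python
from typing import List


def upstream_window(sequence: str, orf_start: int, near: int = 10, far: int = 100) -> str:
    """Bases from `far` to `near` positions upstream of the ORF start (plus strand only).

    The base d positions upstream of the ORF sits at index orf_start - d (assumed convention).
    """
    lo = max(0, orf_start - far)
    hi = max(0, orf_start - near + 1)
    return sequence[lo:hi]


def find_motif(window: str, motif: str, max_mismatches: int = 0) -> List[int]:
    """Offsets where `motif` matches with at most `max_mismatches` single-point discrepancies.

    An 'N' in the motif matches any base.
    """
    hits = []
    for i in range(len(window) - len(motif) + 1):
        mismatches = 0
        for base, want in zip(window[i:i + len(motif)], motif):
            if want != "N" and base != want:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# Hypothetical gene: ACGCGT starts 40 bases upstream of an ATG at index 100.
seq = "T" * 60 + "ACGCGT" + "T" * 34 + "ATG"
window = upstream_window(seq, orf_start=100)
print(find_motif(window, "ACGCGT"))  # [60]
```

Raising `max_mismatches` plays the role of allowing single-point discrepancies. A wider window or a looser match finds more hits by chance, which is consistent with the manual's advice that narrower ranges favor statistically significant results.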
485. The intensity-dependent normalization feature fits a curve through the data and uses this curve to adjust the control value for each measurement. When the resulting normalized data are graphed versus the adjusted control value, the points are distributed more symmetrically around the 45° line (see Figure 5-2, panel B). You can specify the percentage of data to be used for smoothing. By default this value is 20.0.
To counter the problem of taking logarithms of negative values in subsequent steps, and to discount outliers, the raw and control values for each spot (R, G) are shifted by a constant:
shift max median P min R Metas OG _ min G 0
The data goes through the following transformations:
G R XQG shift R shift XxXlog G shift log R shift 4 M A M FCA F loga log R X G R Gshift R shift AR shift
where f(A) is the fitted function of the transformed data and the coordinates are transformed by the operation yz 2 129 ep x. The last step of the transformation is performed because normalizations in GeneSpring are accomplished by adjusting the control value and leaving the raw value unchanged. The fit of the data, f(A), is made using the LOWESS algorithm, where the value f = 0.2 is used for the fraction of the total data points used for smoothing at each point (see References on page 5-21 for more information).
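The general technique can be sketched as follows: fit a LOWESS curve f(A) to the log ratio M as a function of intensity A, then adjust only the control value. This sketch assumes the common conventions M = log2(R/G) and A = ½·log2(R·G). It uses a simple LOWESS with tricube weights and no robustness iterations, and takes the shift as a caller-supplied parameter. GeneSpring's exact shift constant and transformation chain may differ, so treat this as an illustration only. The function names are hypothetical. The default `frac=0.2` matches the 20% smoothing fraction stated above.

```python
import math
from typing import List


def lowess_fit(x: List[float], y: List[float], frac: float = 0.2) -> List[float]:
    """Locally weighted linear regression (tricube weights, no robustness passes)."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours used at each point
    fitted = []
    for xi in x:
        dists = [abs(xj - xi) for xj in x]
        h = sorted(dists)[k - 1]  # distance to the k-th nearest neighbour
        if h == 0:
            weights = [1.0 if d == 0 else 0.0 for d in dists]
        else:
            weights = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        sw = sum(weights)
        mx = sum(w * xj for w, xj in zip(weights, x)) / sw
        my = sum(w * yj for w, yj in zip(weights, y)) / sw
        sxx = sum(w * (xj - mx) ** 2 for w, xj in zip(weights, x))
        sxy = sum(w * (xj - mx) * (yj - my) for w, xj, yj in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def intensity_dependent_normalize(raw: List[float], control: List[float],
                                  frac: float = 0.2, shift: float = 0.0) -> List[float]:
    """Return adjusted control values; the raw values are left unchanged."""
    r = [v + shift for v in raw]
    g = [v + shift for v in control]
    if min(r + g) <= 0:
        raise ValueError("choose a shift that makes all values positive")
    m = [math.log2(ri / gi) for ri, gi in zip(r, g)]        # log ratio M
    a = [0.5 * math.log2(ri * gi) for ri, gi in zip(r, g)]  # mean log intensity A
    f = lowess_fit(a, m, frac)
    # Scaling the shifted control by 2**f(A) turns M into M - f(A); the shift is then removed.
    return [gi * 2 ** fi - shift for gi, fi in zip(g, f)]


control = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
raw = [2 * c for c in control]  # a uniform twofold bias
adjusted = intensity_dependent_normalize(raw, control)
ratios = [r / c for r, c in zip(raw, adjusted)]
print(ratios)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

With a uniform twofold bias, the fitted curve is flat at M = 1, so every normalized ratio returns to 1. With an intensity-dependent bias, the curve bends and removes a different correction at each intensity.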
486. y which conditions, if any, to exclude from the analysis.
Figure 7-13: The Exclude Conditions window. (Screenshot: Check All and Clear All buttons.)
By default, all conditions are selected. To exclude a condition, uncheck the box to its left. To include a condition, check the box. Click Check All to include all conditions, or Clear All to exclude all conditions. To check or clear a range of conditions, select them in the list and click Check Selected or Clear Selected.
6. Check or uncheck the Report scores as correlations box to specify whether to report scores as correlations or as the values of the principal components for each condition. This box is checked by default.
7. Specify whether to run the computation locally or on a GeNet Remote Server.
8. Click Start PCA on Conditions.
Results
When the analysis is complete, the PCA Results window appears, displaying each condition as a line in graph mode. The significance of each condition is represented by the color of its graph line, as defined by the colorbar.
(Screenshot: Principal Components Analysis Conditions Results window for Yeast cell cycle time series no 90 min, Default Interpretation, mode Log of ratio, gene list "all genomic elements", listing Principal Component 1 and Principal Component 2 with their Percent Variance, and a Split Window button.)
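PCA on conditions treats each condition as an observation across genes. The Results window reports a percent variance and per-condition scores for each principal component. The sketch below computes those two quantities with a generic method, power iteration on the condition-by-condition Gram matrix, and is not GeneSpring's implementation. The "Report scores as correlations" option is not modeled. Excluding conditions corresponds to dropping rows before the call. The function name and the data layout (rows = conditions, columns = genes) are assumptions.

```python
import math
from typing import List, Tuple


def _matvec(mat: List[List[float]], vec: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def pca_on_conditions(data: List[List[float]], n_components: int = 2,
                      iters: int = 500) -> List[Tuple[float, List[float]]]:
    """Return (percent variance, per-condition scores) for each component found."""
    n_c, n_g = len(data), len(data[0])
    # Center each gene across conditions.
    means = [sum(row[j] for row in data) / n_c for j in range(n_g)]
    xc = [[row[j] - means[j] for j in range(n_g)] for row in data]
    # Gram matrix: its eigenvalues are the component variances up to a constant factor.
    k = [[sum(a * b for a, b in zip(xi, xj)) for xj in xc] for xi in xc]
    total = sum(k[i][i] for i in range(n_c))
    tol = 1e-9 * max(total, 1.0)
    results = []
    for _ in range(n_components):
        # Non-uniform start vector: for centered data the all-ones vector is in K's null space.
        v = [float(i + 1) for i in range(n_c)]
        norm = 0.0
        for _ in range(iters):  # power iteration for the dominant eigenvector
            w = _matvec(k, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm <= tol:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, _matvec(k, v)))
        if norm <= tol or lam <= tol:
            break  # no remaining variance
        scores = [math.sqrt(lam) * vi for vi in v]  # projection of each condition onto the component
        results.append((100.0 * lam / total, scores))
        # Deflate so the next pass finds the next component.
        k = [[k[i][j] - lam * v[i] * v[j] for j in range(n_c)] for i in range(n_c)]
    return results


data = [[-1.0, -2.0, -3.0], [0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]  # 3 conditions x 3 genes
res = pca_on_conditions(data)
print(round(res[0][0], 6))  # 100.0
```

In this toy input, all variation lies along one direction, so a single component explains all of the variance and the middle condition scores zero.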