
Short Time-series Expression Miner (v1.3.8) User Manual

1. Figure 34: The above window is an example of a chromosome gene enrichment table. The table contains a chromosome enrichment analysis for the set of genes in the displayed set. Each row corresponds to a chromosome and has seven columns: (1) Chromosome: the ID of the chromosome; (2) #Genes Base Set: how many genes were assigned to that chromosome in the base set of genes, before any filtering; (3) #Genes Assigned: the number of genes in the current set assigned to that chromosome; (4) #Genes Expected: the number of genes expected to be assigned to the chromosome if a set of the same size as the one currently displayed were drawn at random from the base set without replacement; (5) #Genes Enriched: the difference between the number of genes assigned and the number expected; (6) p-value: the enrichment p-value based on the hypergeometric distribution; (7) Corrected p-value: the corrected p-value for the enrichment, using the same method as specified for the p-value correction method of the GO analysis, either Bonferroni or Randomization. Table of Gene Locations [table columns: Gene Symbol, Chromosome, Strand, Begin, End]
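The #Genes Expected, #Genes Enriched and p-value columns can be reproduced in a few lines of Python. This is a minimal sketch. It assumes the p-value is the upper tail of the hypergeometric distribution, meaning the chance of seeing at least the observed number of genes on the chromosome. The names `hypergeom_sf` and `chromosome_row` are hypothetical and are not part of STEM.

```python
from math import comb


def hypergeom_sf(k, set_size, category_size, total):
    """P(X >= k) for X ~ Hypergeometric: set_size genes drawn without
    replacement from `total` genes, `category_size` of which are on the chromosome."""
    upper = min(set_size, category_size)
    hits = sum(
        comb(category_size, i) * comb(total - category_size, set_size - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(total, set_size)


def chromosome_row(assigned, base_on_chrom, set_size, base_total):
    """Return (#Genes Expected, #Genes Enriched, p-value) for one chromosome row."""
    expected = set_size * base_on_chrom / base_total
    enriched = assigned - expected
    p_value = hypergeom_sf(assigned, set_size, base_on_chrom, base_total)
    return expected, enriched, p_value


# Hypothetical counts: 100 base genes, 10 on the chromosome, 20 displayed, 5 of them on it.
expected, enriched, p = chromosome_row(assigned=5, base_on_chrom=10, set_size=20, base_total=100)
print(expected, enriched, p)
```

The Corrected p-value column would then apply the selected correction (Bonferroni or Randomization) to this uncorrected value.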
2. Contents (continued): Chromosome Viewer; Comparison; K-means; B Running STEM in Batch Mode; C Using STEM for Standard Gene Ontology Enrichment Analysis; D Gene Annotation Sources; E Gene Location Sources. 1 Introduction. Welcome to STEM. STEM is an acronym for the Short Time-series Expression Miner, a software program designed for clustering, comparing, and visualizing gene expression data from short time series microarray experiments (8 time points or fewer). STEM implements a novel method for clustering short time series expression data that can differentiate between real and random patterns. STEM is also integrated with the Gene Ontology (GO) [4], allowing efficient biological interpretation of the data. 1.1 STEM Clustering Method Overview. The novel clustering method that STEM implements first defines a set of distinct and representative model temporal expression profiles, independent of the data. These model profiles correspond to possible profiles of a gene's change in expression over time. The model profiles all start at 0. Between two time points, a model profile can either hold steady, or increase or decrease by an integral number of units up to a parameter value. Gene expression time series are transformed to start at 0, and each gene is assigned to the model profile to w
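The candidate model profiles described above can be enumerated directly. This is a minimal sketch that assumes `max_change` is the parameter bounding the per-step change. It yields every candidate. STEM then selects a distinct, representative subset of these, which this sketch does not attempt. The function name is hypothetical.

```python
from itertools import product


def candidate_model_profiles(n_time_points, max_change):
    """Yield every profile that starts at 0 and, between consecutive time
    points, stays the same or moves up or down by at most max_change units."""
    steps = range(-max_change, max_change + 1)
    for deltas in product(steps, repeat=n_time_points - 1):
        profile = [0]
        for delta in deltas:
            profile.append(profile[-1] + delta)
        yield tuple(profile)


profiles = list(candidate_model_profiles(n_time_points=3, max_change=1))
print(len(profiles))  # → 9
```

There are (2c + 1)^(n - 1) candidates for n time points and a maximum change of c, so the count grows quickly with the length of the series.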
3. ..."Yes" if the gene of the row is part of a category or gene set by which the profiles are ordered; otherwise the field is empty. Weight: This field represents the weight of the assignment of the gene to the profile. If the profile the gene most closely matches is unique, then the value is one. If there is a tie as to which profile a gene most closely matches, then this value is one divided by the number of profiles the gene most closely matches. Gene Symbol: This column contains the gene symbols. The name for this column is read from the header in the data file. Spot ID: An entry in this column contains a delimited list of the IDs of the spots that contain the gene of the row. The header for this column is read from the data file if spot IDs are included in the data file. Time point columns: The time series of gene expression levels for the gene after any selected transformation (Log normalize data, Normalize data, or No normalization/add 0). The headers for these columns are read from the data file. As with all tables in STEM, this table can be sorted in ascending or descending order by any column by clicking on the column header. A user can also save the entire table using the Save Table button, or just the gene names using the Save Gene Names button. Likewise, a user can copy the entire table to the clipboard using the Copy Table button, or just the gene names using the Copy Gene Names button. 5.2 Gen
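The Weight rule can be sketched in a few lines. This is a hypothetical helper. It assumes a similarity score per profile (for example, a correlation) is already available, where higher means a closer match, and it treats only exactly equal scores as ties.

```python
def assignment_weights(similarity):
    """Map each best-matching profile ID to its weight; `similarity` maps
    profile ID -> how closely the gene matches that profile (higher is closer)."""
    best = max(similarity.values())
    winners = [pid for pid, score in similarity.items() if score == best]
    return {pid: 1 / len(winners) for pid in winners}


print(assignment_weights({9: 0.95, 43: 0.80}))           # → {9: 1.0}
print(assignment_weights({9: 0.95, 43: 0.95, 2: 0.1}))   # → {9: 0.5, 43: 0.5}
```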
4. Figure 40: Dialog windows to define gene sets. The dialog window on the left is used to define a gene set to reorder model profiles from the original data set, while the dialog on the right is used to define a gene set to reorder model profiles from the comparison data set. 8 K-means. In addition to providing a novel clustering method designed for short time series expression data [3], STEM also provides an implementation of the standard K-means algorithm for clustering. To use the K-means clustering algorithm in STEM, select K-means under Clustering Method (Figure 41). The K-means clustering algorithm partitions genes into K sets S1, S2, ..., SK, where K is an input parameter provided by a user in the field Number of Clusters (K). Each set Si has a center ci associated with it, where the center represents the mean of all genes assigned to the set Si. After tra
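The sketch below shows the standard K-means (Lloyd) iteration with several random starts. The manual mentions a number-of-random-starts parameter for K-means. The sketch assumes squared Euclidean distance and keeps the start with the lowest total error. STEM's exact distance measure and initialization are not given here, so treat these choices as assumptions. All function names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Mean of the assigned genes, time point by time point."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans_once(genes, k, rng, max_iter=100):
    centers = [list(g) for g in rng.sample(genes, k)]  # random initial centers
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(g, centers[j])) for g in genes
        ]
        if new_assignment == assignment:  # converged: no gene changed cluster
            break
        assignment = new_assignment
        for j in range(k):
            members = [g for g, a in zip(genes, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid(members)
    error = sum(squared_distance(g, centers[a]) for g, a in zip(genes, assignment))
    return assignment, centers, error


def kmeans(genes, k, n_starts=10, seed=0):
    """Run several random starts and keep the lowest-error clustering."""
    rng = random.Random(seed)
    runs = [kmeans_once(genes, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[2])


genes = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
assignment, centers, error = kmeans(genes, k=2)
print(assignment, error)
```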
5. Figure 38: A legend for the comparison interface. The bottom-left corner of profile boxes to the left of the yellow bar contains the number of genes assigned to the profile. The bottom-left corner of profile boxes to the right of the yellow bar contains the number of genes assigned to the profile that were also assigned to the profile immediately to the left of the yellow bar. This is followed, after a semicolon, by the p-value for seeing this many or more genes in the intersection. The upper-right corner of the profile boxes to the right of the yellow bar contains the correlation with the profile to the left of the yellow bar. On the bottom of the comparison window are four yellow buttons, which are used to rearrange the profile boxes on the main window. These buttons function as follows: • Swap Rows and Columns: Interchanges which data set is to the left of the yellow bar and which is to the right of the yellow bar. • Order By Profile ID: This button returns the profile pairs to their default ordering. By default, the profiles to the left of the yellow bar are first ordered by increasing ID. Profiles to the right of the yellow bar are then ordered within the row by increasing ID. • Order By Significance: This reorders profile pairs based on statistica
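The correlation shown in the upper-right corner compares two model profiles. This is a minimal sketch that assumes Pearson correlation. The helper name is hypothetical. Returning 0.0 for a constant profile is also an assumption, because correlation is undefined in that case.

```python
from math import sqrt


def profile_correlation(x, y):
    """Pearson correlation between two equal-length profiles; returns 0.0
    if either profile is constant (an assumption, since it is undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


print(round(profile_correlation([0, 1, 2, 3], [0, 2, 4, 6]), 3))  # → 1.0
```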
6. If included, any genes listed in the file will be considered part of the initial base set of genes during a Gene Ontology (GO) enrichment analysis, in addition to any genes included in the data file. This file therefore lets one filter genes from the data by a criterion not implemented in STEM: exclude them from the data file, and they are still included as part of the base set of genes during a GO enrichment analysis. If a gene appears in both the Pre-filtered Gene File and the data file, then the gene will only be added to the base set once. The format of this file is the same as that of a data file, except that including the time series expression values is optional; if they are included, they will be ignored. As with a data file, if the field Spot IDs included in the data file is checked, then the first column will contain spot IDs and the second column will contain gene symbols. Otherwise, the first column will contain gene symbols. 3.3.2 Model Profile Options. [Figure 7 panel, Model Profiles tab of Advanced Options: Maximum Correlation; Maximum Number of Candidate Model Profiles (1,000,000); Number of Permutations per Gene (0 for all permutations); Significance Level; Permutation Test Should Permute Time Point 0; Correction Method (Bonferroni, False Discovery Rate, None).] Figure 7: The above panel is used to specify options for selecting model profiles and assess
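The rule "added to the base set once" amounts to an order-preserving set union. This is a minimal sketch. It assumes both files are tab-delimited with a header line, and the helper names are hypothetical.

```python
def genes_in_file(lines, spot_ids_included):
    """Gene symbols from a tab-delimited file whose first line is a header."""
    col = 1 if spot_ids_included else 0  # gene symbols follow the optional spot ID column
    genes = []
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > col and fields[col]:
            genes.append(fields[col])
    return genes


def base_set(data_lines, prefiltered_lines, spot_ids_included):
    """Union of data-file and pre-filtered genes, each gene counted once."""
    genes = genes_in_file(data_lines, spot_ids_included)
    genes += genes_in_file(prefiltered_lines, spot_ids_included)
    return list(dict.fromkeys(genes))  # drop duplicates, keep first-seen order


data = ["Gene\t0h\t6h", "TP53\t0.0\t1.2", "MYC\t0.0\t-0.4"]
prefiltered = ["Gene", "MYC", "BRCA1"]  # expression values are optional here
print(base_set(data, prefiltered, spot_ids_included=False))  # → ['TP53', 'MYC', 'BRCA1']
```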
7. If the Minimum Correlation parameter is set to 1, then only the Minimum Correlation Percentile parameter value will influence the clustering of model profiles. Similarly, if the Minimum Correlation Percentile parameter is set to 0, then only the Minimum Correlation parameter value will influence the clustering of model profiles. 3.3.4 Gene Annotations Options. On the fourth panel, shown in Figure 9, a user may specify options related to gene annotations. The first three options allow one to filter annotations when the annotation file is in the official 15-column format. The last field, the Category ID mapping file, is useful when genes are annotated as belonging to a category outside the Gene Ontology. [Figure 9 panel, Gene Annotations tab of Advanced Options: Only include annotations of type Biological Process, Molecular Function, Cellular Component; Only include annotations with these taxon IDs; Exclude annotations with these evidence codes.] Figure 9: The above panel is used to specify options related to gene annotations. The options on this panel are as follows: • Only include annotations of type Biological Process, Molecular Function, Cellular Component: These three checkboxes allow one to filter out annotations that are not of the types checked. These three checkboxes only apply if the annotations are in the official 15-column GO format, in which case the annot
8. Figure 26: The window on the left is an example of a model profile detailed information window that appears when the profiles are sorted based on enrichment for a GO category, in this case cell cycle. The window on the right is the same window after a user clicks Click to plot only profile cell cycle genes. Pressing Profile cell cycle Gene Table displays a table of genes assigned to the profile that are also cell cycle genes. ...the Click to plot all profile genes button (right side of Figure 28), which, if pressed again, will revert the window back to its original state. Figure 27: An example of a model profile detailed information window that appears when the profiles are ordered based on query set enrichment. Figure 28: If the model profile window is opened from the main gene table, then there
9. One of the 41 data sets is the EBI UniProt set. Subsets of this data set, with annotations specific to a large number of species, can be found at and are not included in the list of 41 data sets. If one of the 41 data sets is selected, then the annotation file corresponding to the source will appear in the Gene Annotation File text box (uneditable). If User provided is selected, then the Gene Annotation File text box will become editable and a user can specify a gene annotation file. Selecting No annotations is equivalent to selecting User provided and leaving the field empty. A gene annotation file can be in one of two formats: 1. The gene annotation file can be in the official 15-column gene annotation file format described at. All 41 of the data sets provided by Gene Ontology Consortium members are in this format. If the file is in this format, any entry in the columns DB_Object_ID (Column 2), DB_Object_Symbol (Column 3), DB_Object_Name (Column 10), or DB_Object_Synonym (Column 11) will be annotated as belonging to the GO category specified in Column 5 of the row. If the entry in DB_Object_Symbol contains an underscore (_), then the portion of the entry before the underscore will also be annotated as belonging to the GO category. Under some naming conventions, the portion after the underscore is a symbol for the database that is not specific to the gene. The DB_Object_Synonym column may have multiple symbols delimited by either a semicolo
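The column rules above can be sketched for a single row of a 15-column file. This is a hypothetical helper, not STEM's parser. Two details are assumptions: the synonym delimiters (the text is cut off after "semicolon", and a pipe is assumed as the alternative), and splitting the symbol at its first underscore.

```python
import re


def annotated_symbols(fields):
    """Return (GO ID, symbols) for one row of a 15-column annotation file.
    Uses DB_Object_ID (col 2), DB_Object_Symbol (col 3), DB_Object_Name
    (col 10), and DB_Object_Synonym (col 11); `fields` is 0-indexed."""
    go_id = fields[4]  # Column 5 holds the GO category
    symbols = {fields[1], fields[2], fields[9]}
    if "_" in fields[2]:
        symbols.add(fields[2].split("_", 1)[0])  # portion before the underscore
    symbols.update(re.split(r"[;|]", fields[10]))  # assumed synonym delimiters
    symbols.discard("")
    return go_id, symbols


row = ["UniProtKB", "Q00001", "GENE1_HUMAN", "", "GO:0007049", "PMID:0", "IDA", "",
       "P", "gene one", "G1;GENE-1", "protein", "taxon:9606", "20070101", "UniProtKB"]
print(annotated_symbols(row))
```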
10. ...semicolon is the number of genes the profile is enriched for, and is computed as A B x E. If the clusters of profiles are reordered based on a category or a user-defined gene set, then an additional line appears below the profile enrichment line. It gives the category enrichment computed from the set of genes assigned to any profile in the cluster. Along the bottom of the window are several yellow buttons. Which buttons appear depends on three things: how the profiles are ordered, through which interface the window was opened, and whether the profile is part of a non-singleton cluster of profiles. However, every window will contain a Profile Gene Table button and a Profile GO Table button. The Profile Gene Table button displays a table with all the genes assigned to the profile; a gene table is described in Section 5.1. The Profile GO Table button brings up a table with gene category enrichments among genes assigned to the profile; a gene category enrichment table is described in Section 5.2. If a profile is part of a cluster of profiles that is not a singleton, then two additional buttons will appear along the bottom row of the window: the Cluster Gene Table and Cluster GO Table buttons. The Cluster Gene Table button displays a gene table that includes all genes assigned to any profile in the cluster of profiles to which the profile belongs. The Cluster GO Table button displays a gene enrichment table that is based on enrichment for all genes assigned to any pro
11. ...significance associated with a profile appearing to the left or right of the blue bar. If the vertical text label on the left side of each half reads Original Set Profiles, then the profiles immediately to the left of the vertical yellow bar are all from the original data set. If the vertical label reads Comparison Set Profiles, then the profiles to the left of the yellow bar are all from the comparison data set. To the right of the yellow bars are profiles from the other data set. If the profiles to the right of the yellow bar are from the comparison experiment, then the horizontal labels on the top of the screen will read Comparison Set Profiles. If they are from the original experiment, the horizontal labels will read Original Set Profiles. A profile appears to the right of the yellow bar if the intersection of the set of genes assigned to it and the set of genes assigned to the profile immediately to the left of the yellow bar satisfies the size and p-value constraints specified on the comparison dialog. The legend that appears when a user presses the help (information) icon appears in Figure and explains what the various numbers on the profile boxes mean. This window, as with the main profile screen, is zoomable and pannable. Instructions for zooming and panning can be found in Section 4.6. Clicking on a profile box to the right of a yellow bar launches a detailed model profile window that includes the option to obtain
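Deciding which profiles appear to the right of the yellow bar can be sketched as a double loop over profile pairs. The loop keeps pairs that pass the comparison dialog's Minimum number of genes in intersection and Maximum uncorrected intersection p-value constraints. This sketch makes three assumptions: a hypergeometric test for "this many or more genes in the intersection", a shared gene universe of size `total`, and the hypothetical function names shown.

```python
from math import comb


def intersection_p_value(k, n_left, n_right, total):
    """P(intersection >= k) if n_right genes were drawn at random, without
    replacement, from `total` genes of which n_left belong to the left profile."""
    upper = min(n_left, n_right)
    hits = sum(
        comb(n_left, i) * comb(total - n_left, n_right - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(total, n_right)


def matching_pairs(left, right, total, max_p, min_genes):
    """Yield (left ID, right ID, intersection size, p-value) for profile pairs
    passing both constraints; `left`/`right` map profile ID -> set of genes."""
    for left_id, left_genes in left.items():
        for right_id, right_genes in right.items():
            k = len(left_genes & right_genes)
            if k < min_genes:
                continue
            p = intersection_p_value(k, len(left_genes), len(right_genes), total)
            if p <= max_p:
                yield left_id, right_id, k, p


original = {38: {"g1", "g2", "g3", "g4", "g5"}}
comparison = {13: {"g1", "g2", "g3", "g4"}, 14: {"g9"}}
result = list(matching_pairs(original, comparison, total=100, max_p=0.001, min_genes=3))
print(result)
```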
12. [Figure 21 fields, partially recovered: Global; Scale should be based on only selected genes; Show Main Y-axis gene tick marks; Main Y-axis gene tick interval; Y-axis scale on profile details windows should be Determined automatically or Fixed with parameters below (Min, Max, Tick interval); X-axis scale should be Uniform or Based on real time.] Figure 21: The window to adjust interface options that appears when pressing the Interface Options button. After pressing the Interface Options button on the main interface, a window such as in Figure 21 appears. This window is divided into four sections. The options in the first section control the display policy on the main interface that is not specific to the x- or y-scale. In this section, one of three options can be selected pertaining to the gene display policy: • Do not display genes: individual gene expression profiles are not shown on the main interface. • Display only selected genes: individual gene expression profiles are only displayed when ordering the profiles or clusters of profiles by a GO category or gene set (see Figure 22, bottom right). In this case, only genes that belong to the GO category or gene set on which the ordering is based are displayed on the main interface. • Display all genes: all genes that were not filtered are displayed (see Figure 22, top left, top right, and bottom left). If Display only selected or Display all is selected, then th
13. ...specifies the number of random samples that should be drawn when computing multiple-hypothesis-corrected enrichment p-values by a randomization test. A randomization test is used when two conditions hold: the p-value enrichment is based on the actual size of the set of genes, and Randomization is selected next to the Multiple hypothesis correction method for actual sized based enrichment label. The Bonferroni correction is always used when the p-value enrichment is based on the expected size of the set of genes. The difference between actual and expected size enrichment is discussed in Section 4.3. Increasing this parameter will lead to more accurate corrected p-values for the randomization test, but will also increase the execution time needed to compute them. • Multiple hypothesis correction method for actual sized based enrichment: This parameter controls the correction method for actual size based GO enrichment. Expected size based p-values are always corrected using a Bonferroni correction. See Section 4.3 for a discussion of the differences between actual and expected size enrichment analysis. The parameter value can be either Bonferroni or Randomization. If Bonferroni is selected, then a Bonferroni correction is applied: the uncorrected p-value is multiplied by the number of categories meeting the Minimum GO level and Minimum number of genes constraints. If Randomization is selected, the corrected p-value is computed based o
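The two correction methods can be contrasted in a short sketch. The Bonferroni function follows the description above, capped at 1. The randomization function uses a common min-p scheme. It draws random gene sets of the same size from the base set and reports the fraction whose smallest category p-value is at least as small as the observed one. That scheme is an assumption, not necessarily STEM's exact procedure. `category_p_values` is a hypothetical callable standing in for the per-category enrichment computation.

```python
import random


def bonferroni(p_value, n_categories):
    """Multiply by the number of tested categories, capping at 1."""
    return min(1.0, p_value * n_categories)


def randomization_corrected(observed_p, base_genes, set_size, category_p_values,
                            n_samples=1000, seed=0):
    """Fraction of random same-size gene sets whose smallest category p-value
    is at least as small as observed_p (a min-p randomization scheme)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(base_genes, set_size)
        if min(category_p_values(sample)) <= observed_p:
            hits += 1
    return hits / n_samples


def fake_p_values(genes):
    """Stand-in for real enrichment tests: two categories with fixed p-values."""
    return [0.2, 0.05]


base = [f"g{i}" for i in range(50)]
print(bonferroni(0.001, 50), randomization_corrected(0.01, base, 5, fake_p_values, n_samples=20))
```

More samples (the parameter described above) give a finer-grained estimate at the cost of running time.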
14. ...two-channel cDNA array and an experiment was conducted at time point 0, then the Normalize data option should be selected. In this case, after normalization the transformed values will represent the log change ratio versus time point 0. If the input data file already contains log ratio data against a control, but no time point 0 experiment was conducted, then the No normalization/add 0 option should be selected. In this case, STEM assumes that, had a time point 0 experiment been conducted, the expression level in both channels would have been equal. Pressing the Repeat Data button brings up an interface as shown in Figure 4. The Repeat Data button on the main input interface is yellow if one or more repeat data files are currently specified; otherwise it is gray. Repeat data files must have the same format as the original data file, including the same number of rows and columns. Repeat data values will be combined with the values from the original data file by taking the median. Repeat data can be selected to be from either Different time periods or The same time period. If the data is from Different time periods, then the data was collected over multiple distinct time series, presumably at the same sampling rate. If the data is from The same time period, then multiple measurements were collected at each time point during one time series. If the repeat data is selected to be from The same time period, then the file to w
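The three transformation options and the median merge of repeat data can be sketched as follows. Normalize data and No normalization/add 0 follow the description above. The Log normalize data branch assumes a base-2 logarithm followed by subtraction of time point 0, which is an assumption here. The function names are hypothetical.

```python
import math
import statistics


def transform(values, method):
    """Apply one of the three transformation options to a gene's time series."""
    if method == "Log normalize data":  # assumed: log2, then relative to time point 0
        logs = [math.log2(v) for v in values]
        return [x - logs[0] for x in logs]
    if method == "Normalize data":  # data already log ratios: subtract time point 0
        return [v - values[0] for v in values]
    if method == "No normalization/add 0":  # prepend an implied time point 0 of 0
        return [0.0] + list(values)
    raise ValueError(f"unknown method: {method}")


def combine_repeats(original, *repeats):
    """Per-time-point median of the original series and its repeats."""
    return [statistics.median(column) for column in zip(original, *repeats)]


print(transform([2, 4, 8], "Log normalize data"))  # → [0.0, 1.0, 2.0]
print(combine_repeats([1, 5], [2, 6], [9, 0]))     # → [2, 5]
```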
15. Figure 30: A gene enrichment analysis table. Clicking on a row of the table brings up a gene table that includes only the genes that are annotated as belonging to the category of the row and that are also in the set being analyzed. The above table shows enrichment based on the actual size of the profile. Clicking on the button Click for GO Results Based on the Profile's Expected Size opens another table with GO results computed based on the expected size of a profile. Figure 30 shows an example of such a table. As discussed at the beginning of the section, the exact set of genes that the enrichment analysis is for depends upon which button was pressed to bring up the table. For a category to appear in the table, the number of genes in th
16. Figure 35: The above is an example of a chromosome viewer gene table. Each row corresponds to a gene currently displayed in the above viewer. The table has five columns: (1) the gene ID; (2) the chromosome of the gene; (3) the strand of the gene; (4) the beginning position of the gene along the chromosome; (5) the end position of the gene along the chromosome. 7 Comparison. [Figure 36 dialog fields: Comparison Data File (with View Comparison Data File); Comparison Repeat Data; Maximum uncorrected intersection p-value; Minimum number of genes in intersection; Compare.] Figure 36: The comparison dialog box, which is used to specify a comparison data set and parameters for gene set intersections of interest. Pressing Compare opens two new windows: one is a model profile overview for the comparison data set, and the other is the main comparison window. STEM facilitates the comparison of gene expression data sets from two different experimental conditions and in
17. Figure 2: Above is a sample input data file when viewed in Microsoft Excel. The first column, shown in yellow, contains spot IDs and is optional. If the column is included, then the field Spot IDs included in the data file on the input interface must be checked. Otherwise, the field must be unchecked and the first column must contain gene symbols. The columns containing the time series of gene expression values come after the gene symbol column. The sample data in this figure and throughout the manual comes from an Figure 3: A sample input data file displayed in a table after the View Data File button on the input interface was pressed. ...the data file, corresponding to the same gene appearing on multiple spots on the array. Expression values for the same gene will be combined by taking the median before further analysis on the data is conducted. A sample data file as it would appear in Microsoft Excel is shown in Figure 2. The first column, which appears in yellow, is
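Reading such a file and merging genes that appear on multiple spots can be sketched as follows. This is a minimal sketch. It assumes a tab-delimited file with a header row and no missing values, and the function name is hypothetical.

```python
import statistics


def load_expression(lines, spot_ids_included):
    """Parse a tab-delimited data file (header first) and merge rows that
    share a gene symbol by taking the per-time-point median."""
    first_value = 2 if spot_ids_included else 1  # index of the first time point column
    time_points = lines[0].rstrip("\n").split("\t")[first_value:]
    rows_by_gene = {}
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        gene = fields[first_value - 1]  # gene symbol column
        values = [float(v) for v in fields[first_value:]]
        rows_by_gene.setdefault(gene, []).append(values)
    merged = {
        gene: [statistics.median(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }
    return time_points, merged


data = [
    "Spot\tGene\t0h\t6h",
    "s1\tTP53\t0.0\t1.0",
    "s2\tTP53\t0.0\t3.0",
    "s3\tMYC\t0.5\t-0.5",
]
time_points, merged = load_expression(data, spot_ids_included=True)
print(time_points, merged)  # → ['0h', '6h'] {'TP53': [0.0, 2.0], 'MYC': [0.5, -0.5]}
```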
18. 4.3 Ordering Profiles. [Figure 15 dialog: Order Profiles By table with columns Category ID, Category Name, Min p-value (actual size), Min p-value (expected size); option Order using enrichment p-values based on a profile's actual size or expected size; buttons Profile ID, Significance, Number of Genes, Expected Number, Default Order, Define Gene Set, Copy Table, Save Table.] Figure 15: The dialog box through which the profiles can be reordered on the model profile overview interface. Clicking on a row of the table reorders the profiles by gene enrichment for genes of that category. An important feature of STEM is its ability to easily reorder model profiles on the overview screen by a number of criteria, including the p-value of gene enrichment for any Gene Ontology category or a user-defined gene set. To reorder the profiles, first press the button Orde
19. Figure 5: Annotation file in a two-column format. The first column contains gene symbols or spot IDs, while the second column contains category IDs. Annotation files can also be in the official 15-column format. 3.2 Gene Info. In the second section of the interface, a user specifies the gene annotation information. Both gene symbols and spot IDs can be annotated as belonging to an official Gene Ontology (GO) category or a user-defined category. If a gene is annotated as belonging to an official category in the Gene Ontology, then it will automatically also be annotated as belonging to any ancestor category in the ontology hierarchy. The first field in this section of the interface is the Gene Annotation Source. This field can be set to User provided, No annotations, or one of 41 annotation data sets provided by Gene Ontology Consortium members. A full list of the 41 data sets can be found in Appendix D. More information about these annotation sets can be found at and, for the annotation sets provided by the European Bioinformatics Institute (EBI), also
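Reading a two-column annotation file and propagating each annotation to all ancestor categories can be sketched as a small graph walk. This sketch rests on two assumptions. First, column 2 may hold several IDs separated by `;`, `,` or `|`. Second, the hierarchy is given as a hypothetical `parents` mapping; STEM derives the real GO hierarchy from its ontology file. The category IDs in the example are made up.

```python
import re


def load_two_column(lines):
    """Map gene symbol (or spot ID) -> set of category IDs.
    Assumption: column 2 may hold several IDs separated by ';', ',' or '|'."""
    annotations = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ids = {c.strip() for c in re.split(r"[;,|]", fields[1]) if c.strip()}
        annotations.setdefault(fields[0], set()).update(ids)
    return annotations


def with_ancestors(categories, parents):
    """Add every ancestor of each category; `parents` maps ID -> parent IDs."""
    result = set()
    stack = list(categories)
    while stack:
        category = stack.pop()
        if category not in result:
            result.add(category)
            stack.extend(parents.get(category, ()))
    return result


parents = {"CAT_C": ["CAT_B"], "CAT_B": ["CAT_A"]}  # hypothetical hierarchy
annotations = load_two_column(["GENE1\tCAT_C;CAT_X", "GENE2\tCAT_B"])
print(sorted(with_ancestors(annotations["GENE1"], parents)))  # → ['CAT_A', 'CAT_B', 'CAT_C', 'CAT_X']
```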
20. Figure 18: Dialog window through which to specify a user-defined gene set. Genes that are checked are part of the set. 4.4 Ordering Clusters of Profiles. [Figure residue: the Order Clusters and then Profiles By dialog, a table of GO categories with columns Category ID, Category Name, and Min p-value.]
21. STEM: Short Time-series Expression Miner (v1.3.8) User Manual. Jason Ernst (jernst@cs.cmu.edu), Dima Patek, Ziv Bar-Joseph. School of Computer Science, Carnegie Mellon University. Contents: 1 Introduction (1.1 STEM Clustering Method Overview; 1.2 Manual Overview; 1.3 Citing STEM); Preliminaries; Input Interface (3.1 Expression Data; 3.3.1 Filtering Options); Model Profiles Overview Interface (4.1 Main Gene Table; 4.2 Filtered Gene List; 4.3 Ordering Profiles; 4.4 Ordering Clusters of Profiles; 4.5 Interface Options; 4.6 Zooming and Panning); 5 Model Profile Details Interface (5.1 Gene Table; 5.2 Gene Enrichment Analysis Table
22. ...all evidence codes are assumed to be acceptable. Evidence code symbols are IEA, IC, IDA, IEP, IGI, IMP, IPI, ISS, RCA, NAS, ND, TAS, and NR. Information about GO evidence codes can be found at. Note that this field only applies if the gene annotations are in the official 15-column GO annotation format. The evidence code is the entry in column 7. For example, to exclude the annotations that were inferred from electronic annotation or from a non-traceable author statement, the field should contain IEA;NAS. • Category ID mapping file: This optional file specifies a mapping between gene category IDs and category names for categories that are not official Gene Ontology categories. The mapping between IDs and names for official GO categories is defined in the file gene_ontology.obo. If a category ID appears in the gene annotation file but does not correspond to an official Gene Ontology category and is not defined in a Category ID mapping file, then the category ID is used in place of the category name. A category ID mapping file has two columns delimited by a tab. The first column contains category IDs and the second column contains category names. Each line defines a mapping between one category ID and one name. Below is a short sample file:
ID_A	CategoryNameA
ID_B	CategoryNameB
ID_C	CategoryNameC
3.3.5 GO Analysis Options. The final advanced options panel, shown in Figure 10, controls options related to Gene Ontology (GO) en
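The name lookup described above can be sketched in a few lines. An official GO name is used first, then a name from the mapping file, and otherwise the raw category ID. The helper names are hypothetical, and `go_names` stands in for the names STEM reads from gene_ontology.obo.

```python
def load_category_names(lines):
    """Parse a tab-delimited Category ID mapping file: ID <TAB> name."""
    names = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        category_id, name = line.split("\t", 1)
        names[category_id] = name
    return names


def display_name(category_id, go_names, custom_names):
    """Official GO name, else a name from the mapping file, else the ID itself."""
    if category_id in go_names:
        return go_names[category_id]
    return custom_names.get(category_id, category_id)


custom = load_category_names(["ID_A\tCategoryNameA", "ID_B\tCategoryNameB"])
go_names = {"GO:0007049": "cell cycle"}  # stand-in for names from gene_ontology.obo
print(display_name("ID_A", go_names, custom))  # → CategoryNameA
print(display_name("ID_Z", go_names, custom))  # → ID_Z
```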
23. ...parameters, respectively. Additionally, if the Fixed with parameters below option is selected, the desired tick mark interval can also be specified through the Tick interval parameter. The final section has one option. If this option is set to Uniform, all time points are placed at uniformly spaced intervals on the x-axis, on both the main interface and the profile details windows. If this option is set to Based on real time, time points are spaced along the x-axis in proportion to the real time points given in the column headers (see Figure 23). The time points need to be in the same units. If STEM was unable to parse the time points, then only the Uniform option is active. [Figure residue: four All STEM Profiles windows. Three are titled "Clusters ordered based on number of genes and profiles ordered by significance (default)"; one is titled "Profiles ordered based on the actual size based p-value of gene enrichment of GO:0007049 c". Buttons: Filtered Gene List, Main Gene Table, Interface Options, Order Profiles By, Order Clusters By, Compare.]
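The two x-axis options can be sketched as a function that maps column headers to positions in [0, 1]. Extracting the first number from each header with a regular expression is an assumption; STEM's actual time-point parser may differ. Falling back to uniform spacing when parsing fails mirrors the behavior described above. The function name is hypothetical.

```python
import re


def x_positions(headers, mode):
    """Return x positions in [0, 1] for each time point column header.
    Assumes at least two time points."""
    n = len(headers)
    uniform = [i / (n - 1) for i in range(n)]
    if mode == "Uniform":
        return uniform
    times = []
    for header in headers:
        match = re.search(r"-?\d+(?:\.\d+)?", header)  # first number in the header
        if match is None:
            return uniform  # unparseable header: fall back to uniform spacing
        times.append(float(match.group()))
    span = times[-1] - times[0]
    if span <= 0:
        return uniform
    return [(t - times[0]) / span for t in times]


print(x_positions(["0h", "1h", "4h"], "Uniform"))             # → [0.0, 0.5, 1.0]
print(x_positions(["0h", "1h", "4h"], "Based on real time"))  # → [0.0, 0.25, 1.0]
```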
24. Figure 44: Above is the interface options window, similar to that in Figure 21, except that the Y-axis scale can only be Cluster specific or Global. Figure 45: Above is the window to order K-means clusters. Clusters can be ordered based on ID, number of genes, or relevance to a GO category or a user-defined gene set. Figure 46: Above is an
25. as Figure 46, with detailed information about a K-means cluster, similar to the model profile detail interface described in Section 5. From this window one can open a table of all genes assigned to the cluster, as one could do for all genes assigned to a STEM profile (described in Section 5.1). Figure 41: Above is the main input interface described previously in Section 3, with the clustering method set to K-means. Two parameters appear when K-means is selected that do not appear when the STEM clustering method is selected; these two parameters specify the number of clusters and the number of random starts. Similarly, one can open a table with GO analysis results for the set of genes assigned to the cluster, as one could do for all genes assigned to a profile (described in Section 5.2). The GO analysis can only be based on the actual size of the cluster, since there is no notion of the expected size of a K-means cluster. Pressing the Main Gene Table button on the main K-means interface works as described in Section 4.1 for the STEM clustering method, except that the table lists the cluster to which each gene was assigned instead of the profile. The Filter Gene Table is identical to that described in Section 4.2. Comparison for K-means works the same way as described in Section 7, except that STEM profiles are replaced with K m
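To make the two K-means parameters concrete, here is a minimal Python sketch of K-means with several random starts. It is not STEM's implementation: the squared Euclidean distance, choosing initial centers by sampling genes at random, and keeping the start with the lowest within-cluster sum of squares are all assumptions, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance (an assumption; STEM's distance may differ)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Sequence[float]]) -> Vector:
    # Component-wise average of a non-empty list of equal-length vectors
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans_once(data: List[Sequence[float]], k: int, rng: random.Random,
                max_iter: int = 100) -> Tuple[List[int], List[Vector], float]:
    """One Lloyd-style K-means run from a random start."""
    centers = [list(p) for p in rng.sample(data, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every series to its nearest center
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centers[c])) for p in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = _mean(members)
    sse = sum(_sqdist(p, centers[lab]) for p, lab in zip(data, labels))
    return labels, centers, sse


def kmeans(data: List[Sequence[float]], k: int, random_starts: int,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Repeat K-means `random_starts` times and keep the lowest-SSE run."""
    rng = random.Random(seed)
    runs = [kmeans_once(data, k, rng) for _ in range(random_starts)]
    labels, centers, _ = min(runs, key=lambda r: r[2])
    return labels, centers
```

For example, `kmeans(profiles, k=10, random_starts=20)` would run 20 independent starts; more starts lower the chance of settling on a poor local optimum at the cost of run time.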
26. ation type is determined by the entry in the Aspect field (column 9). An entry of P in the Aspect field means the annotation is of type Biological Process, an entry of F means it is of type Molecular Function, and an entry of C means it is of type Cellular Component. • Only include annotations with these taxon IDs: Some annotation files contain annotations for multiple species, and it might be desirable to use only the annotations for certain species. To do so, enter the taxon IDs for the desired species, delimited by commas, semicolons, or pipes. If this field is left empty, then any species is assumed to be acceptable. More information about taxonomy codes, and a search function to find the taxon code for a species, is available online. Note that this parameter only applies when the annotations are in the official 15-column format. The taxonomy ID is in column 13 of the annotation file, and the taxon IDs entered in this field must match the entry in column 13 exactly or match it after prepending the string taxon: to the ID. For example, to use only annotations for Homo sapiens, the string 9606 can be used. • Exclude annotations with these evidence codes: This field takes a list of unacceptable evidence codes for gene annotations, delimited by commas, semicolons, or pipes. If this field is left empty, then
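The aspect, taxon, and evidence-code filters can be sketched for a single annotation line as follows. This is an illustration under stated assumptions, not STEM's code: it expects tab-separated 15-column lines, takes the evidence code from column 7 (per the GO annotation file format, not stated in this section), treats lines starting with ! as comments, and does not split taxon fields that list more than one taxon. The function names are hypothetical.

```python
import re
from typing import Set


def parse_id_list(text: str) -> Set[str]:
    """Split a field such as '9606; 10090|7955' on commas, semicolons, or pipes."""
    return {tok.strip() for tok in re.split(r"[,;|]", text) if tok.strip()}


def keep_annotation(line: str, aspects: Set[str], taxa: Set[str],
                    excluded_evidence: Set[str]) -> bool:
    """Decide whether one 15-column annotation line passes the filters."""
    if line.startswith("!"):
        return False  # header/comment line
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        return False  # this sketch only handles the 15-column format
    evidence, aspect, taxon = cols[6], cols[8], cols[12]  # columns 7, 9, 13
    if aspect not in aspects:  # 'P', 'F', or 'C'
        return False
    if evidence in excluded_evidence:
        return False
    # An empty taxon list accepts every species; otherwise match 'ID' or 'taxon:ID'
    if taxa and not any(taxon in (t, "taxon:" + t) for t in taxa):
        return False
    return True
```

For instance, `keep_annotation(line, {"P"}, parse_id_list("9606"), parse_id_list("IEA"))` keeps human Biological Process annotations whose evidence code is not IEA.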
27. ave As menu. To view the contents of the data file from the interface, press the View Data File button, and a table such as in Figure 3 will appear. Before gene expression time series are matched against model temporal expression profiles, the time series must be transformed to start at 0. The transformation used can be one of three types: Log normalize data, Normalize data, or No normalization/add 0. Given a time series vector of gene expression values $(v_0, v_1, v_2, \ldots, v_n)$, the transformations are as follows: • Log normalize data transforms the vector to $(0, \log_2(v_1/v_0), \log_2(v_2/v_0), \ldots, \log_2(v_n/v_0))$. • Normalize data transforms the vector to $(0, v_1 - v_0, v_2 - v_0, \ldots, v_n - v_0)$. • No normalization/add 0 transforms the vector to $(0, v_0, v_1, \ldots, v_n)$. It is recommended that after transformation a time series represents the log ratios of the gene expression levels versus the level at time point 0. Time point 0 usually corresponds to a control before the experimental conditions were applied. If the input data file contains raw expression values, as from an oligonucleotide array, then the Log normalize data option should be selected. If any values are 0 or negative and the Log normalize data option is selected, then these values will be treated as missing. If the input data file already represents the log ratio of a sample against a control, as is often the case when the data is from a
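The three transformations can be written as a short Python function. This is a sketch, not STEM's source: the method names `log`, `normalize`, and `add0` are hypothetical, `None` stands for a missing value, and treating every later value as missing when $v_0$ itself is missing or non-positive is an assumption.

```python
import math
from typing import List, Optional

Value = Optional[float]  # None marks a missing value


def transform(series: List[Value], method: str) -> List[Value]:
    """Apply one of the three transformations to (v0, ..., vn)."""
    v0 = series[0]
    if method == "log":  # Log normalize data
        out: List[Value] = [0.0]
        for v in series[1:]:
            # Zero or negative values (or a missing/non-positive v0) become missing
            if v is None or v0 is None or v <= 0 or v0 <= 0:
                out.append(None)
            else:
                out.append(math.log2(v / v0))
        return out
    if method == "normalize":  # Normalize data
        return [0.0] + [None if v is None or v0 is None else v - v0 for v in series[1:]]
    if method == "add0":  # No normalization/add 0: prepend a 0, keep all values
        return [0.0] + list(series)
    raise ValueError(f"unknown method: {method!r}")
```

Note that only No normalization/add 0 lengthens the series by one; the other two replace $v_0$ with 0.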
28. based on their chromosome location (examples are shown in Figures 32 and 33). Pressing the Chromosome View button in a gene table window causes the chromosome viewer window to appear, with all genes listed in the table plotted according to their location on the chromosomes. If the Chromosome View button is pressed from multiple tables, then all of these genes are displayed in the same window; the set of genes in the window thus accumulates each time the Chromosome View button is pressed, until the Clear Genes button is pressed. On the chromosome viewer window, each rectangular box corresponds to a chromosome, and the small lines in these boxes correspond to genes. The top half of the box corresponds to the positive strand and the bottom half to the negative strand. Mousing over a gene gives its ID. Clicking on a gene opens the Ensembl genome browser to that gene. Along the bottom of the window there are several buttons that function as follows: • Next Gene Color: gives the option to change the color of the next genes displayed. Note that this does not change the color of the currently displayed genes. The color with which the genes will be displayed corresponds to the color of the text of this button. • Clear Genes: clears the current genes displayed in the chromosom
29. be included in the gene set. The Unselect All button unselects all genes in the set, while the Select All button selects all genes. A gene set can be loaded from a text file by pressing Load Gene Set and then specifying the name of the file containing the gene names. One gene name should appear on each line of the file, and there should be no header lines in the file. Pressing Save Gene Set exports the currently selected genes to a text file. Pressing the Query Set button reorders the model profiles based on p-value gene enrichment for genes in the query set. As with GO categories, the enrichment can be computed based on either the actual size or the expected size of the profile, depending upon which is selected in the Order Profiles By window. The set of genes can also be selected to be the set of genes assigned to a profile in a comparison data set, as explained in Section 7. Figure 17: Top left: Profiles are ordered by ID. Top right: Profiles are ordered based on significance. Bottom left: Profiles are ordered based on number of genes assigned. Bottom right: Profiles are ordered based on the expected number of genes assigned. Define Gene Set window (columns: Gene Name, In Gene Set).
30. be based on parameter is set to Maximum-Minimum or Difference from 0 (see below). • Change should be based on: The Change should be based on parameter defines how change is measured in the context of the gene filter. If the Maximum-Minimum option is selected, a gene will be filtered if the maximum absolute difference between the values of any two time points (not necessarily consecutive) after transformation is less than the value of the Minimum Absolute Expression Change parameter. If Difference from 0 is selected, a gene will be filtered if the absolute expression change from time point 0 is less than the value of the Minimum Absolute Expression Change parameter at all time points. Formally, suppose $(0, v_1, v_2, \ldots, v_n)$ is the expression level of a gene after transformation, and let $C$ be the value of the Minimum Absolute Expression Change. If the Maximum-Minimum option is selected, a gene will be filtered if $\max(0, v_1, \ldots, v_n) - \min(0, v_1, \ldots, v_n) < C$. If the Difference from 0 option is selected, the gene will be filtered if $\max(|v_1|, |v_2|, \ldots, |v_n|) < C$. Only the Maximum-Minimum option guarantees that the same set of genes would be filtered for any permutation of the time points. For Difference from 0 this is in general not true; in this case the permutation test is based on the set of genes passing the filter under the original order of time points. • Pre-filtered Gene File: This file is optional
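A minimal sketch of the two filter criteria, applied to an already transformed series $(0, v_1, \ldots, v_n)$. The mode names `max_min` and `diff_from_0` are hypothetical, and dropping missing values before computing the change is an assumption.

```python
from typing import List, Optional


def passes_filter(transformed: List[Optional[float]], min_change: float,
                  mode: str = "max_min") -> bool:
    """Return True if a transformed series (0, v1, ..., vn) survives the change filter."""
    # Ignoring missing values is an assumption of this sketch
    values = [v for v in transformed if v is not None]
    if mode == "max_min":  # Maximum-Minimum
        change = max(values) - min(values)
    elif mode == "diff_from_0":  # Difference from 0
        change = max(abs(v) for v in values)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return change >= min_change  # the gene is filtered when change < C


# Raw values (0, 2, 1) under Normalize data become (0, 2, 1);
# permuting the raw order to (1, 0, 2) gives (0, -1, 1) instead.
original = [0.0, 2.0, 1.0]
permuted = [0.0, -1.0, 1.0]
```

With $C = 1.5$, Difference from 0 keeps `original` but filters `permuted`, while Maximum-Minimum keeps both. Under Normalize data, the range of $(0, v_1 - v_0, \ldots, v_n - v_0)$ equals the range of the raw values, which does not depend on their order; the largest deviation from $v_0$ does, because a different raw value lands in position 0.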
31. by expected number, this will be the expected number of genes of this profile based on a permutation test. If ordering based on a GO category or query set enrichment, then this will be the number of genes in the selected GO category or set and assigned to this profile, then a semicolon, and then the uncorrected p-value for the enrichment. Figure 12: The legend that appears after pressing the help icon. 4.1 Main Gene Table. Pressing the Main Gene Table button displays a table which has a row for every gene that was not filtered and thus assigned to a model profile. The table includes the gene's expression values after transformation and the profile(s) to which the gene was assigned. An example of such a table is shown in Figure 13. Table of Genes Passing Filter window (columns: Selected, Gene Symbol, SPOT, Profile, and one column per time point; buttons: Copy Table, Save Table, Copy Gene Names, Save Gene Names). Figure 13: An example of a table that appears after pressing Main Gene Table. The table includes all genes that were not filtered and thus assigned to a model profile. Clicking on a row of the table opens a new window containing detailed information about the profile to which the gene of
32. describes the model profile overview interface, which allows a user to visualize a large number of model profiles on a zoomable interface and order them based on their relevance to a GO category or user-defined gene set. Section 5 describes the interface for obtaining detailed information about a model profile or cluster of profiles, including a table of genes assigned and a table of GO category enrichments. Section 6 describes the chromosome viewer integrated into STEM. Section 7 describes STEM features to compare two data sets from different experimental conditions. STEM also provides an implementation of the standard K-means clustering algorithm, which is described in Section 8. Sections are presented assuming a user is interested in the novel STEM clustering method; using K-means in STEM is similar, and the differences are discussed in Section 8. Most, but not all, of the information contained in this manual can also be obtained by clicking on the help icons throughout the software. 1.3 Citing STEM. To cite the STEM software, please reference the paper: Ernst, J. and Bar-Joseph, Z. STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics, 7:191, 2006. To specifically cite the STEM clustering method, please reference the paper: Ernst, J., Nau, G. J., and Bar-Joseph, Z. Clustering Short Time Series Gene Expression Data. Bioinformatics, 21(Suppl. 1), pp. i159-i168, 2005. 2 Preliminari
33. e 43 (150.0 genes assigned; 28.5 genes expected; p-value 4.0E-59, significant) and Profile 9 (130.0 genes assigned; 34.7 genes expected; p-value 0.00, significant). Figure 25: Example of detailed model profile information windows. The top two images are of the same profile, but in the left image the x-axis is scaled based on real time and the y-axis is uniform. The window plots a graph of all genes assigned to the profile. The text at the top gives information about the profile, including the number of genes assigned, the number of genes expected, and the p-value significance. The Profile Gene Table button displays a table of genes assigned to the profile, while the Cluster Gene Table button displays a table of genes assigned to any profile in the profile's cluster of profiles. The Profile GO Table button displays a gene category enrichment for genes assigned to the profile, while the Cluster GO Table button displays a gene category enrichment for genes assigned to any profile in the profile's cluster of profiles. second ratio, the numerator C is the total number of genes in the category or user-defined gene set. The denominator in the second ratio, D, is the total number of genes on the array. The number to the right of the second ratio after the
34. e Enrichment Analysis Table. From the window with details about a model profile, a user has the option to display a table that includes gene enrichment for Gene Ontology (GO) categories, along with any other categories that may appear in an annotation. Example gene enrichment table (columns: Category ID, Category Name, #Genes Category, #Genes Assigned, #Genes Expected, #Genes Enriched, p-value, Corrected p-value), listing categories such as cell cycle (GO:0007049), DNA metabolism, regulation of progression through cell cycle, and DNA replication.
35. e annotation information and various execution options. Alternatively, the options can be loaded from a saved file through the Load Saved Settings button at the bottom of the interface. Pressing the Execute button causes the clustering and gene enrichment analysis algorithms to execute, and then a new interface described in Section 4 appears. specify a file from which to load saved settings, in which case any options already specified will be overwritten. Pressing the Execute button causes STEM to execute the selected clustering algorithm and then display the output interface described in Section 4. If the data file does not have two or more time points, then results for a standard gene enrichment analysis will be displayed. For details about using STEM for standard gene enrichment analysis on non-time-series data, consult Appendix C. 3.1 Expression Data Info. The first field in the expression data section of the interface is the Data File field, where a user specifies the input data file. An input data file consists of gene symbols, time series expression values, and optionally spot IDs. Spot IDs uniquely identify an entry in the data file, and if they are not included in the data file, then they will be automatically generated. While spot IDs must be unique, the same gene symbol may appear multiple times in Data file view (Figure 3; columns: Gene Symbol, 0h, 0.5h, 3h, 6h, 12h; first rows ZFX and ZNF133).
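Reading such a file can be sketched as below. The sketch assumes a tab-delimited file with one header row whose expression columns are labeled by time point, places the spot ID in the first column when spot IDs are included, and otherwise generates IDs of the hypothetical form ROW_1, ROW_2, ...; STEM's own generated IDs may look different.

```python
import csv
import io
from typing import List, Optional, Tuple

Record = Tuple[str, str, List[Optional[float]]]


def read_data_file(text: str, spot_ids_included: bool) -> Tuple[List[str], List[Record]]:
    """Parse a tab-delimited expression file into time labels and (spot, gene, values) records."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    first = 2 if spot_ids_included else 1  # index of the first expression column
    time_labels = header[first:]
    records: List[Record] = []
    seen = set()
    for i, row in enumerate(body):
        # Hypothetical ID scheme used when the file has no spot column
        spot = row[0] if spot_ids_included else f"ROW_{i + 1}"
        if spot in seen:
            raise ValueError(f"duplicate spot ID: {spot}")  # spot IDs must be unique
        seen.add(spot)
        gene = row[first - 1]  # gene symbols may repeat
        values = [float(x) if x.strip() else None for x in row[first:]]
        records.append((spot, gene, values))
    return time_labels, records
```

Empty cells become `None`, so a gene symbol that appears on two rows yields two records with distinct spot IDs.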
36. e command line. Below is a sample file. The parameter names are on the left side, and a tab separates them from their values. Lines which begin with a # are comments and are ignored. A defaults file can also be generated through the disk icon on the Main Profile Interface and loaded into STEM through the Input Interface.
#Main Input
Data_File	g27_1.txt
Gene_Annotation_Source	Human (EBI)
Gene_Annotation_File	gene_association.goa_human.gz
Cross_Reference_Source	Human (EBI)
Cross_Reference_File	human.xrefs.gz
Gene_Location_Source	User provided
Gene_Location_File	
Clustering_Method[STEM Clustering Method,K-means]	STEM Clustering Method
Maximum_Number_of_Model_Profiles	50
Maximum_Unit_Change_in_Model_Profiles_between_Time_Points	2
Normalize_Data[Log normalize data,Normalize data,No normalization/add 0]	Normalize data
Spot_IDs_included_in_the_data_file	true
#Repeat data
Repeat_Data_Files(comma delimited list)	g27_2.txt
Repeat_Data_is_from[Different time periods,The same time period]	Different time periods
#Comparison Data
Comparison_Data_File	vaca.txt
Comparison_Repeat_Data_Files(comma delimited list)	
Comparison_Repeat_Data_is_from[Different time periods,The same time period]	Different time periods
Comparison_Minimum_Number_of_genes_in_intersection	5
Comparison_Maximum_Uncorrected_Intersection_pvalue	0.0050
#Filtering
Maximum_Number_of_Missing_Values	0
Minimum_Correlation_between_Repeats	0.0
Minimum_Absolute_Log_Ratio_Expre
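A defaults file in this format can be read with a few lines of Python. This is a hypothetical reader, not part of STEM; in particular, stripping a bracketed option list or a parenthesized note from the parameter name is an assumption based on the sample file above.

```python
from typing import Dict


def parse_defaults(text: str) -> Dict[str, str]:
    """Read 'Name<TAB>Value' lines from a defaults file, skipping '#' comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or comment
        name, _, value = line.partition("\t")
        # Drop an option list like [A,B] or a note like (comma delimited list)
        for marker in ("[", "("):
            name = name.split(marker, 1)[0]
        settings[name.strip()] = value.strip()
    return settings
```

A parameter with nothing after the tab, such as an unused Gene_Location_File, maps to an empty string.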
37. e gene. Note that the cross references are only used to map between gene symbols, not between spot IDs and gene symbols. The Cross Reference Source field gives the option to select User Provided, No cross references, or cross references for Arabidopsis, Chicken, Cow, Human, Mouse, Rat, or Zebrafish provided by the European Bioinformatics Institute (EBI). If User Provided is selected for the Cross Reference Source field, then the Cross Reference File field becomes editable and a user can specify a cross reference file. Any gene symbols listed on the same line in the cross reference file will be considered equivalent. The symbols on a line can be delimited by a tab, semicolon, comma, or pipe. As with gene annotation files, a cross reference file can be either an ASCII text file or a GNU zip version of an ASCII text file. Below the Cross Reference Source field is the Gene Location Source field. Gene locations are used for displaying genes on the chromosome viewer. The Gene Location Source field can be set to User Provided, No Gene Locations, or an Ensembl BioMart species which STEM directly supports (see Appendix E for a list of supported species). If User Provided is selected, then a location file can be specified in the Gene Location File field. The format of a gene location file is GFF version 2, which is specified at http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.sht
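Building the symbol equivalences from a cross reference file can be sketched with a union-find structure. The sketch assumes that equivalence is transitive across lines (if A and B share one line and B and C another, all three are merged), which this section does not state, and it detects gzip input by the file's magic bytes. The function names are hypothetical.

```python
import gzip
import re
from typing import Dict, Iterable


def open_text(path: str):
    """Open a plain or gzip-compressed ASCII file for reading text."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    return gzip.open(path, "rt") if magic == b"\x1f\x8b" else open(path, "rt")


def build_equivalences(lines: Iterable[str]) -> Dict[str, str]:
    """Map every symbol to a representative; symbols sharing a line are equivalent."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for line in lines:
        # Symbols may be separated by tab, semicolon, comma, or pipe
        symbols = [s.strip() for s in re.split(r"[\t;,|]", line) if s.strip()]
        for s in symbols:
            parent[find(s)] = find(symbols[0])  # merge each symbol with the first
    return {s: find(s) for s in parent}
```

For example, `with open_text("human.xrefs.gz") as fh: eq = build_equivalences(fh)` would map each symbol to a shared representative, so two symbols are equivalent exactly when `eq[a] == eq[b]`.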
38. e is a lower variance in a gene's time point 0 expression value than at other time points. One could thus expect, for these time series experiments, that profiles centered around 0 with high variance will be more likely to be considered significant in a permutation test that permutes this low-variance time point 0. A permutation test that does not permute time point 0 can be useful here, since profiles found to be significant under this test are significant independent of the time point 0 expression value being known more accurately than that of the other time points. In practice, the sets of profiles found to be significant by either test will usually be similar. • Correction Method: The significance level can be corrected for the fact that multiple profiles are being tested for significance. The correction can be a Bonferroni correction, where the significance level is divided by the number of model profiles, or the less conservative False Discovery Rate control [2]. If none is selected, then no correction is made for the multiple significance tests. Note that this parameter for multiple test correction for model profiles is unrelated to the corrected p-values in a GO enrichment analysis. 3.3.3 Clustering Profiles Options. Figure 8 panel (Advanced Options tabs: Filtering, Model Profiles, Clustering Profiles, Gene Annotations, GO Analysis; fields: Minimum Correlation 0.7, Minimum Correlation Percentile (repeat only)). Figure 8: The above panel i
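The two corrections can be sketched as follows for a list of per-profile p-values. Bonferroni matches the description above. For False Discovery Rate control, the sketch assumes the Benjamini-Hochberg step-up procedure; this section does not say which FDR procedure STEM uses, so treat it as an illustration. The function names are hypothetical.

```python
from typing import List


def bonferroni(pvalues: List[float], alpha: float) -> List[bool]:
    """Significant if p <= alpha / (number of model profiles)."""
    k = len(pvalues)
    return [p <= alpha / k for p in pvalues]


def benjamini_hochberg(pvalues: List[float], alpha: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure controlling the FDR at level alpha."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])  # indices by ascending p
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / k:
            cutoff = rank  # largest rank meeting its bound
    passed = set(order[:cutoff])  # every profile up to that rank is significant
    return [i in passed for i in range(k)]
```

With p-values (0.01, 0.02, 0.03, 0.5) and a level of 0.05, Bonferroni accepts only the first profile, while the FDR procedure accepts the first three, which is what "less conservative" means in practice.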
39. e profile of the outlier will look flat, and if Global is selected, all other genes will look flat. Also note that the model profiles will generally be on different scales than the genes. Additional options in this section are: • Scale should be based on only selected genes: If the gene display policy is Display only selected and Profile specific or Global is selected, then there is the further option to re-adjust the y-scale based on only the currently visible genes by selecting Scale should be based on only selected genes. • Show Main Y-axis gene tick marks: If the genes are displayed in a Profile specific or Global manner and Show Main Y-axis gene tick marks is selected, then tick marks corresponding to the gene expression values are visible. • Main Y-axis gene tick interval: the interval at which the tick marks are displayed. The longer tick mark in the center of the box is the 0 tick mark. The options in the third section determine the y-axis scale on the profile detail windows, which appear when clicking on a profile box on the main interface. If Determined automatically is selected, then STEM automatically determines the y-scale based on the expression levels of the genes assigned to the profile; the y-scale may be different for each profile window. If the option Fixed with parameters below is selected, then the y-scale on the profile windows will have a minimum and maximum determined by the values of the Min and Max par
40. e set of genes being analyzed that belong to the category must be greater than or equal to the value of the Minimum number of genes parameter on the GO Analysis panel under Advanced Options. For official GO categories, the level of the category must be greater than or equal to the value of the Minimum GO level parameter, also on the GO Analysis panel under Advanced Options. As discussed in Section 4.3, there are two ways to compute gene enrichment: one based on the actual size of the set and the other based on the expected size of the set. For clusters of profiles, gene enrichment is always based on the actual size of the set. For profiles, gene enrichment is by default based on the actual size of the set. However, there will be a button along the bottom of the window labeled Click for GO Result(s) Based on the Profile's Expected Size, which when pressed opens a new table where the enrichment analysis is based on the profile's expected size. The columns of a gene enrichment table are as follows: • Category ID: The ID for the category. • Category Name: The name for the category. • #Genes Category: The number of genes on the entire microarray that were annotated as belonging to the category. • #Genes Assigned: The number of genes annotated as belonging to the category that are part of the set of genes being analyzed. • #Genes Expected: The number of genes annotated as belonging to the category tha
41. e viewer. • Sort by ID/Sort by Size: If the Sort by ID button is visible, then the chromosomes are currently sorted in decreasing order of size, and pressing the button sorts them by increasing ID (see Figure 32). If the Sort by Size button is visible, then the chromosomes are currently sorted by increasing ID, and pressing the button sorts them by decreasing size (see Figure 33). • Chr Enrichment: displays a table of chromosome enrichments for the genes currently displayed (see Figure 34). • Gene Table: displays a table of the genes currently displayed in the chromosome viewer and their locations (see Figure 35). • Unmatched Genes: displays genes that were attempted to be displayed but could not be matched to any chromosome location based on the chromosome location input data. Figure 32: The above window shows an example display of the chromosome viewer when the chromosomes are sorted by size. Figure 33: The above window shows an example display of the chromosome viewer when the chromosomes are sorted by ID. Chromosome Enrichment window (columns: Chromosome, #Genes Base Set, #Genes Assigned, #Genes Expected, #Genes Enriched, p-value, Corrected p-value).
42. eans clusters. Figure 47 shows the comparison legend for the comparison interface with K-means, analogous to Figure 38 for comparison with STEM profiles. Figure 42 panels (All K-Means Clusters): Clusters ordered based on cluster ID. Figure 42: Above is the main output interface, which is similar to the interface described in Section 4. Each box corresponds to a K-means cluster and displays the average expression of genes in the cluster. Left: No individual gene expression profiles are displayed. Right: The individual gene expression profiles are displayed on a Global scale. When genes are displayed, the cluster means are on the same scale as the genes. Legend: Average log of expression change ratio over time of genes in the cluster. The lower left-hand corner contains information relevant to the current reordering of clusters. If ordering by number of genes, this will be the number of genes assigned to the cluster. If ordering based on a GO category or query set enrichment, then this will be the number of genes in the selected GO category or set and assigned to this cluster, then a semicolon, and then the uncorrected p-value for the enrichment. Figure 43: Legend for a K-means cluster box. Interface Options window: Display policy on main interface: Do not display genes, Display only selected genes, or Display all genes; Change Color of Genes; Display Cluster Me
43. ed on the actual size based p-value. Figure 24: The above image shows a screenshot of the profile overview interface zoomed in on the four profiles most enriched for cell cycle. As Figure 24 illustrates, the model profile overview interface is zoomable. To zoom in, hold down the right mouse button and move the mouse to the right. To zoom out, hold down the right mouse button and move the mouse to the left. To pan, hold down the left mouse button while not over a model profile box and then move the mouse in the desired direction. The ability to zoom in and out is powered by the Piccolo Toolkit [1], which is distributed under a BSD license. 5 Model Profile Details Interface. When a user clicks on a model profile box on the model profile overview screen, on a model profile box on the comparison interface screen (discussed in Section 7), or on a row in the main gene table, a new window with detailed information about the profile and the genes assigned to it appears (Figure 25). The window displays a graph of the expression values, after transformation, of all genes assigned to the profile. Note that whether the time points on the x-axis are uniformly spaced or based on real time is determined by the x-axis scale setting in the Interface Options window, as discussed in Section 4.5. Along the top center of the window are two lines of text. The first line contains the model profile ID and a vector representing the expr
44. ed to the profile. The expected size is the number of genes expected to be assigned to the profile, as computed by a permutation test. During a permutation test, the order of the time point values before transformation (Log normalize data, Normalize data, or No normalization/add 0) is randomly permuted, the transformation is applied, and then genes are assigned to model profiles. This is done for a large number of permutations, and the expected number of genes assigned to a profile is the average number of genes assigned over all permutations. The actual size based p-value of gene enrichment is computed based on a hypergeometric distribution. Suppose there are a total of $N$ genes on the microarray, $m$ of these genes are in the category of interest, $v$ of the genes belong to the category of interest and were also assigned to the profile of interest, and the number of genes assigned to the profile is $s$. Then the p-value of seeing $v$ or more genes belonging to both the category of interest and assigned to the profile of interest can be computed as $\sum_{i=v}^{\min(m,s)} \frac{\binom{m}{i}\binom{N-m}{s-i}}{\binom{N}{s}}$. If the enrichment is computed based on a profile's expected size $s_e$, then the p-value of seeing $v$ or more genes belonging to both the category and profile of interest can be computed based on a binomial distribution with parameters $m$ and $s_e/N$, as $\sum_{i=v}^{m} \binom{m}{i}\left(\frac{s_e}{N}\right)^{i}\left(1-\frac{s_e}{N}\right)^{m-i}$. If a profile has more genes assigned than expected, then it is possib
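Both p-values can be computed exactly with Python's integer binomial coefficients. This is a sketch of the two formulas above, not STEM's code; the argument names follow the symbols $N$, $m$, $s$, $s_e$, and $v$, and for very large inputs a production implementation would more likely work with log-probabilities.

```python
from math import comb


def hypergeom_pvalue(N: int, m: int, s: int, v: int) -> float:
    """Actual-size p-value: P(X >= v) for s genes drawn without replacement from N, m in the category."""
    upper = min(m, s)
    hits = sum(comb(m, i) * comb(N - m, s - i) for i in range(v, upper + 1))
    return hits / comb(N, s)  # exact integer ratio converted to float


def binomial_pvalue(N: int, m: int, s_e: float, v: int) -> float:
    """Expected-size p-value: P(X >= v) for X ~ Binomial(m, s_e / N)."""
    p = s_e / N
    return sum(comb(m, i) * p ** i * (1 - p) ** (m - i) for i in range(v, m + 1))
```

For example, with $N = 10$ genes, a category of $m = 5$, and a profile of $s = 5$ genes that contains all five category genes, `hypergeom_pvalue(10, 5, 5, 5)` is $1/\binom{10}{5} = 1/252$.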
45. ell cycle genes. Figure 22: Top left: Main interface displaying all individual gene expression profiles with the y-axis scale set to Gene specific. Top right: Main interface displaying all individual gene expression profiles with the y-axis scale set to Profile specific. Bottom left: Main interface displaying all individual gene expression profiles with the y-axis scale set to Global. Bottom right: Main interface with a gene display policy of Display only selected, when ordering by the GO category cell cycle; the only gene expression profiles displayed correspond to GO cell cycle genes. Figure 23: The main interface as in Figure 11, except that the x-axis time points are displayed proportional to the real sampling rate instead of uniformly. 4.6 Zooming and Panning. Figure 24 panel (All STEM Profiles): Profiles ordered bas
46. ere is the option to change the color of the genes on the main interface by pressing the Change Color of Genes button. The text of this button is shown in the same color as the genes. Additional options in this first section are: • Display Model Profile: if this option is selected, the model profile pattern is displayed. • Display ID: if this option is selected, the ID of the model profile is displayed in the upper left-hand corner. • Display details when ordering: if this option is selected, details about the profile or cluster in the context of the ordering of the profiles are displayed in the lower left and/or upper right corners. The options in the second section determine the y-axis scale of the individual gene expression profiles displayed on the main interface. • If Gene specific is selected, then each individual gene is scaled separately to be closely aligned with the model profile. This is valid since the correlation coefficient is used to measure distance and is unaffected by scaling (see Figure 22, top left). • If Profile specific is selected, then all genes in a profile box are on the same y-scale, but the y-scale in different profile boxes will be different (see Figure 22, top right). • If Global is selected, then all genes are plotted on the same y-scale on the main interface (see Figure 22, bottom left). Note that if there is one outlier gene and Profile specific is selected, then the other genes in th
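The reason Gene specific scaling is safe can be checked directly: the Pearson correlation between a gene and a model profile does not change when the gene's values are multiplied by a positive constant or shifted by a constant. A minimal sketch (the function name is hypothetical, and how STEM picks each gene's display scale is not described here):

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


model = [0.0, 1.0, 2.0, 1.0]  # a model profile
gene = [0.0, 0.4, 0.9, 0.5]   # a gene's transformed expression values
r = pearson(model, gene)
r_scaled = pearson(model, [10 * g for g in gene])  # same value as r, up to rounding
```

A negative scale factor would flip the sign of the correlation, so only positive rescaling leaves the gene-to-profile fit unchanged.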
47. es. • To use STEM, Java 1.4 or later must be installed. If Java 1.4 or later is not currently installed, it can be downloaded from http://www.java.com. • To install STEM, simply save the file stem.zip locally and then unzip it. This will create a directory called stem. • To execute STEM in Windows with its default initialization options, simply double-click on the file stem.cmd in the stem directory. • To execute STEM from a command line, change to the stem directory and then type java -mx1024M -jar stem.jar. • STEM can be started with its initial settings specified in a defaults file. The format of a defaults file is specified in Appendix A. To have STEM load its initial settings from a defaults file from the command line, append -d followed by the name of the defaults file to the above command. For instance, to have STEM start with the settings specified in the file defaults.txt, use the command java -mx1024M -jar stem.jar -d defaults.txt. • STEM can also be run in batch mode from the command line; this is discussed in Appendix B. 3 Input Interface. The first window that appears after STEM is launched is the input interface (Figure 1). The interface is divided into four sections. In the top section, a user specifies the expression data files and normalization options for the data. In the second section, a user specifies the gene annotation information. In the third section, a user speci
escribed in Section 4, with a few differences of note. For K-means clustering, each box on the interface corresponds to a cluster instead of a profile. The time series shown in the box is the average expression of all genes assigned to the cluster. The number in the top left-hand corner of the box is a Cluster ID (see Figure for a legend). All K-means cluster boxes appear white, since no statistical significance is associated with them. By default the K-means clusters are ordered by ID, and IDs are assigned based on the cluster's average expression value at the first time point. K-means cluster boxes can be reordered on the main interface, analogous to the reordering of STEM profile boxes described in Section 4.3.
Pressing the Interface Options button displays a window (shown in Figure) similar to the one described in Section, with one difference in the options for the y-axis scale of genes on the main interface: Profile specific has been replaced with the analogous Cluster specific option, and there is no Gene specific option. When genes are displayed on the main interface, the cluster means are plotted on the same scale as the genes.
Pressing the Order Cluster By button brings up the dialog box in Figure 45, through which the clusters can be reordered. The reordering criterion can be the number of genes assigned to the cluster, or the p-value enrichment for a GO category or user-defined gene set. Pressing a cluster box opens a window such
ession pattern. The second line of the window contains the following:
• a count of the number of genes assigned to this model expression profile;
• a count of the expected number of genes assigned to the model profile, based on a permutation test;
• the uncorrected p-value for the number of genes assigned being greater than the number expected;
• whether or not this is statistically significant, as defined by the parameters on the Model Profiles panel under Advanced Options.
If the profiles are reordered by gene enrichment for genes belonging to a GO category or a user-defined gene set, then an additional line of text appears below the first two lines (Figures 26 and 27). This line indicates the uncorrected p-value of the profile's gene enrichment for the category or gene set by which the profiles are ordered. In parentheses are two ratios with "vs" in between, giving the form (A/B vs C/D). In the first ratio, the numerator A is the number of genes assigned to the profile that belong to the category or user-defined gene set by which the profiles are ordered. The denominator B is either the total number of genes assigned to the profile, if the profiles are ordered based on actual size gene enrichment, or the expected number of genes assigned to the profile, if the profiles are ordered based on expected size gene enrichment. In the l
[Figure: Profile 43 windows]
example of a window that provides detailed information about a K-means cluster.
[Figure 47 legend labels: K-means cluster IDs; correlation between clusters 6 and 2; average expression values; # genes assigned to cluster; # genes of the 87 assigned to cluster 6 in the first experiment that were also assigned to cluster 2 in the second experiment; p-value for the # of genes in the intersection]
Figure 47: Legend for the comparison interface with K-means clusters.

References
[1] Bederson, B. B., Grosjean, J., and Meyer, J. Toolkit Design for Interactive Structured Graphics. IEEE Transactions on Software Engineering, 30(8), pp. 535-546, 2004.
[2] Benjamini, Y. and Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. Roy. Stat. Soc. B (Methodological), 57(1), pp. 289-300, 1995.
[3] Ernst, J., Nau, G., and Bar-Joseph, Z. Clustering Short Time Series Gene Expression Data. Bioinformatics (Proceedings of ISMB 2005), 21(Suppl. 1), pp. i159-i168, 2005.
[4] The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet., 25, pp. 25-29, 2000.
[5] Guillemin, K., Salama, N. R., Tompkins, L. S., and Falkow, S. Cag pathogenicity island-specific responses of gastric epithelial cells to Helicobacter pylori infection. PNAS, 99, pp. 15136-15141, 2002.

A Defaults File Format
As mentioned in the preliminary section, the default settings for STEM can be specified in a file and used through the -d on th
f genes with a single time point column, STEM will perform a Gene Ontology enrichment analysis for those genes whose absolute value exceeds the value specified by the Minimum Absolute Expression Change parameter. In this case, the base set of genes is all genes in the data file. STEM can also be used to do an enrichment analysis for an arbitrary set of genes against an arbitrary base set of genes. The set of genes to analyze is specified in the Data File, while the base set of genes is specified in the Pre-filtered Gene File. The first line of each of these files is a header line, and every line below the header contains one gene. As with a data file, the field Spot IDs included in the data file should be unchecked, unless spot IDs are the first column and gene symbols are the second column, in which case the field should be checked. After pressing Execute, a gene enrichment analysis table will appear as described in Section 5.2. Note that if a Gene Location File is specified, a window displaying the genes along a chromosome will also be displayed, as described in Section 6.
Version 1.3.7 added the ability to perform a batch Gene Ontology analysis of multiple gene sets and have the results written out to a file. The output file is specified with -o GOoutfile.txt from the command line. All the gene sets should be in one file, with a header line to denote breaks between gene sets.

D Gene Annotati
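The Python below sketches what the enrichment table computes for an arbitrary gene set against a base set. It assumes the hypergeometric upper-tail test that this manual describes for its enrichment tables and omits multiple-testing correction. The function names are hypothetical. `select_changed` mirrors the single-column rule above: a gene is selected when its absolute value is strictly greater than the Minimum Absolute Expression Change.

```python
from math import comb
from typing import Dict, Iterable, Set


def hypergeometric_tail(base_size: int, category_in_base: int,
                        set_size: int, category_in_set: int) -> float:
    """P(X >= category_in_set) when set_size genes are drawn without replacement
    from base_size genes, of which category_in_base belong to the category."""
    upper = min(set_size, category_in_base)
    hits = sum(comb(category_in_base, i) * comb(base_size - category_in_base, set_size - i)
               for i in range(category_in_set, upper + 1))
    return hits / comb(base_size, set_size)  # exact big-int division, then float


def select_changed(values: Dict[str, float], min_abs_change: float) -> Set[str]:
    """Genes whose single-column value exceeds the threshold in absolute value."""
    return {gene for gene, value in values.items() if abs(value) > min_abs_change}


def enrichment_pvalue(gene_set: Iterable[str], base_set: Iterable[str],
                      category_genes: Iterable[str]) -> float:
    base = set(base_set)
    genes = set(gene_set) & base            # only genes in the base set count
    category = set(category_genes) & base
    return hypergeometric_tail(len(base), len(category), len(genes), len(genes & category))
```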
fied. Below is a more detailed description of the parameters on the Filtering panel:
• Maximum Number of Missing Values: a gene will be filtered if its number of missing values exceeds this parameter.
[Figure 6: Filtering tab of the Advanced Options window (tabs: Filtering, Model Profiles, Clustering Profiles, Gene Annotations, GO Analysis), with the fields Maximum Number of Missing Values, Minimum Correlation between Repeats, Minimum Absolute Expression Change, Change should be based on (Maximum-Minimum / Difference from 0), and Pre-filtered Gene File]
Figure 6: The above panel is used to specify gene filtering options.
• Minimum Correlation between Repeats: this parameter filters out genes that do not display a consistent temporal profile across repeat experiments. It applies only if the repeat data is selected to be from Different time periods. If there is a single repeat file, a gene will be filtered if the correlation between its values in the original data set and in the repeat set is below this parameter. If multiple repeats are available, then the gene will be filtered if the mean of all its pairwise correlations between experiments is below this parameter.
• Minimum Absolute Expression Change: after transformation (Log normalize data, Normalize data, or No Normalization/add 0), the gene will be filtered if the absolute value of its largest change is below this threshold. How change is defined depends on whether the Change should
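The three Filtering panel rules, plus the first-time-point rule from Section 3.3.1, can be combined into one predicate. This Python sketch makes several assumptions:
• values have already been transformed;
• missing values are represented as `None`;
• repeat correlations are computed only over time points present in both series;
• Difference from 0 means the largest absolute transformed value.
`passes_filter` is a hypothetical name, not STEM's code.

```python
from itertools import combinations
from math import sqrt
from statistics import mean
from typing import List, Optional, Sequence

Series = Sequence[Optional[float]]  # None marks a missing value


def _pearson(x: List[float], y: List[float]) -> float:
    if len(x) < 2:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def passes_filter(original: Series, repeats: List[Series], max_missing: int,
                  min_corr: float, min_abs_change: float,
                  change_mode: str = "max-min", normalized: bool = True) -> bool:
    """Return True if an (already transformed) gene survives the filtering rules."""
    if normalized and original[0] is None:
        return False  # first time point missing under Log normalize / Normalize data
    if sum(v is None for v in original) > max_missing:
        return False  # Maximum Number of Missing Values
    if repeats:  # only meaningful for repeats from Different time periods
        correlations = []
        for a, b in combinations([original, *repeats], 2):
            pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
            correlations.append(_pearson([u for u, _ in pairs], [v for _, v in pairs]))
        if mean(correlations) < min_corr:
            return False  # Minimum Correlation between Repeats
    present = [v for v in original if v is not None]
    if not present:
        return False
    if change_mode == "max-min":
        change = max(present) - min(present)
    else:  # assumed meaning of "Difference from 0"
        change = max(abs(v) for v in present)
    return change >= min_abs_change  # Minimum Absolute Expression Change
```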
fies the desired clustering algorithm and various execution options. These three sections of the interface are described in more detail in the next three subsections. The fourth section of the interface has two buttons, a Load Saved Settings button and an Execute button. Pressing the Load Saved Settings button allows the user to
[Figure 1: STEM main input interface. 1 Expression Data Info: Data File, View Data File, Log normalize data / Normalize data / No normalization (add 0), Spot IDs included in the data file. 2 Gene Info: Gene Annotation Source, Cross Reference Source, Gene Location Source, Gene Annotation File (gene_association.goa_human.gz), Cross Reference File, Gene Location File, and the Download the latest Annotations / Cross References / Locations / Ontology checkboxes. 3 Options: Clustering Method (STEM Clustering Method), Maximum Number of Model Profiles (50), Maximum Unit Change in Model Profiles between Time Points, Advanced Options. 4 Execute and Load Saved Settings buttons.]
Figure 1: Above is the main input interface, which is the first screen that appears when STEM is launched. From this screen a user specifies the input data gen
file that is part of its cluster of profiles. If the profiles or clusters of profiles are reordered based on a category, then two additional buttons appear above the bottom row. Pressing the upper of these two buttons displays a table of the genes that were assigned to the profile and also belong to the category by which the profiles are ordered. In Figure 26 this is the Profile cell cycle Gene Table button. Below it is a button that gives the option to plot only the profile genes belonging to the category by which the profiles are ordered. This is the Click to plot only profile cell cycle genes button on the left side of Figure 26. Once this button is pressed, it is replaced with a button labeled Click to plot all profile genes (right side of Figure 26), which lets the user revert to having all the profile genes plotted. If the profiles or clusters of profiles are ordered based on a user-defined gene set (referred to as a query gene set), then there will be several additional buttons (Figure 27). The button Click to plot only profile query set genes replots the window with only the profile genes that also belong to the user-defined gene set. Pressing this button replaces it with a Click to plot all profile genes button, which reverts to the original window when pressed. Above the Profile Gene Table and Profile GO Table buttons are two buttons, the Profile Query Gene Table and the Pr
hen a statistically significant number of genes were assigned to the model expression profile. Model profiles with the same color belong to the same cluster of profiles. Clicking on a model profile opens a new window that provides more detailed information about the model profile, along with the option to display gene tables and GO enrichment analysis tables. The window that appears with details about a model profile is discussed in depth in Section 5.
Along the bottom of the screen are several buttons: Filtered Gene List, Main Gene Table, Interface Options, Order Profiles By, Order Clusters By, Compare, a disk icon, and a help icon.
• The Filtered Gene List button displays a table of genes that were filtered and thus not assigned to a model profile.
• The Main Gene Table button displays a table of genes that were not filtered and thus assigned to a model profile.
• The Interface Options button displays a window in which one can adjust various interface options.
• The Order Profiles By button opens a dialog window that allows one to reorder the model profiles on the main overview screen by a number of criteria.
• The Order Clusters By button opens a dialog window that allows one to reorder the clusters of profiles, that is, to reorder profiles with the constraint that profiles of the same color must be kept together.
The main gene table, the filtered gene list, ordering profiles, ordering clusters of profiles, and interface options are explai
hich its time series most closely matches, based on the correlation coefficient. The number of genes assigned to each model profile is then computed. The number of genes expected to be assigned to a profile is estimated as follows:
1. Randomly permute the original time point values.
2. Renormalize the gene's expression values.
3. Assign genes to their most closely matching model profiles.
4. Repeat for a large number of permutations.
The average number of genes assigned to a model profile over all permutations is used as the estimate of the expected number of genes assigned to the profile. The statistical significance of the number of genes assigned to each profile versus the number expected is then computed. Statistically significant model profiles that are similar to each other can be grouped together to form clusters of profiles. The biological significance of the set of genes assigned to the same profile or the same cluster of profiles can then be assessed using a GO enrichment analysis. For a more detailed discussion of the novel method STEM uses to cluster genes and associate statistical significance with genes having the same expression profile, see [3].

1.2 Manual Overview
The remainder of the main portion of the manual contains six sections. Section 2 contains instructions on installing and starting STEM. Section 3 discusses the input to STEM, including execution options and data file formats. Section 4
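The assignment and permutation steps above can be sketched in a few lines of Python. This is an illustrative reconstruction, not STEM's implementation, and it makes these assumptions:
• similarity is Pearson correlation;
• every permutation of each gene's time points is enumerated, which is feasible only for short series and corresponds to permuting time point 0 as well;
• the profiles and genes are small made-up examples.
The expected counts always sum to the number of genes, since each gene contributes a total weight of 1 across its permutations.

```python
from itertools import permutations
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def start_at_zero(series: Sequence[float]) -> List[float]:
    """Renormalize so the series starts at 0 (subtract the first value)."""
    return [v - series[0] for v in series]


def closest_profile(series: Sequence[float], profiles: List[List[float]]) -> int:
    """Index of the model profile with the highest correlation to the series."""
    scores = [pearson(series, p) for p in profiles]
    return max(range(len(profiles)), key=lambda k: scores[k])


def observed_counts(genes, profiles) -> List[int]:
    counts = [0] * len(profiles)
    for gene in genes:
        counts[closest_profile(start_at_zero(gene), profiles)] += 1
    return counts


def expected_counts(genes, profiles) -> List[float]:
    """Average assignment counts over every permutation of each gene's time points."""
    expected = [0.0] * len(profiles)
    for gene in genes:
        perms = list(permutations(gene))
        for perm in perms:
            # With Pearson correlation this shift does not change the closest profile;
            # it is kept to mirror the renormalization step described above.
            k = closest_profile(start_at_zero(perm), profiles)
            expected[k] += 1 / len(perms)
    return expected


profiles = [[0, 1, 2], [0, -1, -2], [0, 1, 0]]   # illustrative model profiles
genes = [[0, 2, 4], [0, -1, -3]]                   # illustrative transformed genes
print(observed_counts(genes, profiles))            # [1, 1, 0]
print(expected_counts(genes, profiles))
```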
hich any two columns of values for the same time point belong could be interchanged without effect. If the repeat data is selected to be from Different time periods, this is not the case. Repeat data from Different time periods is averaged after normalization, while repeat data from The Same Time Period is averaged before normalization. Repeat data from Different time periods can also be used to filter genes with inconsistent expression patterns and to provide noise estimates on which to base the clustering of model profiles, as explained in Section 3.3.
[Figure 4: Repeat Data Files window with the Repeat Data File(s) list, the Repeat Data is from option (Different time periods / The same time period), and the Add File and View Selected File buttons]
Figure 4: The above window is used to specify repeat data files. A user can add or remove repeat files with the Add File and Remove File buttons. A user also needs to specify whether the repeat data samples are from the same time period as the original data or from different time periods. The contents of a repeat file can be viewed by selecting the repeat file and then pressing the View Selected File button.
[Figure residue: excerpt of a gene annotation file listing gene symbols (e.g. PRPF4, JMJD2B, JMJD2A, AOX1) with their GO category IDs (e.g. GO:0003677, GO:0006355)]
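The order of averaging and normalizing only matters when normalization is non-linear. Below is a minimal Python sketch that assumes, for illustration, that Log normalize data takes the log2 ratio of each value to the first time point. `combine_repeats` is a hypothetical helper. With a normalization that simply subtracts the first value, both orders would give the same result.

```python
from math import log2
from typing import List, Sequence


def log_normalize(series: Sequence[float]) -> List[float]:
    """Assumed log normalization: log2 ratio of each value to the first value."""
    return [log2(v / series[0]) for v in series]


def average(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean across experiments."""
    return [sum(col) / len(col) for col in zip(*rows)]


def combine_repeats(original, repeats, same_time_period: bool) -> List[float]:
    rows = [original, *repeats]
    if same_time_period:
        # The Same Time Period: average raw values first, then normalize.
        return log_normalize(average(rows))
    # Different time periods: normalize each experiment, then average.
    return average([log_normalize(r) for r in rows])


print(combine_repeats([1, 2, 4], [[4, 4, 4]], same_time_period=False))  # [0.0, 0.5, 1.0]
print(combine_repeats([1, 2, 4], [[4, 4, 4]], same_time_period=True))
```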
information about the genes in the intersection between the profile clicked on and the profile to the immediate left of the yellow bar (left side of Figure 39). Near the top of the window is a line of text indicating how many genes were in the intersection and the p-value of the intersection. The intersection profile window also contains a button that plots only those genes in the profile that were also assigned, in the other experiment, to the profile in its row to the left of the yellow bar. After pressing the Click to plot only genes in intersection button, one can press the Click to plot all profile genes button to revert to the original screen. Two additional buttons appear on the profile interface:
• The Profile Intersect Gene Table button displays a gene table (Section 5.1) of the genes in the intersection, that is, the genes assigned to this profile that were also assigned to the profile to the left of the yellow bar in the other experiment.
• The Profile Intersect GO Table button displays a table (Section 5.2) with a gene enrichment analysis for the genes in the intersection set.
Clicking on a profile to the left of the yellow bar opens a window that displays information about the profile but does not provide any information about gene intersections.
Figure legend labels (truncated): correlation between profiles 38 and 13; # genes assigned to profile; genes of the
ing their statistical significance. The panel used to adjust parameters related to model profiles appears in Figure. The parameters on this panel are only relevant to the STEM clustering method, not to K-means clustering. The first two parameters, Maximum Correlation and Maximum Number of Candidate Model Profiles, influence the selection of model profiles, along with two parameters from the main input interface, Maximum Number of Model Profiles and Maximum Unit Change in Model Profiles between Time Points. The final three parameters, Number of Permutations per Gene, Significance Level, and Correction Method, relate to the statistical test of whether a profile has a statistically significant number of genes assigned. The parameters on this panel are described below:
• Maximum Correlation: the correlation between any two selected model profiles must be below this value, so it can be used to guarantee that two very similar profiles are not both selected. Lowering this parameter could cause fewer model profiles to be selected than the Maximum Number of Model Profiles, even if more candidate model profiles are available. This parameter's maximum value is 1, which prevents two perfectly correlated model profiles from being selected.
• Maximum Number of Candidate Model Profiles: candidate model profiles are non-constant profiles that start at 0 and, between consecutive time points, increase or decrease by an integral number of units less than or equal to the value of Maximum Unit Change in Model Profiles between Time Points. If the number of candidate model profiles exceeds this parameter, then STEM does not explicitly generate all candidate model profiles; instead, it randomly selects a subset of candidate model profiles of this size. In most cases there is no need to adjust this parameter.
• Number of Permutations per Gene: the number of permutations of time points to select randomly for each gene when computing the expected number of genes assigned to each of the model profiles. If this parameter is 0, then all permutations are used. Increasing the number of permutations leads to slightly greater accuracy at the expense of greater execution time.
• Significance Level: the significance level at which the number of genes assigned to a model profile, compared to the expected number, should be considered significant. If the Correction Method parameter for multiple hypothesis testing is Bonferroni, then this parameter is the significance level before applying a Bonferroni correction. If Correction Method is False Discovery Rate, then this parameter is the false discovery rate. If Correction Method is none, then this parameter is the uncorrected significance level.
• Permutation Test Should Permute Time Point 0: If the box Permutation Test Should Pe
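The candidate set described under Maximum Number of Candidate Model Profiles can be enumerated directly when it is small. For n time points and maximum unit change c there are (2c+1)^(n-1) - 1 candidates. In this Python sketch, the function name and the fixed random seed are illustrative choices. STEM avoids generating every candidate when the limit is exceeded, whereas this sketch enumerates everything and then samples, which is only practical for small n and c. The sketch also leaves out the subsequent selection subject to Maximum Correlation.

```python
import random
from itertools import accumulate, product
from typing import List, Optional


def candidate_profiles(num_time_points: int, max_unit_change: int,
                       max_candidates: Optional[int] = None,
                       seed: int = 0) -> List[List[int]]:
    """Non-constant profiles starting at 0 whose step between consecutive time
    points is an integer in [-max_unit_change, max_unit_change]."""
    steps = range(-max_unit_change, max_unit_change + 1)
    candidates = []
    for deltas in product(steps, repeat=num_time_points - 1):
        if any(deltas):  # skip the constant all-zero profile
            candidates.append([0, *accumulate(deltas)])
    if max_candidates is not None and len(candidates) > max_candidates:
        # Keep a random subset of the requested size.
        candidates = random.Random(seed).sample(candidates, max_candidates)
    return candidates


print(len(candidate_profiles(5, 2)))  # 624, i.e. 5**4 - 1
```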
ion File box will be downloaded from unless it is an EBI data source, in which case it will be downloaded from. If the Cross References box is checked, then the file listed in the Cross Reference File box will be downloaded from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa. If the Locations box is checked, then the file listed in Gene Location File will be generated for the species specified in Gene Location Source, based on data downloaded from http://www.biomart.org/biomart/martservice. If the Ontology box is checked, then the file gene_ontology.obo will be downloaded from http://www.geneontology. If the annotation, cross reference, or ontology file is required but not present in the stem directory, then the corresponding box will be checked and cannot be unchecked, which forces download of the file(s). If the Gene Annotation Source is set to User Provided, then there is no option to download the gene annotation file. Likewise, there is no option to download the cross reference file if the Cross Reference Source is set to User Provided. Upon pressing the Execute button, the files corresponding to the checked boxes will be downloaded.

3.3 Options
In the third section of the interface, a user can specify a variety of execution options for STEM. The first option a user specifies is the Clustering Method, which can be set to either STEM Clustering Method or K-means. The STEM clustering method is the novel clusteri
l significance of the gene set intersection. In any row, the profiles to the right of the yellow bar are ordered by increasing p-value for the gene set intersection with the profile to the left of the yellow bar. The profiles to the left of the yellow bar are ordered by increasing minimum intersection p-value with a profile in their row to the right of the yellow bar.
• Order By Correlation: this reorders profile pairs based on correlation. In any row, the profiles to the right of the yellow bar are ordered by increasing correlation with the profile to the left of the yellow bar. The profiles to the left of the yellow bar are ordered by increasing minimum correlation with a profile in their row to the right of the yellow bar.
[Figure 39 panels: two Profile 41 windows, each reporting the genes assigned, genes expected, and p-value (significant) for Profile 41, and how many of the 97 genes assigned to Profile 33 in the comparison experiment were also assigned to this profile, with the intersection p-value]
Figure 39: On the left is an example of a model profile window that appears when a model profile box to the right of a yellow bar is pres
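The p-value ordering rule above can be implemented as a two-level sort. In this Python sketch, the intersection p-values are given as a nested dictionary, which is a hypothetical input format rather than a STEM file format. Computing those p-values is out of scope here. Order By Correlation would follow the same pattern, with correlations in place of p-values, sorted as the text specifies.

```python
from typing import Dict, List, Tuple


def order_by_intersection(pvalues: Dict[int, Dict[int, float]]) -> List[Tuple[int, List[int]]]:
    """pvalues[left][right] is the intersection p-value between a profile left of
    the yellow bar and a profile to its right. Returns rows in display order."""
    rows = []
    for left, row in pvalues.items():
        right_sorted = sorted(row, key=row.get)        # increasing p-value in the row
        rows.append((row[right_sorted[0]], left, right_sorted))
    rows.sort(key=lambda r: r[0])                      # increasing minimum p-value
    return [(left, right_sorted) for _, left, right_sorted in rows]


example = {1: {4: 1e-5, 5: 0.3}, 2: {6: 1e-8, 4: 0.02}, 3: {7: 0.5}}  # made-up values
print(order_by_intersection(example))
# [(2, [6, 4]), (1, [4, 5]), (3, [7])]
```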
le a gene enrichment for a category will be significant under an expected size based enrichment but not under an actual size based enrichment. Likewise, if a profile has fewer genes assigned than expected, a gene enrichment for a category may be significant under an actual size based enrichment but not under an expected size based enrichment. If multiple independent processes happen to have the same temporal profile, then a significant gene enrichment for a process may be missed by an actual size enrichment but detected by an expected size enrichment.
Clicking on a row of the table reorders the profiles based on the p-value enrichment for the category of that row. Whether the p-value enrichment is computed from the profile's actual size or expected size depends on which option is selected next to the label Order using enrichment p-values based on a profile's. Profiles are ordered row-wise, from left to right and top to bottom, based on the significance of the enrichment for the selected category. The profile most enriched for the selected category appears in the top left corner, the next most enriched profile appears second in the top row, and so on. For instance, Figure 16 shows an example of the model profiles reordered based on an actual size enrichment for cell cycle genes. The numbers that appear in the bottom left-hand corner of the model profile box are the number of ge
left of Figure 17. When profiles are reordered based on the number of genes assigned to the profile, the number of genes assigned to a profile appears in the bottom left-hand corner of its profile box.
• Expected Number: reorders profiles based on the expected number of genes assigned to the profile, computed from a permutation test of the time points (bottom right of Figure 17). The profiles with the greatest expected number of genes assigned appear to the left on the top row. When profiles are reordered based on the expected number of genes assigned to the profile, the expected number of genes assigned to a profile appears in the bottom left-hand corner of its profile box.
• Default Order: reorders the profiles back to their original order. In the original default ordering, all significant profiles appear before non-significant profiles. Profiles belonging to the same cluster are grouped together. Clusters are ordered based on the total number of genes assigned to any profile in the cluster. Within clusters of profiles and among non-significant profiles, the profiles are ordered by increasing p-value for the significance of the number of genes assigned versus what was expected.
• Define Gene Set: pressing the Define Gene Set button brings up a dialog box (Figure 18) that allows one to reorder profiles by enrichment for genes in a user-defined gene set. Any gene which is checked will
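The Default Order rules can be expressed as a single sort key. This Python sketch assumes that only significant profiles carry a cluster label. It also assumes that clusters with more assigned genes come first, since the text says only that clusters are ordered by their total number of genes. The `Profile` record is a hypothetical structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Profile:
    profile_id: int
    num_assigned: int
    p_value: float
    cluster: Optional[int]  # None for non-significant profiles


def default_order(profiles: List[Profile]) -> List[int]:
    significant = [p for p in profiles if p.cluster is not None]
    others = [p for p in profiles if p.cluster is None]
    cluster_size: Dict[int, int] = {}
    for p in significant:
        cluster_size[p.cluster] = cluster_size.get(p.cluster, 0) + p.num_assigned
    # Larger clusters first (assumed direction); the cluster ID keeps members
    # together; increasing p-value within a cluster.
    significant.sort(key=lambda p: (-cluster_size[p.cluster], p.cluster, p.p_value))
    others.sort(key=lambda p: p.p_value)  # non-significant: increasing p-value
    return [p.profile_id for p in significant + others]
```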
ls_windows_should_be: Determined automatically or Fixed (default: Determined automatically)
Y_Scale_Min: -3.0
Y_Scale_Max: 3.0
Tick_interval: 1.0
X-axis_scale_should_be: Uniform or Based on real time (default: Based on real time)

B Running STEM in Batch Mode
STEM can be run in batch mode from a command line. In batch mode, STEM takes as input either a single settings file or a directory of settings files. STEM then runs the method on the settings file(s) and writes two output files for each settings file:
• A gene table file, which has _genetable appended to the name of the settings file. It contains the table of genes, the spot(s) each gene came from, the profile to which the gene was assigned, and its expression values after transformation.
• A profile table file, which has _profiletable appended. It contains each profile ID, its corresponding expression pattern, the number of genes assigned, the number expected, and the p-value significance of the number of genes assigned.
STEM can be run in batch mode from a command line with the command
java -mx1024M -jar stem.jar -b batchInput batchOutputDir
where batchInput is either a directory containing only settings files or a single settings file, and batchOutputDir is the directory where the output files are written.

C Using STEM for Standard Gene Ontology Enrichment Analysis
STEM may be used for standard Gene Ontology enrichment analysis of non-time-series data in two ways. Given a data file o
me points. Between two consecutive time points, a model profile can either stay constant or increase or decrease by an integral number of units up to this parameter value. If a user selects K-means clustering, then these two options do not appear; instead, two options specific to K-means clustering appear (see Section 8). The remaining options can be accessed by pressing the Advanced Options button. These remaining options are divided into five panels: Filtering (Figure 6), Model Profiles (Figure 7), Clustering Profiles (Figure 8), Gene Annotations (Figure 9), and GO Analysis (Figure 10). These panels are discussed in the next subsections.

3.3.1 Filtering Options
Through the parameters on the Filtering panel shown in Figure 6, a user can adjust the criteria STEM uses to filter genes. A filtered gene is excluded from further analysis. Genes can be filtered for any of the following reasons:
• they do not show a sufficient response to experimental conditions (Minimum Absolute Expression Change);
• there are too many missing values (Maximum Number of Missing Values);
• the gene expression pattern over repeats is too inconsistent (Minimum Correlation between Repeats).
If the Log normalize data or Normalize data option is selected, a gene will automatically be filtered if its expression value at the first time point is missing. A user can also filter genes by criteria not implemented in STEM, in which case a Pre-filtered Gene File should be speci
me to return the table to its original order. To cycle through the sorting options in the opposite order, hold down the Shift key when clicking. To do a compound sort on multiple columns, hold down the Ctrl key when clicking. As with all tables in STEM, a user can save the contents of the table by pressing the Save Table button or copy the contents to the clipboard with the Copy Table button. As with any gene table in STEM, a user can also save just the list of gene names using the Save Gene Names button, or copy it to the clipboard with the Copy Gene Names button.

4.2 Filtered Gene List
[Figure 14: Table of Genes Filtered window with Selected, Gene Symbol, and spot ID columns, and Copy Table, Save Table, Copy Gene Names, and Save Gene Names buttons]
Figure 14: An example of a list of filtered genes.
If a user presses the Filtered Gene List button, a table such as the one in Figure appears. The table contains a list of genes that were filtered and thus not assigned to a model expression profile. The parameters controlling filtering of genes are described in Section. The three columns of this table (the Selected column, the gene symbols column, and the spot ID column) are the same as the first three columns of the main gene table described in section
ml. If a file is in the GFF format, its extension (the portion of the file name after the first period) must start with gff. Files generated automatically from data downloaded from Ensembl Biomart must have an extension that starts with mart; the format of these files is not documented here. If a species supported by Ensembl Biomart is selected, the gene identifiers must be official symbols of an identifier system listed for that species in the biomart_species.txt file. For almost all species, at least the official Entrez gene ID is supported. To keep the downloaded file size reasonable, not all identifier systems that Biomart supports through MartService for a species are listed in this file. Additional identifier systems can be added to the last column, with the identifier systems separated by a delimiter. Supported identifiers for a species can be found through the Biomart site. For Homo sapiens the URL is biomart/martservice?type=attributes&virtualSchema=default&dataset=hsapiens_gene_ensembl. For other species, replace hsapiens_gene_ensembl with the entry in the first column of the biomart_species.txt file for the species of interest.
At the bottom of the gene annotation section of the interface is the line Download the latest, followed by four checkboxes: Annotations, Cross References, Locations, and Ontology. If the Annotations box is checked, then the file listed in the Gene Annotat
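Both rules in this paragraph are mechanical, so they can be expressed in Python:
• `location_file_format` applies the stated extension test: the portion of the file name after the first period must start with gff or mart. Anything else is labeled other here, since the remaining formats are described elsewhere.
• `attributes_url` builds the attribute listing URL for a dataset.
Both function names are hypothetical. The base URL is assumed from the download section of this manual and may no longer be served.

```python
from pathlib import PurePath
from urllib.parse import urlencode

# Base URL taken from the download section of this manual; it may no longer be served.
MARTSERVICE = "http://www.biomart.org/biomart/martservice"


def location_file_format(path: str) -> str:
    """Classify a gene location file by the extension rule described above."""
    name = PurePath(path).name
    _, sep, extension = name.partition(".")  # extension = text after the first period
    if sep and extension.startswith("gff"):
        return "gff"
    if sep and extension.startswith("mart"):
        return "mart"
    return "other"


def attributes_url(dataset: str = "hsapiens_gene_ensembl") -> str:
    """MartService attribute listing URL for a dataset from biomart_species.txt."""
    query = urlencode({"type": "attributes", "virtualSchema": "default", "dataset": dataset})
    return f"{MARTSERVICE}?{query}"
```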
n comma or a pipe symbol, and all will be annotated as belonging to the GO category in Column 5. Note that the exact content of the DB_Object_ID, DB_Object_Symbol, DB_Object_Name, and DB_Object_Synonym fields varies between annotation sources. The README files available at http://www.geneontology.org/GO.current.annotations.shtml describe the content of these fields for each annotation source.
2. The alternative format for an annotation file is two tab-delimited columns, as illustrated in Figure. The first column contains gene symbols or spot IDs and the second column contains category IDs. The entries in each column are delimited by a semicolon, comma, or pipe symbol. If the same gene symbol or spot ID appears on multiple rows, then the union of all its annotations is used. Matching between gene symbols in the data file and the annotation file is not case sensitive.
Gene annotation files can be either ASCII text files or GNU zip (gzip) compressed ASCII text files.
Below the Gene Annotation Source field is the Cross Reference Source field, which controls the entry in the Cross Reference File field. Cross references are useful when the naming convention used for genes in the data file differs from the one used in the gene annotation file. A cross reference file establishes that two or more symbols refer to the sam
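The two-column format rules translate directly into a small parser:
• a tab separates the two columns;
• a semicolon, comma, or pipe separates entries within a column;
• annotations are unioned over repeated rows;
• matching is case-insensitive;
• the file may be gzip-compressed.
This Python sketch is illustrative. The function names are hypothetical, the sketch does not handle the GO annotation (first) format, it assumes there is no header line, and `ALIAS1` is a made-up identifier.

```python
import gzip
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, TextIO

_DELIMITERS = re.compile(r"[;,|]")  # semicolon, comma, or pipe


def open_annotation_file(path: str) -> TextIO:
    """Open a plain-text or gzip-compressed annotation file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def parse_two_column(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map upper-cased gene symbol or spot ID -> union of its category IDs."""
    annotations: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        genes_field, _, categories_field = line.rstrip("\r\n").partition("\t")
        categories = {c.strip() for c in _DELIMITERS.split(categories_field) if c.strip()}
        for gene in _DELIMITERS.split(genes_field):
            if gene.strip():
                # Upper-casing makes later lookups case-insensitive.
                annotations[gene.strip().upper()] |= categories
    return dict(annotations)


rows = ["JMJD2B\tGO:0003677;GO:0006355\n", "jmjd2b|ALIAS1\tGO:0005634\n"]
print(parse_two_column(rows)["JMJD2B"])
```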
n a randomization test, where random samples of the same size as the set being analyzed are drawn. The number of samples is specified by the parameter Number of samples for multiple hypothesis correction. The corrected p-value for a p-value r is the proportion of random samples that show enrichment for some GO category with a p-value less than r. A Bonferroni correction is faster, but a randomization test leads to lower p-values.
[Figure 11: Model Profiles Overview Interface window titled All STEM Profiles, with the note Clusters ordered based on number of genes and profiles ordered by significance (default) and the Filtered Gene List, Main Gene Table, Interface Options, Order Profiles By, Order Clusters By, and Compare buttons]
Figure 11: An example of the main profile overview interface. Each box corresponds to a model expression profile. Colored profiles have a statistically significant number of genes assigned. Clicking on a profile box displays detailed information about the profile. The profiles and clusters of profiles can be reordered by various criteria by pressing Order Profiles By or Order Clusters By.
After the STEM clustering algorithm executes, the model profile overview interface appears. An example of such an interface is shown in Figure 11. Each box corresponds to a different model temporal expression profile. The number in the top left-hand corner of a profile box is the model profile ID number. If the box is colored t
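The following sketch implements the randomization-based correction defined above, with a standard Bonferroni adjustment included for contrast. In this Python code, `category_pvalues` stands for any function that returns the enrichment p-values of every GO category for a gene set; it is a hypothetical hook. The Bonferroni denominator is assumed to be the number of categories tested.

```python
import random
from typing import Callable, List, Sequence


def bonferroni(p: float, num_tests: int) -> float:
    """Standard Bonferroni adjustment, capped at 1."""
    return min(1.0, p * num_tests)


def sample_min_pvalues(base_genes: Sequence[str], set_size: int,
                       category_pvalues: Callable[[List[str]], List[float]],
                       num_samples: int, seed: int = 0) -> List[float]:
    """Smallest category p-value in each random sample of set_size base genes."""
    rng = random.Random(seed)
    # random.sample needs a sequence (not a set) in recent Python versions.
    return [min(category_pvalues(rng.sample(base_genes, set_size)))
            for _ in range(num_samples)]


def randomization_corrected(r: float, min_pvalues: Sequence[float]) -> float:
    """Fraction of random samples with some category p-value below r."""
    return sum(m < r for m in min_pvalues) / len(min_pvalues)


print(randomization_corrected(0.01, [0.005, 0.02, 0.5, 0.001]))  # 0.5
```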
ned in detail in the next five subsections. The Compare option, which allows comparison with a data set from a different experimental condition, is explained in Section 7. The disk icon writes out the settings used by STEM in the current analysis. These settings can be reloaded into STEM using the Load Saved Settings button on the input interface, or by specifying the file as a default file at the command line with the -d option. In version 1.3.8, if an .svg extension is specified when saving the file, then an SVG file of the main interface is saved instead. The SVG is produced by STEM using the Batik open source software. Pressing the help icon brings up the legend that appears in Figure 12, along with additional help information. The last subsection of this section, Section 4.6, describes how one can zoom in or out on any portion of the main window.

[Figure 12 legend: model profile of log of expression change ratio over time; colored profiles have a statistically significant number of genes assigned; p-value of the enrichment of genes in the selected GO category or query set assigned to profiles of this color (only appears if reordering by clusters); the lower left-hand corner contains information relevant to the current reordering of profiles: if ordering by significance, this will be the p-value of the number of genes assigned versus expected; if ordering by number of genes assigned, this will be the number of genes assigned to the profile; if ordering
nes assigned to the profile that also belong to the selected category and then, separated by a semicolon, the p-value enrichment. Below the table are several buttons which give additional criteria to reorder profiles:
• Profile ID: Reorders profiles sequentially from left to right and top to bottom by their ID number, the number in the top left corner of the profile box (top left of Figure 17). Profiles which go down initially will appear first, then profiles which hold steady initially, and last profiles which go up initially.
• Significance: Reorders profiles based on the p-value significance of the number of genes assigned to a profile being more than the number of genes expected (top right of Figure 17). If $s_a$ genes were assigned to the profile, $s_e$ genes were expected, and a total of $t$ genes passed filter, then the uncorrected p-value of seeing $s_a$ or more genes assigned to the profile is computed based on a binomial distribution with parameters $t$ and $s_e/t$. The p-value is computed as
$$p = \sum_{i=s_a}^{t} \binom{t}{i} \left(\frac{s_e}{t}\right)^{i} \left(1 - \frac{s_e}{t}\right)^{t-i}.$$
The most significant profiles appear to the left on the top row. When profiles are reordered based on this option, the profile significance p-value of a profile will appear in the bottom left-hand corner of its profile box.
• Number of Genes: Reorders profiles based on the number of genes assigned to the profile, with the profiles with the most genes assigned appearing to the left on the top row (bottom
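The binomial profile-significance p-value used by the Significance ordering can be sketched as below. This is a minimal sketch assuming $s_e$ may be fractional (as in "34.7 genes expected"); the function name `profile_significance` is hypothetical. Terms are summed in log space because `math.comb(t, i)` for $t$ in the tens of thousands is too large to convert to a float.

```python
import math


def profile_significance(s_a, s_e, t):
    """Uncorrected p-value of seeing s_a or more of t genes assigned to a
    profile when s_e are expected: P(X >= s_a), X ~ Binomial(t, s_e / t)."""
    if s_a > t:
        return 0.0
    if s_a <= 0:
        return 1.0
    q = s_e / t
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0  # every gene is expected in the profile
    total = 0.0
    for i in range(s_a, t + 1):
        # log of C(t, i) * q**i * (1 - q)**(t - i), exponentiated term by term
        log_term = (math.lgamma(t + 1) - math.lgamma(i + 1) - math.lgamma(t - i + 1)
                    + i * math.log(q) + (t - i) * math.log1p(-q))
        total += math.exp(log_term)
    return min(1.0, total)
```

For example, `profile_significance(130, 34.7, 20000)` returns a vanishingly small value, consistent with a profile that is flagged as significant.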
ng method STEM implements, specifically designed for short time series expression data, briefly described in Section 1.1 and described in more detail in [3]. STEM's implementation of the K-means algorithm is discussed in Section 8. Assuming the user selects the STEM Clustering Method, two options related to selecting temporal model expression profiles appear directly on the main input interface window. These options are:
• Maximum Number of Model Profiles: This parameter specifies the maximum number of model profiles that can be selected. Model profiles are selected from a larger set of candidate model profiles. Candidate model profiles are non-constant profiles which start at 0 and, between consecutive time points, increase or decrease by an integral number of units that is less than or equal to the value of Maximum Unit Change in Model Profiles between Time Points. See for a discussion of how a distinct and representative set of model profiles is selected from the larger set of candidate model profiles. If the value of Maximum Number of Model Profiles is set to 0, then there is no hard upper bound on the number of model profiles, and the number of model profiles is limited only by the number of candidate profiles and the Maximum Correlation parameter under the Model Profiles section of Advanced Options.
• Maximum Unit Change in Model Profiles between Time Points: This parameter specifies the maximum number of units a model profile may change between ti
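The set of candidate model profiles described under Maximum Number of Model Profiles can be enumerated directly. The sketch below is illustrative only (the name `candidate_profiles` is hypothetical) and does not implement STEM's selection of a distinct, representative subset.

```python
from itertools import product


def candidate_profiles(num_time_points, max_unit_change):
    """All non-constant candidate model profiles that start at 0 and change by an
    integer in [-max_unit_change, max_unit_change] between consecutive time points."""
    steps = range(-max_unit_change, max_unit_change + 1)
    candidates = []
    for deltas in product(steps, repeat=num_time_points - 1):
        if not any(deltas):
            continue  # the constant (all-zero) profile is excluded
        profile = [0]
        for d in deltas:
            profile.append(profile[-1] + d)
        candidates.append(tuple(profile))
    return candidates
```

With $n$ time points and maximum unit change $c$ there are $(2c+1)^{n-1} - 1$ candidates; for 5 time points and $c = 2$ that is $5^4 - 1 = 624$. This exponential growth is why the number of model profiles actually used is bounded.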
nsformation described in Section 3.1, a gene $x_i$ and a center $c_j$ are $(T+1)$-element vectors that can be written as $x_i = (0, x_{i1}, x_{i2}, \ldots, x_{iT})$ and $c_j = (0, c_{j1}, c_{j2}, \ldots, c_{jT})$, respectively. The K-means algorithm tries to minimize the function
$$\sum_{j=1}^{K} \sum_{x_i \in S_j} \sum_{m=1}^{T} (x_{im} - c_{jm})^2,$$
where $S_j$ is the set of genes assigned to cluster $j$ and $K$ is the number of clusters. The K-means algorithm starts with randomly selected centers; in STEM's implementation the initial centers are randomly selected genes. The algorithm then iterates between two steps until convergence. In one step, each gene is reassigned to the cluster of the center to which it is closest. In the next step, the center of each cluster is recomputed based on the new assignment of genes to clusters. The algorithm terminates when no changes in assignment can be made. This algorithm is guaranteed to converge to a local minimum, but not necessarily to a global minimum. The algorithm can therefore be repeated for a number of different random starts, with potentially a different clustering obtained from each start; only the run with the best-scoring final set of clusters is returned. The number of random starts is specified in the field Number of Random Starts on the main input interface. Increasing this parameter potentially leads to a slightly better clustering at the expense of a slightly longer running time. After the K-means algorithm executes, the main output interface is displayed (see Figure 42). This interface is similar to the model profile overview interface d
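The K-means procedure with multiple random starts can be sketched as follows. This is a plain-Python illustration, not STEM's code: it assumes genes are already transformed to start at 0, uses the squared-distance objective above, keeps the previous center when a cluster becomes empty (an assumption), and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(gene, centers):
    """Index of the nearest center (lowest index wins ties)."""
    return min(range(len(centers)), key=lambda j: squared_distance(gene, centers[j]))


def kmeans_once(genes, k, rng):
    centers = [list(g) for g in rng.sample(genes, k)]  # initial centers are random genes
    assignment = None
    while True:
        new_assignment = [closest(g, centers) for g in genes]
        if new_assignment == assignment:
            break  # no reassignment possible: converged to a local minimum
        assignment = new_assignment
        for j in range(k):
            members = [g for g, a in zip(genes, assignment) if a == j]
            if members:  # empty clusters keep their previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


def kmeans(genes, k, num_starts=20, seed=0):
    """Best (score, assignment, centers) over several random starts."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_starts):
        assignment, centers = kmeans_once(genes, k, rng)
        score = sum(squared_distance(g, centers[a]) for g, a in zip(genes, assignment))
        if best is None or score < best[0]:
            best = (score, assignment, centers)
    return best
```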
nts of the table can also be saved to a text file using the Save Table button or copied to the clipboard using the Copy Table button. Clicking on a row of the gene enrichment table will display a gene table that only includes genes that belong both to the category of the row and to the set being analyzed. For example, if a user clicked on the cell cycle row, a table such as that in Figure 31 will appear, which contains only genes that were assigned to the profile being analyzed and were also annotated as cell cycle genes.

[Figure 31 screenshot: Gene List for GO:0007049 cell cycle genes in Profile 9, with columns Selected, Weight, Gene Symbol, SPOT and time points, and buttons Copy Table, Save Table, Copy Gene Names, Save Gene Names, Chromosome View]

Figure 31: A table that appears after clicking on the cell cycle row in the gene enrichment table. The table only includes genes that were assigned to the profile being analyzed that were also annotated as being cell cycle genes.

6 Chromosome Viewer

STEM allows viewing genes
ofile Query GO Table. Pressing Profile Query Gene Table displays a table with all genes assigned to the profile that also belong to the query gene set. Pressing Profile Query GO Table displays a gene enrichment table for just the genes assigned to the profile that are also part of the query set. If the profile is part of a non-singleton cluster of profiles, then two additional buttons will appear: Cluster Query Gene Table and Cluster Query GO Table. These buttons are analogous to the Profile Query Gene Table and Profile Query GO Table buttons, but are based on all genes in the query set that are assigned to any profile in the profile's cluster of profiles. If the profile window was opened by clicking on a row in the main gene table, as described in Section 4.1, then a button will appear to plot only the gene of the row that was clicked on. This is the Click to plot only gene STAM2 button on the left side of Figure. Once the button is pressed, the button will be replaced with the

[Figure screenshot: Profile 9 detail windows; 130.0 genes assigned, 34.7 genes expected, p-value 0.00 (significant); GO:0007049 cell cycle; profile actual size based enrichment uncorrected p-value 4.8E-11; axes: time points 0 to 4 vs. Expression Change]
on Sources

The table below lists all gene annotation data sets that can be selected under Gene Annotation Source. More information about these annotation data sets can be found at http://www.geneontology.org/GO.current and, for the EBI annotations, at http://www.ebi.ac.uk/GOA. Subsets of the UniProt annotations for a large number of species, provided by the European Bioinformatics Institute (EBI), can be found at http://www.ebi.ac.uk/GOA/proteomes.html and can be used through the User Provided option under Gene Annotation Source.

E Gene Location Sources
optional and, if included, contains spot IDs. If the data file includes the spot IDs column, then the field Spot IDs included in the data file on the input interface must be checked; otherwise the field must be unchecked. The next column (or the first column, if spot IDs are not included in the data file) contains gene symbols. If a gene symbol is not available, then the field can be left empty or a 0 can be placed in it. Both the spot ID field and the gene symbol field may contain multiple entries delimited by a semicolon, pipe or comma. The sub-entries in the field are only relevant in the context of gene annotations, described in the next section. The remaining columns contain the expression value at each time point, ordered sequentially by time. If an expression value is missing, then the field should be left empty. The first row of the data file contains column headers, and each row below the column header corresponds to a spot on the microarray. If the x-axis should be scaled proportionally to the actual sampling rate, then each column header must contain the time at which the experiment was sampled, in the same units. Each column must be delimited by a tab. The tab-delimited input data file should be an ASCII text file or a GNU zip file of an ASCII text file. A tab-delimited text file can easily be generated in Microsoft Excel by choosing Text (Tab delimited) as the Save as type under the S
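A reader for the data file layout described above might look like the following sketch. It is not part of STEM; `parse_stem_data` and `read_stem_data` are hypothetical names, missing values become `None`, an empty or `0` gene symbol becomes `None`, and the semicolon/pipe/comma sub-entries are left unsplit.

```python
import csv
import gzip
import io


def parse_stem_data(handle, spot_ids_included):
    """Parse a tab-delimited STEM-style data file from an open text handle.

    Returns (time_headers, rows) where each row is (spot_id, gene_symbol, values).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    first = 2 if spot_ids_included else 1  # index of the first time-point column
    times = header[first:]
    rows = []
    for fields in reader:
        if not any(f.strip() for f in fields):
            continue  # skip blank lines
        spot = fields[0].strip() if spot_ids_included else None
        symbol = fields[first - 1].strip() if len(fields) >= first else ""
        if symbol in ("", "0"):
            symbol = None  # no gene symbol available
        raw = fields[first:]
        raw += [""] * (len(times) - len(raw))  # pad short rows with missing values
        values = [float(v) if v.strip() else None for v in raw]
        rows.append((spot, symbol, values))
    return times, rows


def read_stem_data(path, spot_ids_included):
    """Open a plain ASCII or GNU zip (.gz) file and parse it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return parse_stem_data(handle, spot_ids_included)


if __name__ == "__main__":
    sample = "SPOT\tGene\t0h\t3h\t6h\n1\tCTH\t0\t0.85\t\n2\t0\t0\t1.5\t2.0\n"
    print(parse_stem_data(io.StringIO(sample), spot_ids_included=True))
```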
particular allows automatic identification of statistically significant sets of genes which are co-expressed under both experimental conditions. STEM can automatically identify pairs of model profiles, one from each experiment, for which the intersection of the sets of genes assigned to the two profiles is statistically significant. Suppose there are $N$ genes on the microarray, $n_i$ genes are assigned to profile $i$ in the first experiment, $n_j$ genes are assigned to profile $j$ in the second experiment, and a total of $t$ genes are in the intersection of the set of genes assigned to profile $i$ in the first experiment and profile $j$ in the second experiment. Then the p-value of seeing $t$ or more genes in the intersection is computed based on the hypergeometric distribution to be
$$p = \sum_{k=t}^{\min(n_i, n_j)} \frac{\binom{n_i}{k}\binom{N - n_i}{n_j - k}}{\binom{N}{n_j}}.$$
To specify a comparison data set from the model profile overview screen, press the Compare button. Pressing this button will open a comparison dialog such as the one shown in Figure 36. A user can specify the name of the comparison data file in the field Comparison Data File. Note that STEM requires that the comparison data have the same number of time points as the original data. Once a file name has been specified, its contents can be viewed by pressing the button View Comparison Data File. Pressing the button Comparison Repeat Data will open a repeat dialog window from which to specify repeat data for the comparison e
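The intersection p-value above, together with a scan over all profile pairs, can be sketched as below. The pair filter mirrors the comparison dialog's Maximum uncorrected intersection p-value and Minimum number of genes in intersection parameters; the function names and the dict-of-sets input (profile ID mapped to assigned genes) are hypothetical.

```python
import math


def intersection_p_value(N, n_i, n_j, t):
    """P(intersection >= t) under the hypergeometric distribution."""
    upper = min(n_i, n_j)
    hits = sum(math.comb(n_i, k) * math.comb(N - n_i, n_j - k) for k in range(t, upper + 1))
    return hits / math.comb(N, n_j)


def significant_pairs(N, original, comparison, max_p, min_genes):
    """Profile pairs (i, j, t, p) meeting both intersection constraints."""
    pairs = []
    for i, genes_i in original.items():
        for j, genes_j in comparison.items():
            t = len(genes_i & genes_j)
            if t < min_genes:
                continue
            p = intersection_p_value(N, len(genes_i), len(genes_j), t)
            if p <= max_p:
                pairs.append((i, j, t, p))
    return pairs
```

The formula is symmetric in $n_i$ and $n_j$, so it does not matter which experiment is treated as the "original".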
r Profiles By on the model profile overview interface. A window such as the one in Figure 15 will then appear.

[Figure 16 screenshot: All STEM Profiles (1), profiles ordered based on the actual size based p-value of gene enrichment of GO:0007049 cell cycle genes; buttons: Filtered Gene List, Main Gene Table, Interface Options, Order Profiles By, Order Clusters By, Compare]

Figure 16: The main profile window with the profiles reordered by the actual size based p-value enrichment for cell cycle genes. The two numbers in the bottom left-hand corner of each profile box are the number of cell cycle genes assigned to that profile and then, separated by a semicolon, the p-value of the gene enrichment for cell cycle genes in the profile.

The top portion of the window contains a table. The table contains an entry for every category containing at least one gene passing filter. The first two columns of this table are the category ID and category name. The third column contains the minimum p-value of the gene enrichment for that category over all profiles, computed based on each profile's actual size. The fourth column also contains the minimum p-value of the gene enrichment for that category over all profiles, but computed based on each profile's expected size. The actual size of a profile is the number of genes assign
richment

[Figure 10 screenshot: Advanced Options, GO Analysis tab (tabs: Filtering, Model Profiles, Clustering Profiles, Gene Annotations, GO Analysis); fields: Minimum GO level (3), Minimum number of genes, Number of samples for randomized multiple hypothesis correction (500), Multiple hypothesis correction method for actual size based enrichment (Bonferroni / Randomization)]

Figure 10: The above panel is used to specify options for the Gene Ontology enrichment analysis.

Note that categories that appear in a gene annotation file, even if not part of the official Gene Ontology, are also included in a GO analysis. The parameters included on this panel are as follows:
• Minimum GO level: Any GO category whose level in the GO hierarchy is below this parameter will not be included in the GO analysis. The categories Biological Process, Molecular Function and Cellular Component are defined to be at level 1 in the hierarchy. The level of any other term is the length of the longest path to one of these three GO terms, measured as the number of categories on the path. This parameter thus allows one to exclude the most general GO categories.
• Minimum number of genes: For a category to be listed in a gene enrichment analysis table (described in Section 5.2), the number of genes in the set being analyzed that also belong to the category must be greater than or equal to this parameter.
• Number of samples for randomized multiple hypothesis correction: This parameter
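The GO level definition used by Minimum GO level (roots at level 1, other terms at the number of categories on the longest path to a root) can be computed with a memoized recursion over the hierarchy. This sketch assumes the hierarchy is given as a hypothetical `parents` dict mapping each term to its parent terms, with the three roots mapping to an empty list.

```python
def go_levels(parents):
    """Level of each term: number of categories on the longest path to a root."""
    cache = {}

    def level(term):
        if term not in cache:
            ps = parents.get(term, [])
            cache[term] = 1 if not ps else 1 + max(level(p) for p in ps)
        return cache[term]

    return {term: level(term) for term in parents}


def terms_passing_min_level(parents, min_level):
    """Keep only categories whose level is at least min_level."""
    return {t for t, lv in go_levels(parents).items() if lv >= min_level}
```

Using the longest path (rather than the shortest) means a term reachable from a root by both a short and a long route is treated as the more specific of the two.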
rmute Time Point 0 is checked, then the permutation test permutes all time points, including time point 0, when computing the expected number of genes assigned to a profile. In this case STEM finds profiles with significantly more genes assigned than expected if all the input columns had been randomly reordered. If this box is not checked, the permutation test permutes all time points except for time point 0. In this case STEM finds profiles with more genes assigned than expected if all the columns except for the first column had been randomly reordered. Note that if No normalization/add 0 was selected on the main input interface, the column of added 0s is considered the first input column. Permuting time point 0 is generally preferred, since only this test takes into account significant changes that occur between time point 0 and the immediately following time point. However, in some experimental designs a gene's expression value before transformation at time point 0 is expected to be known more accurately than at the other time points, and because of this asymmetry (explained below) not permuting time point 0 can also be useful. The time point 0 expression value before transformation can be known more accurately than other time points in a two-channel experiment where the time point 0 sample is used as the reference sample, or in a single-channel experiment where extra repeats were done for time point 0. In these experiments ther
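The difference between the two permutation modes can be made concrete with a small sketch that enumerates all column orders and computes expected profile counts. It is illustrative rather than STEM's procedure: it enumerates every permutation instead of sampling, assigns a gene to the profile(s) with highest Pearson correlation, and splits ties evenly (an assumption); all names are hypothetical.

```python
import math
from itertools import permutations


def pearson(a, b):
    """Pearson correlation; -inf if either series is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else float("-inf")


def column_orders(num_time_points, permute_time_point_0):
    """All column orderings used by the permutation test."""
    if permute_time_point_0:
        return list(permutations(range(num_time_points)))
    # Keep time point 0 fixed in front and permute only the remaining columns.
    return [(0,) + rest for rest in permutations(range(1, num_time_points))]


def expected_counts(genes, profiles, permute_time_point_0):
    """Expected number of genes per profile under random column reordering."""
    orders = column_orders(len(genes[0]), permute_time_point_0)
    counts = [0.0] * len(profiles)
    for gene in genes:
        for order in orders:
            series = [gene[i] for i in order]
            # Re-anchoring the permuted series at 0 (subtracting series[0]) would
            # not change the correlation, so it is skipped here.
            scores = [pearson(series, p) for p in profiles]
            best = max(scores)
            if best == float("-inf"):
                continue  # constant series: no meaningful assignment
            winners = [j for j, s in enumerate(scores) if s == best]
            for j in winners:  # split ties evenly between profiles
                counts[j] += 1.0 / (len(winners) * len(orders))
    return counts
```

With 4 time points, permuting time point 0 gives 24 orders per gene, while keeping it fixed gives only 3! = 6.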
s used to specify options for grouping statistically significant model profiles. The two parameters on the clustering profile panel, shown in Figure 8, control the grouping of significant model profiles into clusters. The parameters on this panel are again only relevant to the STEM clustering method and not to K-means clustering. They control how similar two model profiles must be for them to be grouped together. The two parameters are as follows:
• Minimum Correlation: Any two model profiles assigned to the same cluster of profiles must have a correlation above this parameter's value. Increasing this value will lead to more clusters with fewer model profiles per cluster, while decreasing the value will lead to fewer clusters with more model profiles per cluster.
• Minimum Correlation Percentile: If there is repeat data selected to be from Different time periods, then this parameter specifies that any two model profiles assigned to the same cluster of profiles must have a correlation greater than the correlation at this percentile in the distribution of gene expression correlations between the repeats. For instance, if this parameter value is 0.5, then any two model profiles assigned to the same cluster will have a correlation greater than the median of the gene expression correlations between the repeats. This parameter allows clustering of model profiles to depend on the noise in the data
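One way to honor both clustering constraints is a greedy grouping in which a profile joins the first cluster whose members all exceed the correlation threshold. This is a sketch under stated assumptions, not STEM's exact procedure: it is order-dependent, it uses a nearest-rank percentile of repeat correlations, it takes the stricter of the two thresholds, and the names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation; -inf if either series is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else float("-inf")


def percentile_value(values, percentile):
    """Nearest-rank percentile (0 < percentile <= 1) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]


def cluster_profiles(profiles, min_correlation, repeat_correlations=None, min_percentile=0.0):
    """Greedy grouping: every pair in a cluster must correlate above the threshold."""
    threshold = min_correlation
    if repeat_correlations and min_percentile > 0:
        # Only applies when repeats come from different time periods.
        threshold = max(threshold, percentile_value(repeat_correlations, min_percentile))
    clusters = []
    for i, profile in enumerate(profiles):
        for cluster in clusters:
            if all(pearson(profile, profiles[j]) > threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```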
section, it is possible to reorder the clusters of profiles by gene enrichment for any category that appears in the table by clicking on the row of the category. The gene enrichment for a cluster of profiles is always computed based on the number of genes assigned to the cluster. As with reordering profiles, one can reorder clusters of profiles by a user-defined gene set by pressing the button Define Gene Set and then using a dialog such as the one that appeared in Figure. The expected number of genes in a cluster being analyzed is not well defined, since the clusters of profiles are defined based on the data. When clusters of profiles are reordered based on a category or user-defined gene set, the p-value enrichment for the cluster of profiles appears in the top right-hand corner of the profile box of each profile that is part of the cluster. The number of genes of the category assigned to the profile and the p-value enrichment still appear in the lower left-hand corner of the profile box. The Default Order button returns the profiles to their initial ordering, as explained in Section 4.3.

4.5 Interface Options

[Figure screenshot: Interface Options dialog; Display policy on main interface (Do not display genes / Display only selected genes / Display all genes); Change Color of Genes; Display Model Profile; Display Profile ID; Display details when ordering; Y-axis scale for genes on main interface]
sed. On the right is the same window after the button Click to plot only genes in intersection is pressed.

As mentioned in Section 4.3, a user can reorder the profiles on the model profile overview screen based on gene enrichment for a user-defined set. After the Compare button on the comparison dialog has been pressed, the user-defined gene set can be defined based on sets of genes assigned to profile(s) in the other data set. This feature thus allows a user to visualize how a set of genes which all had the same expression profile(s) in one experiment responded in another experiment under different conditions. On the left of Figure 40 is the window to define a gene set by which to reorder the original data set model profiles; notice that the field Profile ID in Comparison Set is active. On the right of Figure 40 is the window to define a gene set by which to reorder the comparison data set model profiles; notice that the field Profile ID in Original Set is active. Pressing the Select button selects those genes from the other experiment assigned to the profile of the ID displayed. Note that one can select genes from multiple profiles, since selecting an additional profile ID does not clear any currently selected genes. To create a gene set based on all the filtered genes in the other experiment, set the profile ID value to 1 and then press select genes.

[Figure 40 screenshot: two Define Gene Set dialogs]
ssion	0.8
Change_should_be_based_on[Maximum-Minimum,Difference From 0]	Difference From 0
Pre-filtered_Gene_File
#Model Profiles
Maximum_Correlation	1.0
Number_of_Permutations_per_Gene	50
Maximum_Number_of_Candidate_Model_Profiles	1000000
Significance_Level	0.05
Correction_Method[Bonferroni,False Discovery Rate,None]	Bonferroni
Permutation_Test_Should_Permute_Time_Point_0	true
#Clustering Profiles
Clustering_Minimum_Correlation	0.7
Clustering_Minimum_Correlation_Percentile	0.0
#Gene Annotations
Category_ID_File
Include_Biological_Process	true
Include_Molecular_Function	true
Include_Cellular_Process	true
Only_include_annotations_with_these_evidence_codes
Only_include_annotations_with_these_taxon_IDs
#GO Analysis
Multiple_hypothesis_correction_method_enrichment[Bonferroni,Randomization]	Randomization
Minimum_GO_level	3
GO_Minimum_number_of_genes	5
Number_of_samples_for_randomized_multiple_hypothesis_correction	500
#Interface Options
Gene_display_policy_on_main_interface[Do not display,Display only selected,Display all]	Do not display
Gene_Color(R,G,B)	204,51,0
Display_Model_Profile	true
Display_Profile_ID	true
Display_details_when_ordering	true
Show_Main_Y-axis_gene_tick_marks	false
Main_Y-axis_gene_tick_interval	1.0
Y-axis_scale_for_genes_on_main_interface_should_be[Gene specific,Profile specific,Global]	Profile specific
Scale_should_be_based_on_only_selected_genes	true
Y-axis_scale_on_detai
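A settings listing like the one above can be read back into a dictionary for scripting or batch runs. The sketch below rests on assumptions inferred from the listing rather than from a documented specification: one `Key<TAB>Value` pair per line, section headers starting with `#`, and an optional trailing `[option,...]` or `(R,G,B)` annotation on the key that is dropped. `parse_settings` is a hypothetical name.

```python
import re

# Assumed trailing "[option,...]" or "(R,G,B)" annotation on a key.
_OPTIONS_SUFFIX = re.compile(r"\s*[\[(][^\])]*[\])]\s*$")


def parse_settings(text):
    """Parse a STEM-style settings listing into a {key: value} dict (values as strings)."""
    settings = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition("\t")
        settings[_OPTIONS_SUFFIX.sub("", key.strip())] = value.strip()
    return settings
```

Values are kept as strings; converting `true`/`false` and numbers is left to the caller, since the expected type of each key is not specified here.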
t were expected to be part of the set being analyzed. This value will depend on whether an actual size or expected size profile enrichment analysis is being conducted.
• Genes Enriched: The difference between Genes Assigned and Genes Expected.
• p-value: The uncorrected p-value of seeing this many or more genes from this category assigned to the set of genes being analyzed. This p-value will depend on whether an actual size or expected size enrichment analysis is being conducted. See Section 4.3 for a discussion of how the p-value is computed.
• Corrected p-value: The p-value corrected for testing a large number of GO categories. If the enrichment is based on a set's actual size and Randomization is selected as the value for Multiple hypothesis correction method for actual size based enrichment, the corrected p-value is computed based on a randomization test. If the enrichment is computed based on a set's expected size, or Bonferroni is selected as the value for Multiple hypothesis correction method for actual size based enrichment, then the corrected p-value is computed based on a Bonferroni correction. See Section 3.3.5 for a discussion of these two methods for correcting GO enrichment p-values.
• Fold (new in 1.3.7): The fold enrichment, that is, the number of genes assigned divided by the number of genes expected.
A gene enrichment table can be sorted by any column in ascending or descending order by clicking on the column header. The conte
the row was assigned. This new window is described in Section 5. An option will also appear on the newly opened window to plot only the expression of the gene of the selected row. The columns of the table are as follows:
• Selected: An entry in this column contains a Yes if the gene of the row is part of a category or gene set by which the profiles are ordered; otherwise the field is empty.
• Gene Symbol: This column contains the gene symbols. The name for this column is read from the header in the data file.
• Spot ID: An entry in this column contains a list of spot IDs of spots which contain the gene of the row. The entries are delimited by a separator symbol. The header for this column is read from the data file if spot IDs are included in the data file.
• Profile(s) Assigned: The ID of the model profile (or, in the case of a tie, the profiles) to which the gene's expression pattern most closely matched, and thus to which the gene was assigned.
• Time Point columns: The time series of gene expression levels for the gene after any selected transformation (Log normalize data, Normalize data, or No normalization/add 0). The headers for these columns are read from the data file.
This table, like all tables in STEM, can be sorted by any column. Click once on a column header to sort the table in ascending order by that column's values. Click twice on the column header to sort the table in descending order, and a third ti
[Figure 19 screenshot: Default Order, Define Gene Set, Copy Table and Save Table buttons]

Figure 19: The dialog box through which the ordering of clusters of profiles can be changed.

[Figure 20 screenshot: All STEM Profiles (1), clusters and then profiles ordered based on the actual size based p-value of gene enrichment of GO:0007049 cell cycle genes]

Figure 20: Clusters of profiles ordered based on enrichment for cell cycle genes.

When a user presses the Order Clusters By button on the main profile window, a dialog box such as the one in Figure 19 appears. This window is a simplified version of the window that appears when a user presses Order Profiles By. Through this dialog box a user can reorder the clusters of profiles. A cluster of profiles is either a singleton profile or a group of profiles which all have a statistically significant number of genes assigned and are all similar to each other, as defined by the parameters on the Clustering panel under Advanced Options. Profiles of the same cluster have the same color on the main profile window. When clusters of profiles are reordered, profiles of the same cluster are kept next to each other, as in Figure 20. Also, when reordering clusters of profiles, all non-statistically significant profiles are reordered as well, but they always appear after the clusters of statistically significant profiles. As with reordering profiles, discussed in the previous sub
will be an option to plot just the gene selected. The image on the left initially appears, but pressing Click to plot only gene STAM2 replots with only the gene STAM2.

Gene Table

[Figure 29 screenshot: Gene Table for Profile 41, with columns Selected, Weight, Gene Symbol, SPOT and time points (0h, 3h, ...), and buttons Copy Table, Save Table, Copy Gene Names, Save Gene Names, Chromosome View]

Figure 29: The above window is a gene table for a profile.

From a model profile details interface window, a user has the option to open one or more gene tables, such as the table that appears in Figure. As discussed at the beginning of the section, which genes appear in the table depends upon which button was pressed to open the table. A gene table has the following columns:
• Selected: This is the same column as in the main gene table. An entry in this column contains a
xperiment. This dialog window appears in Figure 4 and was described back in Section 3.1. Below the Comparison Repeat Data button are two parameters:
• Maximum uncorrected intersection p-value: The maximum uncorrected intersection p-value for the intersection to be of interest.
• Minimum number of genes in intersection: The minimum number of genes in the intersection of the sets of genes assigned to two profiles for the intersection to be of interest.
Pressing the yellow Compare button will launch two new windows. One of the windows that is launched contains the model profile overview screen for the comparison data set. This is the same interface that is described in Section. The other window that appears is the main comparison window, an example of which is shown in Figure.

[Figure 37 screenshot: Comparison: Significant Intersections, with Original Set Profiles]

Figure 37: The main comparison window. If a profile appears to the right of the yellow bar, then a significant number of genes assigned to the profile were also assigned to the profile to the left of the yellow bar in the other experiment.

The window shows all profile pairs containing a gene intersection satisfying the size and p-value constraints specified on the comparison dialog. The interface layout has two halves, with a blue bar separating the two halves; there is no
