
WGCNA user manual

1. The user should first specify the location of the Expression Data file. It is a comma- or tab-delimited file in which rows are genes (probe sets) and columns correspond to microarray samples. The first column should contain gene identifiers.

[Table: Example expression data, with a GeneName column of gene identifiers followed by one column of expression values per sample.]

In order to screen for genes or modules that are biologically significant, WGCNA needs a gene significance measure.

Option 1: based on the correlation with a microarray sample trait T that corresponds to a column in the trait file (the Sample Trait Data file). Then $GeneSignificance(i) = |cor(Gene(i), T)|$. The trait can be a binary outcome (e.g. case-control status) or a quantitative outcome (e.g. body weight).

Option 2: based on a pre-defined gene significance measure that corresponds to a column in the gene information file. It must contain the
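As an illustration of Option 1, the following minimal sketch computes a gene significance value for each gene as the absolute Pearson correlation between its expression profile and a sample trait. It is not the WGCNA implementation; the function names `pearson` and `gene_significance` and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def gene_significance(expression, trait):
    """Map each gene name to |cor(gene profile, trait)|."""
    return {gene: abs(pearson(profile, trait))
            for gene, profile in expression.items()}

# Hypothetical toy data: three genes measured on four samples.
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],    # rises with the trait
    "G2": [4.0, 3.0, 2.0, 1.0],    # falls with the trait
    "G3": [1.0, -1.0, -1.0, 1.0],  # unrelated to the trait
}
trait = [0.0, 1.0, 2.0, 3.0]

gs = gene_significance(expression, trait)
for gene, value in gs.items():
    print(gene, round(value, 3))
```

Because the absolute value is taken, genes that are strongly positively or strongly negatively correlated with the trait both receive a high significance in this sketch.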
2. [Screenshot: Step 4 (Module Detection) window showing the module significance plot for the grey, brown, yellow, turquoise and blue modules, together with the branch cutting, module merge and minimum module size controls.]

Once modules are defined, the user can generate a module significance plot by clicking Plot Module Significance. The Plot Cluster Tree button offers an easy way to redraw the hierarchical tree and the module definitions without conducting module detection again.

[Screenshot: Relationship between module eigengenes.]
3. the slope of the regression line between log(p(k)) and log(k) should be negative (typically smaller than 2). In practice, we find that the relationship between $R^2$ and $\beta$ is characterized by a saturation curve. In most applications, we use the lowest power $\beta$ at which saturation is reached. As a caveat, we mention that sometimes scale-free topology cannot be reached for reasonably low values of $\beta$ (say, smaller than 20). For example, severe array outliers or globally distinct groups of arrays may lead to strong correlations between the expression profiles and very large co-expression modules. In this case, we simply recommend going with the default choice of $\beta$: for an unsigned network $\beta = 6$, for a signed network $\beta = 12$.

[Figure: two panels, each with the soft threshold (power) on the x-axis.]

Figure: (a) Scale-free topology fit $R^2$ (y-axis) as a function of different powers. In many, but not all, applications one observes a saturation-type curve. Here we would choose a power of 6, since the saturation level is reached at this point. The analysis is highly robust with respect to the choice of the power. (b) Mean connectivity (y-axis) versus the power. The higher the power, the lower the mean connectivity.

Module detection

We use average linkage hierarchical clustering coupled
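The power selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: reaching a fit threshold (here $R^2 > 0.80$) is used as a simple stand-in for "saturation is reached", the function name `choose_power` is hypothetical, and the fit values in the usage lines are made up.

```python
def choose_power(fit_by_power, r2_threshold=0.80, signed=False):
    """Pick the lowest power whose scale-free fit R^2 exceeds the threshold.

    fit_by_power maps a candidate power to its scale-free fit R^2.
    If no candidate qualifies, fall back to the default power:
    6 for an unsigned network, 12 for a signed network.
    """
    for power in sorted(fit_by_power):
        if fit_by_power[power] > r2_threshold:
            return power
    return 12 if signed else 6

# Hypothetical fit values for a few candidate powers.
fits = {1: 0.20, 2: 0.50, 4: 0.70, 6: 0.85, 8: 0.90}
print(choose_power(fits))

# No candidate reaches the threshold, so the defaults apply.
poor_fits = {1: 0.10, 2: 0.20}
print(choose_power(poor_fits), choose_power(poor_fits, signed=True))
```

A threshold rule is easy to automate, but inspecting the saturation curve itself, as in panel (a) of the figure, remains the more careful way to choose the power.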
4. [Screenshot: Step 5 (Gene Selection) window with the blue module selected.]

The user can save any plot in the display panel to a local file using the Save Image function under the File menu. The image format is EMF. EMF (Enhanced Metafile) is a vector-based image format designed for and popularized by Microsoft Windows, which can easily be converted to other formats. To convert EMF files into PDF format, the user can insert the EMF file into Word and then print it using Acrobat PDFWriter to create a PDF file. If the user has Adobe Illustrator, the user can then edit the PDF file and save it in EPS format.

How to access help

[Screenshot: Help menu of the WGCNA window.]
5. Short glossary of network concepts

Terms covered: Coexpression network; Module; Connectivity; Intramodular connectivity (kIN); Module eigengene; Eigengene significance; Module Membership (also known as eigengene-based connectivity, kME).

Term: Coexpression network
Definition: We define coexpression networks as undirected, weighted gene networks. The nodes of such a network correspond to gene expressions, and edges between genes are determined by the pairwise Pearson correlations between gene expressions. By raising the absolute value of the Pearson correlation to a power $\beta \ge 1$ (soft thresholding), the weighted gene coexpression network construction emphasizes large correlations at the expense of low correlations. Specifically, $a_{ij} = |cor(x_i, x_j)|^\beta$ represents the adjacency of an unsigned network. Optionally, the user can also specify a signed co-expression network, where the adjacency is defined as $a_{ij} = |0.5 + 0.5 \cdot cor(x_i, x_j)|^\beta$.

Term: Module
Definition: Modules are clusters of highly interconnected genes. In an unsigned coexpression network, modules correspond to clusters of genes with high absolute correlations. In a signed network, modules correspond to positively correlated genes.

Term: Connectivity
Definition: For each gene, the connectivity (also known as degree) is defined as the sum of connection strengths with the other network genes: $k_i = \sum_{u \ne i} a_{iu}$. In coexpression networks, the connectivity measures how correlated a gene is with all other network genes.

Intramodular con
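The adjacency and connectivity definitions in the glossary can be written out in a few lines. This is a sketch rather than WGCNA code: it assumes a precomputed, symmetric correlation matrix given as nested lists, and the names `adjacency` and `connectivity` are hypothetical.

```python
def adjacency(cor, beta=6, signed=False):
    """Soft-thresholded adjacency of one pairwise correlation.

    Unsigned: |cor|^beta.  Signed: |0.5 + 0.5 * cor|^beta.
    """
    if signed:
        return abs(0.5 + 0.5 * cor) ** beta
    return abs(cor) ** beta

def connectivity(cor_matrix, beta=6, signed=False):
    """Connectivity k_i: sum of adjacencies of gene i with all other genes."""
    n = len(cor_matrix)
    return [
        sum(adjacency(cor_matrix[i][u], beta, signed)
            for u in range(n) if u != i)
        for i in range(n)
    ]

# Hypothetical correlations among three genes.
cor_matrix = [
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.0],
    [-0.9, 0.0, 1.0],
]

k_unsigned = connectivity(cor_matrix, beta=2)
k_signed = connectivity(cor_matrix, beta=2, signed=True)
print([round(k, 4) for k in k_unsigned])
print([round(k, 4) for k in k_signed])
```

In the unsigned network the strongly negative correlation of -0.9 counts as a strong connection, whereas in the signed network it contributes almost nothing, which is why signed modules group positively correlated genes.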
6. also provides the file MEResults, which reports the values of the module eigengenes (columns) for different arrays (rows). The top two rows in the file MEResults report the eigengene significance of each module.

2 Analysis using gene significance data

Output contains two different files: 1) LazyGenelist.csv, which contains results for each gene (rows); 2) MEResults.csv, which contains results for each array sample. LazyGenelist.csv contains the following columns:

Module membership information (see the columns MMblue etc.): For each gene and each automatically detected module, WGCNA outputs a module membership (MM) value. For example, if a gene has an MMblue value close to -1 or 1, the gene is assigned to the blue module. Module colors are assigned according to module size: turquoise denotes the largest module, blue next, then brown, green, yellow, etc. The color grey is reserved for non-module genes.

GS Weighted is the weighted network estimate of the corresponding pre-defined GS measure. GS Weighted takes account of module membership information and the hub gene significance measure (HGS) of each module.

The file MEResults reports the module eigengenes (columns) across the arrays (rows). The top two rows in the file MEResults report the hub gene significance of each module.

Save image

[Screenshot: WGCNA window with the File menu.]
7. correlation between log(p(k)) and log(k), i.e. the model fitting index $R^2$ of the linear model that regresses log(p(k)) on log(k). If $R^2$ of the model approaches 1, then there is a straight line relationship between log(p(k)) and log(k). Many co-expression networks satisfy the scale-free property only approximately.

Most biologists would be very suspicious of a gene co-expression network that does not satisfy scale-free topology at least approximately. Therefore, a soft threshold power $\beta$ in $a_{ij} = |cor(x_i, x_j)|^\beta$ that gives rise to a network that does not satisfy approximate scale-free topology should not be considered. There is a natural trade-off between maximizing the scale-free topology model fit (scale-free fitting parameter $R^2$) and maintaining a high mean number of connections. High values of $\beta$ often lead to high values of $R^2$. But the higher the power $\beta$, the lower the mean connectivity of the network. These considerations have motivated us to propose the following scale-free topology criterion for choosing the power $\beta$: only consider those powers that lead to a network satisfying scale-free topology at least approximately, e.g. $R^2 > 0.80$. In addition, we recommend that the user take the following considerations into account when choosing the adjacency function parameter. First, the mean connectivity should be high so that the network contains enough information (e.g. for module detection). Second
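To make the fitting index concrete, the sketch below estimates the scale-free fit for a list of connectivities: it bins the connectivities, takes the fraction of genes per bin as p(k), and regresses log10(p(k)) on log10(k). The equal-width binning, the bin count and the names `linear_fit_r2` and `scale_free_fit` are assumptions made for illustration, not the procedure used inside WGCNA. The sketch also assumes that all connectivities are positive and not all equal.

```python
import math

def linear_fit_r2(x, y):
    """Return (slope, R^2) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

def scale_free_fit(k, n_bins=10):
    """Regress log10 p(k) on log10 k using equal-width connectivity bins."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    for value in k:
        b = min(int((value - lo) / width), n_bins - 1)
        counts[b] += 1
        sums[b] += value
    log_k, log_p = [], []
    for count, total in zip(counts, sums):
        if count > 0:  # empty bins carry no information
            log_k.append(math.log10(total / count))   # mean k in the bin
            log_p.append(math.log10(count / len(k)))  # fraction of genes
    return linear_fit_r2(log_k, log_p)

# Hypothetical connectivities following p(k) proportional to k^-2.
k_values = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope, r2 = scale_free_fit(k_values, n_bins=7)
print(round(slope, 3), round(r2, 3))
```

Repeating this calculation for each candidate power gives the curve of $R^2$ against the power from which the soft threshold is chosen.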
8. gene ontology information to assess their biological plausibility, it is not required. Because the modules may correspond to biological pathways, focusing the analysis on intramodular hub genes (or the module eigengenes) amounts to a biologically motivated data reduction scheme. Because the expression profiles of intramodular hub genes are highly correlated, typically dozens of candidate biomarkers result. Although these candidates are statistically equivalent, they may differ in terms of biological plausibility or clinical utility. Gene ontology information can be useful for further prioritizing intramodular hub genes. Examples of biological studies that show the importance of intramodular hub genes are reported in Horvath et al (2006), Carlson et al (2006), Gargalovic et al (2006), Ghazalpour et al (2006) and Miller et al (2008).

Analysis overview

Step 1: Construct a network. Rationale: make use of interaction patterns between genes.
Step 2: Identify modules. Rationale: module (pathway) based analysis.
Step 3: Relate modules to external information. Array information: clinical data, SNPs, proteomics. Gene information: gene ontology, EASE, IPA. Rationale: find biologically interesting modules.
Step 4: Find the key drivers in interesting modules. Tools: intramodular connectivity (highly related to module membership), gene significance. Rationale: experimental validation, therapeutics, biomarkers.
9. gene and each module, WGCNA outputs a module membership (MM) value. MM values close to -1 or 1 suggest that the gene is a member of the respective module. By using I'm feeling lazy, WGCNA will include all the available genes in its analysis, regardless of their variance or connectivity. For the output of the automatic analysis, please refer to the 5 Gene Selection section for details. 2) The manual WGCNA analysis allows the user to proceed in a step-wise fashion to define modules and significant genes. In general, we recommend the manual analysis, but it has a limitation: only 3600 genes can be included for the module definition.

3 Network Construction

[Screenshot: Step 3 window with the Signed Network option, the Power Selection field (Power 6, power < 30) and the Use Adjacency instead of TOM for Clustering option.]

Help on Step 3 (Network Construction): Specify whether you want to use a signed or an unsigned gene co-expression similarity measure. A signed network defines modules as clusters of positively correlated genes. An unsigned network defines modules based on the absolute value of the correlation coefficient. Specify a power (soft threshold) larger than or equal to 1. Raising the co-expression similarity to this power results in a weighted co-expression network. The default power is 6, but you can also use the scale-free topology criterion to pick a power (Zhang and Horvath 2005). To use the scal
10. module membership measure K measures how centrally located the gene is inside the module. To select genes, you can set the threshold parameters and manually output the genes that meet the selection criterion. Or you can choose the automatic gene selection procedure, which selects genes based on their gene significance and their membership to significant modules.

[Screenshot: Step 5 (Gene Selection) window with the Manual Gene Selection controls (Choose the Module, Plot Gene Significance vs Intramodular K, thresholds on GS and K/Kmax, Select Genes, Output Gene List, Plot Module Heatmap) and the Automatic Gene Selection controls (Auto Gene Selection, Help).]

Based on the module significance analysis from the previous step, please choose a module of interest first.

[Screenshot: brown module heatmap and eigengene.]
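The manual selection criterion can be pictured as a simple filter over the genes of the chosen module. The sketch below keeps genes whose gene significance exceeds one threshold and whose intramodular connectivity, relative to the module maximum (K/Kmax), exceeds another. The function name `select_genes`, the record layout and the toy values are hypothetical, and taking the absolute value of GS is an assumption made here so that signed pre-defined measures are also handled.

```python
def select_genes(genes, gs_min=0.2, k_ratio_min=0.5):
    """Return the ids of genes passing both thresholds.

    genes: list of dicts with keys 'geneid', 'K' (intramodular
    connectivity) and 'GS' (gene significance) for a single module.
    """
    k_max = max(g["K"] for g in genes)
    return [
        g["geneid"]
        for g in genes
        if abs(g["GS"]) > gs_min and g["K"] / k_max > k_ratio_min
    ]

# Hypothetical genes of one module.
module_genes = [
    {"geneid": "geneA", "K": 10.0, "GS": 0.3},   # passes both thresholds
    {"geneid": "geneB", "K": 4.0, "GS": 0.9},    # fails K/Kmax
    {"geneid": "geneC", "K": 8.0, "GS": 0.1},    # fails GS
    {"geneid": "geneD", "K": 6.0, "GS": -0.5},   # passes on |GS|
]

selected = select_genes(module_genes, gs_min=0.2, k_ratio_min=0.5)
print(selected)
```

Raising either threshold shrinks the list toward the most central and most significant genes of the module.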
11. same gene sets as in the expression data.

In order to proceed, the user needs to load a trait file, a gene info file, or both. In the next step, the user can choose a column of the trait file as the sample information (trait) or a column of the gene info file as the pre-defined gene significance measure.

[Table: Example trait data, with a sample name column followed by one column per outcome.]

The Gene Information Data file can contain additional gene information, such as gene names or chromosome location, that the user wants to keep in the analysis.

[Table: Example gene information data, with the columns GeneID, GeneName, GeneSignificanceValue and PathwayRelated.]

Please follow the data format of the sample files. WGCNA accepts either comma- or tab-delimited text files as input. After loading the expression data and the trait data or gene information data, the user should click the Next > button to move to the data preprocessing step.

2 Data preprocessing and I'm Feeling Lazy analysis

[Screenshot: Step 2 window with its help panel.]

Help on Step 2 (Pre-processing): Specify either a microarray sample trait or a pre-defined gene significance measure so that WGCNA knows which gene significance (GS) measure should be used. Next, decide on whether you
12. that the default parameters of the automatic (lazy) analysis work quite well in many real applications, there is a danger that a module may be an artefact, e.g. due to an array outlier. The manual analysis allows the user to interact with the program regarding module detection and gene selection, but it can only deal with relatively few genes (on our laptop computer, fewer than 3600 genes). Since module detection is computationally intensive, the user can filter genes based on the variance across the microarrays, and the user can then select the most connected genes among the most varying genes. Since module genes tend to be highly connected, restricting the analysis to the most connected genes is a reasonable gene filtering criterion when it comes to module detection. At the end of the analysis, WGCNA outputs module membership measures, which are defined for all genes in the input data (not just the 3600 most connected genes).

[Screenshot: Step 2 (PreProcess) window with the Sample Trait (Outcome) Selection and Gene Significance Selection controls and the help panel for Step 2.]
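The two-stage gene filter described above (most varying genes first, then the most connected among them) can be sketched as follows. This is an illustration under assumptions, not the program's code: connectivity is computed here from unsigned adjacencies among the retained genes only, the names `pearson` and `filter_genes` are hypothetical, and the retained genes are assumed to have non-constant expression.

```python
import math
from statistics import pvariance

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def filter_genes(expression, n_varying, n_connected, beta=6):
    """Keep the n_varying most varying genes, then the n_connected
    most connected genes among those."""
    varying = sorted(expression,
                     key=lambda g: pvariance(expression[g]),
                     reverse=True)[:n_varying]
    # Unsigned connectivity of each retained gene with the other retained genes.
    k = {
        g: sum(abs(pearson(expression[g], expression[h])) ** beta
               for h in varying if h != g)
        for g in varying
    }
    return sorted(varying, key=lambda g: k[g], reverse=True)[:n_connected]

# Hypothetical toy data: four genes on four samples.
expression = {
    "A": [0.0, 2.0, 4.0, 6.0],
    "B": [0.0, 4.0, 8.0, 12.0],
    "C": [5.0, -5.0, 5.0, -5.0],
    "D": [1.0, 1.1, 1.0, 1.1],   # nearly constant, dropped by the variance filter
}

kept = filter_genes(expression, n_varying=3, n_connected=2)
print(sorted(kept))
```

In the program the corresponding settings are the number of most varying genes to keep and the number of most connected genes to keep for module detection.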
13. want to proceed with the automatic or the manual WGCNA analysis. The automatic analysis (ImFeelingLazy) uses default parameters for finding modules and significant hub genes. For each gene and each module, WGCNA outputs a module membership (MM) value. MM values close to -1 or 1 suggest that the gene is a member of the respective module. The manual WGCNA analysis allows the user to proceed in a step-wise fashion to define modules and significant genes. In general, we recommend the manual analysis, but it has a limitation: only 3600 genes can be included for the module definition. Therefore, we implement two approaches for restricting genes: i) based on variance across samples, ii) based on whole network connectivity. Since module genes tend to be highly connected, little is lost by restricting the analysis to highly connected genes.

[Screenshot: Step 2 controls: Sample Trait (Outcome) Selection, Gene Significance Selection, Automatic WGCNA (I'm feeling lazy: automatically detects modules and significant hub genes based on all genes) and Manual WGCNA with the Expression Data Filter (keep the 8000 most varying genes; of those, keep the 3600 most connected genes). Status messages: Expression Data Loaded, found 50 samples and 3000 genes; Trait Data Loaded, found 1 available trait for analysis; GeneInfo Data Loaded.]

First the user needs to specif
14. with a gene dissimilarity measure to define a dendrogram (cluster tree) of the network. Once a dendrogram is obtained from a hierarchical clustering method, we choose a height cutoff to arrive at a clustering. Modules correspond to branches of the dendrogram.

WGCNA implements two network dissimilarity measures. The default choice is the topological overlap matrix based dissimilarity measure (Ravasz et al 2002, Zhang and Horvath 2005, Li and Horvath 2006, Yip and Horvath 2007). The use of topological overlap serves as a filter to exclude spurious or isolated connections during network construction. The topological overlap dissimilarity is used as input of hierarchical clustering:

$$TOM_{ij} = \frac{\sum_{u} a_{iu} a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}, \qquad DistTOM_{ij} = 1 - TOM_{ij}$$

• Generalized in Zhang and Horvath (2005) to the case of weighted networks.
• Generalized in Yip and Horvath (2007) to higher order interactions.

As an alternative dissimilarity measure, we also define $dissA(i,j) = 1 - a_{ij}$, i.e. 1 minus the adjacency matrix. This alternative measure is computationally much faster than the topological overlap measure and often leads to approximately similar modules.

WGCNA defines modules by cutting (pruning) branches off the dendrogram. A common but inflexible method uses a static (constant) height cutoff value; this method exhibits suboptimal performance on complicated dendrograms. Therefore, WGCNA also implements dynamic branch cutting methods for detecting clusters in a dendrogram dependi
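For readers who want to see the topological overlap dissimilarity spelled out, here is a small sketch using the weighted form given in Zhang and Horvath (2005): for genes i and j, the overlap is (sum over other genes u of a_iu * a_uj, plus a_ij) divided by (min(k_i, k_j) + 1 - a_ij), and the dissimilarity is 1 minus that overlap. The function name `tom_dissimilarity` and the toy adjacency matrix are hypothetical, and the plain nested loops are meant for clarity, not speed.

```python
def tom_dissimilarity(adj):
    """Return the matrix 1 - TOM for a symmetric adjacency matrix.

    adj is a list of lists with entries in [0, 1]; diagonal entries are ignored.
    """
    n = len(adj)
    # Connectivity of each gene, excluding the self-connection.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Connection strength shared through third genes u.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            diss[i][j] = 1 - tom
    return diss

# Hypothetical network: genes 0, 1, 2 are fully connected; gene 3 is isolated.
adj = [
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

diss = tom_dissimilarity(adj)
print(diss[0][1], diss[0][3])
```

The resulting matrix can be passed to any average linkage hierarchical clustering routine. The faster alternative mentioned above would use 1 minus the adjacency in its place.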
15. [Screenshot: Step 4 (Module Detection) window with the pairwise plot of module eigengenes.]

The Plot ME Pairs function plots the relationship between module eigengenes. It is useful for studying module similarities and helps with merging similar modules. Message: the module eigengenes (first PC) of different modules may be highly correlated. WGCNA can be interpreted as a biologically motivated data reduction scheme that allows for dependency between the resulting components. Compare this to principal component analysis, which would impose orthogonality between th
16. [Screenshot: Step 5 (Gene Selection) window with the blue module selected, the thresholds GS > 0.2 and K/Kmax > 0.5, and the resulting gene list with the columns geneid, K, GS, truemodule, SignalGeneIndicator, GeneInfo and Cor Standard.]

To select genes, the user can set the threshold parameters and manually output the genes that meet the selection criterion. The user can also save the list to a file. Alternatively, the user can choose the automatic gene selection procedure, which selects genes based on their gene significance and their membership to significant modules.

In the gene selection step, the user can generate a heat map for each specific module. The sample order is exactly the same as in the expression data. The Auto Gene Selection function allows the user to obtain a gene list with the most significant genes, ranked by putting both gene significance and intr
17. Genetics and Molecular Biology, Vol. 4, No. 1, Article 17.
18. [Screenshot: WGCNA window with the Help menu open.]

To read the help for the current step, the user can click Show Help under the Help menu.

Network function library update

WGCNA utilizes a set of network functions, which is included in the WGCNA package as the NetworkFunctions WGCNA library. Since we keep updating the library, please download the latest version from our website. The user can check the versions of both the WGCNA program and its network library using About WGCNA under the Help menu. To update the network library, the user can simply download the latest version and save it to the C:\WGCNA folder.
19. Recovery from unexpected errors

If WGCNA crashes due to an unexpected error, please use the Windows Task Manager to kill the R server process, whose name is displayed as STATCO~1.EXE. Otherwise, the user may encounter an error such as "Unable to create metafile" when WGCNA is run again.

References

• Albert R, Jeong H, Barabasi AL (2000) Nature 406:378-382.
• Carlson MRJ, Zhang B, Fang Z, Mischel P, Horvath S, Nelson SF (2006) Gene Connectivity, Function, and Sequence Conservation: Predictions from Modular Yeast Co-Expression Networks. BMC Genomics 2006, 7:40.
• Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, Horvath S (2007) Weighted Gene Co-expression Network Analysis Strategies Applied to Mouse Weight. Mamm Genome 18(6):463-472.
• Dong J, Horvath S (2007) Understanding Network Concepts in Modules. BMC Systems Biology 2007, June 1:24.
• Gargalovic PS, Imura M, Zhang B, Gharavi NM, Clark MJ, Pagnon J, Yang W, He A, Truong A, Patel S, Nelson SF, Horvath S, Berliner J, Kirchgessner T, Lusis AJ (2006) Identification of Inflammatory Gene Modules based on Variations of Human Endothelial Cell Responses to Oxidized Lipids. PNAS 22;103(34):12741-6.
• Ghazalpour A, Doss S, Zhang B, Wang S, Plaisier C, Castellanos R, Brozell A, Schadt EE, Drake TA, Lusis AJ, Horvath S (2006) Integrating Genetic and Network Analysis to Characterize Genes Related to Mouse Weight. PloS Genetics, Volum
20. Background information

WGCNA begins with the understanding that the information captured by microarray experiments is far richer than a list of differentially expressed genes. Rather, microarray data are more completely represented by considering the relationships between measured transcripts, which can be assessed by pair-wise correlations between gene expression profiles. In most microarray data analyses, however, these relationships go essentially unexplored. WGCNA starts from the level of thousands of genes, identifies clinically interesting gene modules, and finally uses intramodular connectivity and gene significance (e.g. based on the correlation of a gene expression profile with a sample trait) to identify key genes in the disease pathways for further validation.

WGCNA alleviates the multiple testing problem inherent in microarray data analysis. Instead of relating thousands of genes to a microarray sample trait, it focuses on the relationship between a few (typically fewer than 10) modules and the sample trait. Toward this end, it calculates the eigengene significance (the correlation between sample trait and eigengene) and the corresponding p-value for each module.

The module definition does not make use of a priori defined gene sets. Instead, modules are constructed from the expression data by using hierarchical clustering. Although it is advisable to relate the resulting modules to
21. WGCNA User Manual (for version 1.0.x)

A systems biologic microarray analysis software for finding important genes and pathways

The WGCNA (weighted gene co-expression network analysis) software implements a systems biologic method for analyzing microarray gene expression data, gene information data, and microarray sample traits (e.g. case-control status or clinical outcomes). WGCNA can be used for constructing a weighted gene co-expression network, for finding co-expression modules, for calculating module membership measures, and for finding highly connected intramodular hub genes. WGCNA facilitates a network-based gene screening method that can be used to identify candidate biomarkers or therapeutic targets. The gene screening method integrates gene significance information (e.g. the correlation between gene expression and a clinical outcome) and module membership information to identify biologically and statistically plausible genes. The software has a graphical interface that facilitates straightforward input of microarray and clinical trait data or pre-defined gene information. The software can analyze networks comprised of tens of thousands of genes and implements several options for automatic and manual gene selection (network screening).

To cite the software, please use Zhang and Horvath (2005), Horvath et al (2006) and Langfelder et al (2007).
22. a modular connectivity into consideration.

Gene selection output files

Both the I am feeling lazy and the auto gene selection functions generate two files as output. They are of the same format.

1 Analysis using Trait data

Output contains two different files: 1) LazyGenelist.csv, which contains results for each gene (rows); 2) MEResults.csv, which contains results for each array sample. LazyGenelist.csv contains the following columns:

Module membership information (see the columns MMblue etc.): For each gene and each automatically detected module, WGCNA outputs a module membership (MM) value. E.g. if a gene has an MMblue value close to -1 or 1, the gene is assigned to the blue module. Module colors are assigned according to module size: turquoise denotes the largest module, blue next, then brown, green, yellow, etc. The color grey is reserved for non-module genes.

Cor Weighted denotes the weighted network estimate of the standard correlation $cor(x_i, y)$ between the i-th gene and the outcome y. It makes use of module membership information as well as the module eigengene significance. Analogously, p Weighted is a weighted version of a p-value, and q Weighted is a weighted version of a q-value (local false discovery rate, see the qvalue library in R). Z Weighted is a weighted version of the Fisher Z transform of a correlation coefficient. In order to learn about which module eigengenes affect the calculation of these measures, WGCNA
23. ation test p-value for module membership, denoted by PvalueMMblue. The module membership measure can be defined for all input genes (irrespective of their original module membership). It turns out that the module membership measure is highly related to the intramodular connectivity kIN. Highly connected intramodular hub genes tend to have high module membership values to the respective module.

Term: Hub gene
Definition: This loosely defined term is used as an abbreviation of "highly connected gene". By definition, genes inside coexpression modules tend to have high connectivity.

Term: Gene significance (GS)
Definition: To incorporate external information into the co-expression network, we make use of gene significance measures. Abstractly speaking, the higher the absolute value of GS(i), the more biologically significant is the i-th gene. Examples: GS(i) could encode pathway membership (e.g. 1 if the gene is a known apoptosis gene and 0 otherwise), knockout essentiality, or the correlation with an external microarray sample trait. A gene significance measure could also be defined as minus the log of a p-value. The only requirement is that a gene significance of 0 indicates that the gene is not significant with regard to the biological question of interest. The GS can take on either positive or negative values. When the user specifies a microarray sample trait y (e.g. case-control status or a quantitative outcome), WGCNA defines the gene significanc
24. WGCNA implements three different approaches for defining modules based on a hierarchical clustering tree of the genes. All three methods require a minimum module size setting.

1) The static branch cutting method simply defines modules as branches that lie below the chosen height cutoff (the height corresponds to the y-axis of the cluster tree).
2) The dynamic height branch cutting method automatically chooses a height cutoff for each branch based on the shape of that branch (Langfelder et al, Bioinformatics 2007).
3) The dynamic hybrid method is a hybrid between the dynamic method and partitioning around medoids (PAM) clustering.

The button Plot Module Significance plots the average gene significance for each module. The user can go back to step 2 to specify a different gene significance (GS) measure in the gene info file or pick a different clinical trait column in the sample trait file. In this way, the user can study module significance based on different significance measures without re-calculating the TOM (the Re-Calculate Adjacency/TOM box needs to stay unchecked).

The module merge functions allow the user to manually or automatically merge closely related modules. The user can merge close modules manually using the Merge 2 modules function. In addition, Auto Module Merge offers an automatic module merging process.
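The quantity shown by Plot Module Significance, the average gene significance per module, can be sketched in a few lines. The function name `module_significance`, the input layout (one dictionary of GS values and one of module colors, keyed by gene) and the toy values are hypothetical. The sketch averages the values as given, which suits trait-based GS values that are already absolute correlations.

```python
from collections import defaultdict

def module_significance(gene_significance, module_of):
    """Average the gene significance values within each module."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for gene, gs in gene_significance.items():
        module = module_of[gene]
        totals[module] += gs
        counts[module] += 1
    return {module: totals[module] / counts[module] for module in totals}

# Hypothetical gene significance values and module assignments.
gs_values = {"g1": 0.2, "g2": 0.4, "g3": 0.8, "g4": 0.1}
modules = {"g1": "blue", "g2": "blue", "g3": "brown", "g4": "grey"}

ms = module_significance(gs_values, modules)
for module, value in sorted(ms.items()):
    print(module, round(value, 3))
```

Switching to a different trait or pre-defined GS column only changes the `gs_values` input. The module assignments stay the same, which mirrors the point above that the TOM does not need to be recalculated.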
25. [Screenshot: Step 5 (Gene Selection) window showing the brown module heatmap and eigengene plot.]

The user can use the heatmap button to plot the standardized expression values of the module genes (rows) across the arrays (columns). The top row shows the heatmap of the brown module genes (rows) across the microarrays (columns). The lower row shows the corresponding module eigengene expression values (y-axis) versus the same microarray samples. Note that the module eigengene takes on low values in arrays where a lot of module genes are under-expressed (green color in the heatmap). The ME takes on high values for arrays where a lot of module genes are over-expressed (red in the heatmap). The ME can be considered the most representative gene expression profile of the module. These plots may allow the user to understand the meaning of the module. High values of the module eigengene suggest that the m
26. e 2 Issue 8 August
• Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM, Laurance MF, Zhao W, Shu Q, Lee Y, Scheck AC, Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum HI, Cloughesy TF, Nelson SF, Mischel PS (2006) Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target. PNAS, November 14, 2006, vol. 103, no. 46, 17402-17407.
• Langfelder P, Zhang B, Horvath S (2007) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut library for R. Bioinformatics, November, btm563.
• Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology (BMC Syst Biol) 2007 Nov 21; 1(1):54.
• Li A, Horvath S (2006) Network Neighborhood Analysis with the multi-node topological overlap measure. Bioinformatics, doi:10.1093/bioinformatics/btl581.
• Miller JA, Oldham MC, Geschwind DH (2008) A Systems Level Analysis of Transcriptional Changes in Alzheimer's Disease and Normal Aging. J Neurosci 28:1410-1420.
• Oldham M, Horvath S, Geschwind D (2006) Conservation and Evolution of Gene Co-expression Networks in Human and Chimpanzee Brains. PNAS 2006 Nov 21; 103(47):17973-8.
• Yip A, Horvath S (2007) Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics 8:22.
• Zhang B, Horvath S (2005) A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in
27. [Screenshot: Step 2 (PreProcess) panel with the choice between Automatic WGCNA and Manual WGCNA. Controls: Expression Data Filter (keep the most varying genes; of those, keep at most 3600 most connected genes for module detection), the "I'm feeling lazy" button (automatically detects modules and significant hub genes based on all genes), Help, Next >. Status log: Expression Data Loaded, found 50 samples and 3000 genes; Trait Data Loaded, found 1 available trait for analysis; GeneInfo Data Loaded.]
Help text: The automatic analysis ("I'm feeling lazy") uses default parameters for finding modules and significant hub genes. For each gene and each module, WGCNA outputs a module membership (MM) value. MM values close to 1 or -1 suggest that the gene is a member of the respective module. The manual WGCNA analysis allows the user to proceed in a step-wise fashion to define modules and significant genes. In general we recommend the manual analysis, but it has a limitation: only 3600 genes can be included for the module definition. Therefore we implement two approaches for restricting genes: i) based on variance across samples, ii) based on whole network connectivity. Since module genes tend to be highly connected, little is lost by restricting the analysis to highly connected genes.
Figure: To proceed with the automatic or the manual WGCNA analysis.
1) The automatic analysis ("I'm feeling lazy") uses default parameters for finding modules and significant hub genes. For each
28. e components. Since modules may represent biological pathways, there is no biological reason why modules should be orthogonal to each other. With modules detected, the user can click the "Next >" button to move to the gene selection step.
5. Gene Selection
[Screenshot: Step 5 Gene Selection window with its help panel.]
Help text: Based on your module significance analysis from the previous step, please choose a module of interest first. You can use the heatmap button to plot the standardized expression values of the module genes (rows) across the arrays (columns). Red means over-expression, green means under-expression. Underneath the heatmap plot we also plot the expression values (y-axis) of the module eigengene across the samples (x-axis). The samples (columns) of the heatmap plot line up with those of the eigengene plot. These plots may allow you to understand the meaning of the module. High values of the module eigengene suggest that the module is up in the corresponding sample. If the correlation structure of the module is due to a single array, the array may be an outlier.
Gene selection: in principle, all genes of a significant module are interesting. However, it can be useful to study the relationship between gene significance and intramodular connectivity K using the button "Plot Gene significance versus intramodular connectivity". The scaled version of the intramodular connectivity, K/max(K), turns out to be highly related to the corresponding
29. e could select the 200 genes with highest absolute module membership values. The selected genes could be used as input of functional enrichment analysis software (EASE, KEGG, WebGestalt, Ingenuity, etc.). For example, we often use the software EASE/DAVID (http://david.abcc.ncifcrf.gov/summary.jsp).
Relating modules to each other and to a microarray sample trait
WGCNA also outputs the module eigengenes in a separate file. By correlating the module eigengenes one can determine how related (co-expressed) the modules are to each other. Module eigengenes form the nodes of an eigengene network (Langfelder and Horvath 2007), which may reveal that modules are organized into meta-modules (clusters of co-expressed modules). The module eigengenes can also be used as covariates of a multivariate regression model that regresses the microarray sample trait y on the eigengenes.
Installation requirements
1. Windows operating system (Win2000, NT, WinXP or Vista) with the .NET Framework installed. The .NET Framework can be freely downloaded from Microsoft Windows Update.
2. All necessary software is listed in the file WGCNA_Installation_Guide.doc, which is included in the WGCNA package. Please follow the installation steps.
3. There is no hardware requirement for running WGCNA. However, considering the computational task of network construction, we recommend computers with CPU frequency higher than 2.0 GHz and memory bigger t
30. [Screenshot: Step 3 (Network Construction) help panel with the controls Cluster Samples, Re-Calculate Adjacency/TOM and Next >. Status log: Expression Data Loaded, found 50 samples and 3000 genes; trait data loaded; calculating variance.]
Help text: e free topology criterion, push the power selection button. This will result in a graph of scale-free topology fit R^2 (y-axis) versus different powers (x-axis). Choose the smallest power for which R^2 > 0.8. The button "ClusterSamples" can be used to assess whether arrays are outliers (average linkage hierarchical cluster tree, Euclidean distance). If you find very distinct branches or outlying arrays, the scale-free topology criterion may be meaningless (low R^2 values). In this case simply choose a power (e.g. 6) or consider removing outlying arrays before proceeding. Once you have chosen a power, the next button will automatically create a weighted co-expression network and a corresponding cluster tree, which will be used for module detection. The default network dissimilarity measure is based on the topological overlap matrix (TOM). A less time-consuming alternative is to use the check box "Use Adjacency instead of TOM".
First, the user needs to specify whether a signed or an unsigned gene co-expression network should be constructed. A signed network defines modules as clusters of positively correlated genes. An unsigned network defines modules based on the absolute va
31. e measure as follows: GeneSignificance(i) = |cor(x(i), y)|. (Glossary term: gene significance.)
Module significance is determined as the average absolute gene significance measure for all genes in a given module. This measure is highly related to the correlation between the module eigengene and the outcome y. (Glossary term: module significance.)
Construction of weighted gene co-expression networks and modules
Genes with expression levels that are highly correlated are biologically interesting since they imply common regulatory mechanisms or participation in similar biological processes. To construct a network from microarray gene expression data, we begin by calculating the Pearson correlations for all pairs of genes in the network. Because microarray data can be noisy and the number of samples is often small, we weight the Pearson correlations by taking their absolute value and raising them to the power β. This step effectively serves to emphasize strong correlations and penalize weak correlations on an exponential scale. These weighted correlations, in turn, represent the connection strengths between genes in the network. By adding up these connection strengths for each gene, we produce a single number (called connectivity, or k) that describes how strongly that gene is connected to all other genes in the network. We use the general framework of weighted gene co-expression network analysis presented in Zhang and Horvath 2005 and Horvath et al 2006. Briefly, the absolute value of the Pear
32. ess time-consuming alternative is to use the check box "Use Adjacency instead of TOM". Because calculation of the TOM or adjacency matrix is very time-consuming, WGCNA allows the user to skip this step by un-checking "Re-Calculate Adjacency/TOM". This is particularly useful when the user switches back to Step 2 to pick another trait or gene significance column for analysis without re-calculating the same gene network. After choosing a power, the user can click the "Next >" button to move to the module detection step.
4. Module Detection
[Screenshot: Step 4 Module Detection panel. Plot: "Gene Network by Dynamic Tree Cutting", cluster tree colored by modules. Controls: Min Module Size; List Modules; Static Height Branch Cutting (Static Height Cutoff 0.95); Dynamic Height Branch Cutting (Max Height, 0 to 1: 0.99; Deep Split Level: choose 0, 1, 2 or 3); Dynamic Hybrid Branch Cutting (Max PAM Distance: 0.95); Merge 2; Plot Module Significance; Plot Cluster Tree; Plot ME Pairs. Console output: mergeCloseModules, merging modules whose distance is less than 0.1; colorgroup: grey, brown, yellow, turquoise, blue.]
33. han 2 GB
Detailed description of the analysis steps
1. Load data
[Screenshot: Step 1 Load Data panel with three file fields, each with a Load button: Expression Data (example_expression.csv), Sample Trait Data (example_trait.csv, optional with GeneInfo), Gene Info Data (example_geneinfo.csv, optional with Trait), and a Next > button.]
Help on Step 1 (Load Data): Load a comma or tab delimited file where rows are genes (probe sets) and columns correspond to microarray samples. The first column should contain gene identifiers. In order to screen for genes or modules that are biologically significant, you need to define a gene significance measure. Toward this end WGCNA implements two options.
Option 1: based on a correlation with a microarray sample trait T that corresponds to a column in the trait file. Then GeneSignificance(i) = |cor(Gene(i), T)|. The trait could be a binary outcome (case-control status) or a quantitative outcome (body weight).
Option 2: based on a pre-defined gene significance measure that corresponds to a column in the gene information file. It must contain the same gene set as the expression data.
In order to proceed you need to load a trait file or a gene info file or both. In the next step you can choose a column of the trait file as sample information (trait) or a column of the gene info file as the pre-defined gene significance measure.
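Option 1 can be illustrated with a short Python sketch. It is not taken from the WGCNA software: the helper names and the tiny data set are hypothetical, and the in-memory layout (one list of expression values per gene, one trait value per sample) is an assumption about what the loaded files contain.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gene_significance(expression, trait):
    """GeneSignificance(i) = |cor(Gene(i), T)| for every gene."""
    return {gene: abs(pearson(values, trait)) for gene, values in expression.items()}


# Hypothetical data: three genes measured on four samples, binary trait.
example_expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [4.0, 3.0, 2.0, 1.0],
    "G3": [1.0, -1.0, -1.0, 1.0],
}
example_trait = [0.0, 0.0, 1.0, 1.0]
gs_by_gene = gene_significance(example_expression, example_trait)
print(gs_by_gene)
```

Because of the absolute value, G1 and G2 receive the same significance even though they are correlated with the trait in opposite directions.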
34. hip. Thus a systems biologic gene screening method that combines gene significance and a connectivity (module membership) measure amounts to a pathway-based gene screening method. Empirical evidence shows that the resulting systems biologic gene screening methods can lead to important biological insights (Horvath et al 2006, Carlson et al 2006, Gargalovic et al 2006, Ghazalpour et al 2006).
Fuzzy module annotation of the genes
Apart from detecting co-expression modules, WGCNA also provides a comprehensive annotation of all genes on the array with regard to module membership. For each gene, the module membership table reports the module membership with regard to the identified modules. Instead of forcing genes into distinct modules, the fuzzy module assignment allows the user to identify genes that may be close to two or more modules. These fuzzy module annotation tables form a resource for biomarker discovery. The annotation tables can also be used to determine how close a given gene of interest is to the identified modules. We report both the module membership measure (correlation between the gene expression profile and the module eigengene) and the corresponding correlation test p-value.
Functional enrichment analysis of module genes
It is natural to use the module membership measure to come up with lists of genes that comprise the module. For example, one could select blue module genes on the basis of MMBlue > 0.6 or MMBlue < -0.6. Alternatively on
35. inition of the network adjacency matrix, we make use of the fact that gene expression networks, like virtually all types of biological networks, have been found to exhibit an approximate scale-free topology (Albert et al 2000). To choose a particular power β, we used the scale-free topology criterion described in Zhang and Horvath 2005.
[Plot: log10(p(k)) (y-axis) versus log10(k) (x-axis). Panel title: beta = 6, scale-free R^2 = 0.88, slope = -1.61, truncated R^2 = 0.98.]
Figure: Assessing the scale-free topology of a weighted gene co-expression network constructed using β = 6. If the dots form an approximate straight line relationship, then the network forms a scale-free network. The black curve corresponds to the regression line with model fitting index R^2. The red curve describes a truncated exponential fit (see Zhang and Horvath 2005 for more details).
Scale-free topology criterion (this technical section may be skipped at first reading)
The network exhibits a scale-free topology if the frequency distribution p(k) of the connectivity follows a power law, p(k) ~ k^(-γ). Incidentally, the power γ has nothing to do with the soft threshold β that is used to define the co-expression network. To visually inspect whether approximate scale-free topology is satisfied, one plots log10(p(k)) versus log10(k). A straight line is indicative of scale-free topology. To measure how well a network satisfies a scale-free topology, we use the square of the
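The straight-line check on log10(p(k)) versus log10(k) can be sketched as follows. This is a simplified illustration under stated assumptions, not the fitting routine of the WGCNA software: the equal-width binning, the number of bins and both function names are hypothetical choices, and R^2 is computed as the squared Pearson correlation of the log-transformed values.

```python
import math


def binned_frequencies(connectivity, n_bins=10):
    """Discretize connectivities into equal-width bins (an assumed scheme).

    Returns the mean connectivity and relative frequency of each non-empty bin.
    Assumes the connectivities are positive and not all identical.
    """
    low, high = min(connectivity), max(connectivity)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    for k in connectivity:
        index = min(int((k - low) / width), n_bins - 1)
        counts[index] += 1
        sums[index] += k
    total = len(connectivity)
    ks = [sums[b] / counts[b] for b in range(n_bins) if counts[b] > 0]
    ps = [counts[b] / total for b in range(n_bins) if counts[b] > 0]
    return ks, ps


def loglog_fit(k_values, p_values):
    """Slope and R^2 of the regression of log10(p(k)) on log10(k)."""
    xs = [math.log10(k) for k in k_values]
    ys = [math.log10(p) for p in p_values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Exact power law p(k) = k^(-2): the fit recovers slope -2 and R^2 of 1.
slope, r_squared = loglog_fit([1.0, 10.0, 100.0], [1.0, 0.01, 0.0001])
print(slope, r_squared)
```

A slope near -γ together with an R^2 close to 1 corresponds to the straight-line pattern described in the text; real connectivity data would first pass through a binning step such as `binned_frequencies`.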
36. lue of the correlation coefficient. To construct a weighted network, a power (soft threshold) larger than or equal to 1 should be specified. Raising the co-expression similarity to this power results in a weighted co-expression network. The default power for an unsigned and a signed network is 6 and 12, respectively. To choose a power, WGCNA also implements plots for the scale-free topology criterion (Zhang and Horvath 2005). This criterion is described in a separate section. The power selection button results in a graph of scale-free topology fit R^2 (y-axis) versus different powers (x-axis). Choose the smallest power for which R^2 > 0.8 or, if a saturation curve results, choose the power at the kink of the saturation curve. The button "ClusterSamples" can be used to assess whether arrays are outliers (average linkage hierarchical cluster tree, Euclidean distance). If the dendrogram has two or more very distinct branches or outlying arrays, the scale-free topology criterion may be meaningless (low R^2 values). In this case simply choose a power (e.g. 6) or consider removing outlying arrays before proceeding. Once the user has chosen a power, the next button will automatically create a weighted co-expression network and a corresponding cluster tree, which will be used for module detection. The default network dissimilarity measure is based on the topological overlap matrix (TOM). A l
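The rule "choose the smallest power for which R^2 > 0.8" can be written down as a short Python sketch. The function name, the dict layout and the R^2 values are hypothetical; the fallback of 6 mirrors the suggestion in the text to simply choose a power such as 6 when the criterion is not informative.

```python
def pick_power(r2_by_power, threshold=0.8, fallback=6):
    """Smallest power whose scale-free fit R^2 exceeds the threshold.

    r2_by_power maps each candidate power to its fit R^2 (hypothetical layout).
    Returns the fallback power when no candidate passes.
    """
    passing = [power for power, r2 in r2_by_power.items() if r2 > threshold]
    return min(passing) if passing else fallback


# Made-up fit values for four candidate powers.
example_fits = {2: 0.35, 4: 0.71, 6: 0.88, 8: 0.93}
chosen_power = pick_power(example_fits)
print(chosen_power)
```

The sketch does not cover the saturation-curve case, where the power at the kink of the curve is picked by eye from the plot.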
37. nectivity measures how connected (or co-expressed) a given gene is with respect to the genes of a particular module. The intramodular connectivity may be interpreted as a measure of module membership. The module eigengene corresponds to the first principal component of a given module. It can be considered the most representative gene expression profile in a module. Example: MEblue (also denoted as PCblue) denotes the module eigengene of the blue module. When a microarray sample trait y is available (e.g. case-control status or body weight), one can correlate the module eigengenes with this outcome. The correlation coefficient is referred to as eigengene significance. The WGCNA software outputs the eigengene significance of each module eigengene and the corresponding correlation test p-value. For each gene, we define a fuzzy measure of module membership by correlating its gene expression profile with the module eigengene of a given module. For example, MMblue(i) = cor(x(i), MEblue) measures how correlated gene i is to the blue module eigengene. MMblue(i) measures the membership of the i-th gene with respect to the blue module. If MMblue(i) is close to 0, then the i-th gene is not part of the blue module. But if MMblue(i) is close to 1 or -1, it is highly connected to the blue module genes. The sign of module membership encodes whether the gene has a positive or a negative relationship with the blue module eigengene. WGCNA also outputs the corresponding correl
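The module membership measure MMblue(i) = cor(x(i), MEblue) can be illustrated with a minimal Python sketch. The eigengene vector below is simply assumed to be given (computing it as a first principal component is not shown), and all names and numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def module_membership(expression, eigengene):
    """Signed membership of every gene with respect to one module eigengene."""
    return {gene: pearson(values, eigengene) for gene, values in expression.items()}


# Hypothetical eigengene of the blue module across four samples.
me_blue = [0.1, 0.2, 0.3, 0.4]
example_profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],    # moves with the eigengene
    "G2": [4.0, 3.0, 2.0, 1.0],    # moves against the eigengene
    "G3": [1.0, -1.0, -1.0, 1.0],  # unrelated to the eigengene
}
mm_blue = module_membership(example_profiles, me_blue)
print(mm_blue)
```

Unlike gene significance, the sign is kept here, so G1 and G2 end up at opposite ends of the membership scale while G3 sits near 0.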
38. ng on their shape (Langfelder, Zhang and Horvath 2007). Compared to the constant height cutoff method, dynamic branch cutting offers the following advantages: 1) it is capable of identifying nested clusters; 2) it is flexible, since branch shape parameters can be tuned to suit the application at hand; 3) it is suitable for automation. WGCNA implements two types of dynamic branch cutting methods. The first only considers the shape parameters. The second is a hybrid method that combines the advantages of hierarchical clustering and partitioning around medoids.
Research aims that can be addressed with WGCNA
Identification of co-expression modules with high module significance
Based on the gene significance measure, we define two types of module significance measures. The first type is simply the average gene significance of the module genes. The second type is referred to as eigengene significance, which is only defined for a microarray sample trait y. When a microarray sample trait y is available (e.g. case-control status or body weight), one can correlate the module eigengenes with this outcome. The correlation coefficient is referred to as eigengene significance. WGCNA also outputs the eigengene significance of each module eigengene and the corresponding correlation test p-value.
Identification of intramodular hub genes
Intramodular connectivity can be interpreted as a fuzzy measure of module members
39. [illegible]
Glossary of network concepts
Construction of weighted gene co-expression networks and modules
Module detection
Research aims that can be addressed with WGCNA
Installation requirements
Detailed description of the analysis steps
1. Load data
2. Data preprocessing and I'm Feeling Lazy analysis
3. Network construction
4. Module detection
5. Gene selection
Gene selection output files
[illegible]
[illegible]
Network function library update
Recovery from unexpected errors
[illegible]
40. odule is up in the corresponding sample. If the correlation structure of the module is due to a single array, the array may be an outlier.
Gene selection
[Screenshot: Step 5 Gene Selection panel showing a plot of gene significance versus intramodular connectivity for the blue module. I. Manual Gene Selection: choose the module (turquoise), Plot Gene Significance vs Intramodular K, choose the genes (GS > 0.2, K/Kmax > 0.5), Select Genes, Plot Module Heatmap, Output Gene List. II. Automatic Gene Selection: Auto Gene Selection, Help. Console output: mergeCloseModules, merging modules whose distance is less than 0.1; colorgroup: grey, brown, yellow, turquoise, blue.]
It can be useful to study the relationship between gene significance and intramodular connectivity K using the button "Plot Gene significance versus intramodular connectivity". The scaled version of the intramodular connectivity, K/max(K), turns out to be highly related to the corresponding module membership measure. K measures how centrally located the gene is inside the module.
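The manual selection rule, which keeps genes with high gene significance and high scaled intramodular connectivity, can be sketched as a simple filter. The function name and data are hypothetical; the thresholds 0.2 and 0.5 are only the example values visible in the gene selection panel, and the sketch assumes strict inequalities.

```python
def select_hub_genes(gene_significance, intramodular_k, gs_min=0.2, k_scaled_min=0.5):
    """Genes of one module with GS > gs_min and K / max(K) > k_scaled_min.

    Both arguments are dicts keyed by gene name and restricted to one module.
    """
    k_max = max(intramodular_k.values())
    return sorted(
        gene
        for gene, gs in gene_significance.items()
        if gs > gs_min and intramodular_k[gene] / k_max > k_scaled_min
    )


# Made-up values for three genes of a single module.
example_gs = {"G1": 0.9, "G2": 0.1, "G3": 0.5}
example_k = {"G1": 10.0, "G2": 9.0, "G3": 4.0}
selected = select_hub_genes(example_gs, example_k)
print(selected)
```

G2 is dropped for its low gene significance and G3 for its low scaled connectivity, which leaves G1 as the only candidate hub gene.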
41. son correlation coefficient is calculated for all pairwise comparisons of gene expression values across all microarray samples. The Pearson correlation matrix is then transformed into an adjacency matrix A, i.e. a matrix of connection strengths, by using a power function. Thus the connection strength (adjacency) a(i,j) between gene expressions x(i) and x(j) is defined as a(i,j) = |cor(x(i), x(j))|^β. Optionally, WGCNA can also be used to construct a signed network, which keeps track of the sign of the correlation coefficient: a(i,j) = |0.5 + 0.5 × cor(x(i), x(j))|^β. Because microarray data can be noisy and the number of samples is often small, we weight the Pearson correlations by taking their absolute value and raising them to the power β. The resulting weighted network represents an improvement over unweighted networks based on dichotomizing the correlation matrix because i) the continuous nature of the gene co-expression information is preserved and ii) the results of weighted network analyses are highly robust with respect to the choice of the parameter β, whereas unweighted networks display sensitivity to the choice of the cutoff. The network connectivity k(i) of the i-th gene expression profile x(i) is the sum of the connection strengths with all other genes in the network, i.e. it represents a measure of how correlated the i-th gene is with all the other genes in the network. To determine the power used in the def
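The two adjacency definitions and the connectivity k(i) can be illustrated on a small correlation matrix. This is a minimal sketch, not the WGCNA implementation: the function names and the matrix are hypothetical, and the default powers 6 (unsigned) and 12 (signed) are the defaults stated elsewhere in this manual.

```python
def unsigned_adjacency(cor, beta=6):
    """a(i,j) = |cor(x(i), x(j))|^beta for a square correlation matrix."""
    return [[abs(r) ** beta for r in row] for row in cor]


def signed_adjacency(cor, beta=12):
    """a(i,j) = |0.5 + 0.5 * cor(x(i), x(j))|^beta."""
    return [[abs(0.5 + 0.5 * r) ** beta for r in row] for row in cor]


def connectivity(adjacency):
    """k(i): sum of connection strengths of gene i with all other genes."""
    return [
        sum(a for j, a in enumerate(row) if j != i)
        for i, row in enumerate(adjacency)
    ]


# Hypothetical correlations between three genes; gene 3 is anti-correlated.
example_cor = [
    [1.0, 0.9, -0.9],
    [0.9, 1.0, -0.8],
    [-0.9, -0.8, 1.0],
]
unsigned = unsigned_adjacency(example_cor)
signed = signed_adjacency(example_cor)
k_unsigned = connectivity(unsigned)
print(unsigned[0][2], signed[0][2], k_unsigned)
```

In the unsigned network the strongly negative correlation of -0.9 yields the same connection strength as +0.9, whereas the signed network maps it to a value close to 0.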
42. y how the gene significance measure should be defined. For example, a microarray sample trait (e.g. a numeric outcome) can be chosen from the sample trait file. If there is more than one trait in the trait data, the user needs to choose the trait to be used for the gene significance calculation. As an alternative, the user can also specify a pre-defined gene significance column in the gene info data. WGCNA automatically detects whether a column is numeric (denoted by N) or character (denoted by C). Please make sure to choose a numeric column as trait or gene significance values. Alternatively, the user can choose a pre-specified gene significance measure from the gene information file. For example, the pre-specified gene significance measure may indicate pathway membership, knock-out essentiality, or the t-test statistic from a prior study. After specifying the gene significance measure, the user can either choose the manual WGCNA analysis, which allows the user to identify modules and select intramodular hub genes, or choose an automatic WGCNA analysis by pushing the "I'm feeling lazy" button. The automatic analysis will automatically choose modules and rank all genes according to the network screening results. The automatic analysis has one major advantage: it can deal with tens of thousands of genes. However, the user does not get to see a cluster tree, module heatmaps, etc. Although we find
