
PathWave v2.1 – User Manual

1. Lists, for each pathway, the reactions/genes involved in the most significant pathway feature(s) (pattern(s) from Haar wavelet transforms) that gave rise to the pathway's p-value.

    $hsa00010
    [1] hsa00010 R01662   hsa00010 R01070   hsa00010 R01788   hsa00010 R04780
    [5] hsa00010 R04779   hsa00010 R01061   hsa00010 R02740   hsa00010 R01015
    $hsa00561
    ...

• pwres$results.filtered$call: Lists the parameters used for multiple testing and filtering of the results.
    pw.result(x = pw, pvalCutoff = 0.05, genes = NULL, filter = TRUE, filter.size = 3, multtest = "Bonferroni", verbose = TRUE)
• pwres$results.filtered$multtest: Method used for multiple testing correction.
    "Bonferroni"
• pwres$results.filtered$pvalCutoff: P-value cutoff (after multiple testing) used for reporting significant results.
    0.05
• pwres$results.filtered$filter: Whether the significant results were filtered (see Section 4.2.1). This is always TRUE. To deactivate the filtering, set the filter size to zero.
    TRUE
• pwres$results.filtered$filter.size: Minimum number of genes/reactions used for filtering (see Section 4.2.1).
    3
• pwres$results.filtered$version: Version of PathWave (specified as a date).
    2012-04-26
• pwres$results.filtered$url: Source location of the pathway information used.
    KEGG hsa 2011-04-14
  For your own preprocessed pathway information, this corresponds to the pathway directory specified as input.file.pathwaydir (see Section 3.2).
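As a minimal sketch of how these metadata elements can be inspected, the R snippet below builds a mock list that mirrors the element names and example values listed above. The mock object is an assumption made for illustration; in a real session `pwres` would be the return value of the PathWave run, and it contains more elements than shown here.

```r
# Mock object for illustration only: a real `pwres` is returned by PathWave.
# Element names and values are copied from the examples in this section.
pwres <- list(
  results.filtered = list(
    multtest    = "Bonferroni",
    pvalCutoff  = 0.05,
    filter      = TRUE,
    filter.size = 3
  )
)

# Collect the settings that determined which pathways were reported.
rf <- pwres$results.filtered
settings <- c(
  correction = rf$multtest,
  cutoff     = format(rf$pvalCutoff),
  min.size   = format(rf$filter.size)
)
print(settings)

# According to the text, filtering is deactivated only by a filter size of zero.
filtering.active <- rf$filter.size > 0
```

Recording these settings next to any exported result table makes it easier to tell later which correction method and cutoff produced a given list of pathways.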
2. 5.1.3 Complete results for all pathways

More detailed results for all pathways (whether significant or not) are available in the returned pwres$results.all object (optionally written to the <my_out_prefix>.pw.results.rda file, see above). This object is itself a list composed of several elements, as shown in the following examples:

• pwres$results.all$r.reg.direction: Sample class to which up/down regulations are referred.
    "EC_affected"
  This means that an up-regulated reaction has a higher expression in the EC_affected sample class (the other sample class is the control).
• pwres$results.all$p.value: P-values (without multiple testing correction) for all pathways.
    hsa05414      hsa04150      hsa04670      hsa04742      hsa00010      hsa04010
    6.641146e-12  3.376588e-11  1.440104e-10  9.109925e-10  8.152520e-08  9.835255e-08
    hsa00561      hsa00100      hsa00532      hsa00360      hsa04910      hsa00830
    2.000542e-07  2.090602e-07  2.943297e-07  4.454505e-07  6.174754e-07  6.375999e-07
    ...
• pwres$results.all$feat.p.value: P-values (without multiple testing correction) for all wavelet-based features computed for the pathways (see Schramm et al., 2010, for what is meant by "features").
    hsa00010 M1_row_LH_1_10   hsa00010 M1_org_LH_1_11   hsa01100 M54_org_HH_1_6
    2.164481e-10              3.432276e-10              4.691022e-10
    hsa01100 M54_org_HL_1_6   hsa01100 M54_org_LH_1_6   hsa01100 M54_org_LL_1_6
    4.691022e-10              4.691022e-10              4.691022e-10
    ...
• pwres$results.all$feat.reaction_list:
3. Features that have been computed from the pathway network using the Haar wavelet transform. This vector maps the feature names used internally by PathWave to the lists of reaction/gene nodes (separated by pipe symbols) that are involved in these features.
    hsa00010 M1_org_LL_1_3    hsa00010 R00431 | hsa00010 R00658 | hsa00010 R01518
    hsa00010 M1_org_LL_1_4    hsa00010 R00014 | hsa00010 R00200 | hsa00010 R00726
    hsa00010 M1_org_LL_1_5    hsa00010 R00703
• pwres$results.all$r.reg: Direction of regulation (up: 1, down: -1) for all reactions and signaling proteins/genes of all pathways, with respect to the reference sample class (see above). Note: these results are not yet filtered for significance, i.e., reactions with a high p-value are also reported as either up or down here.
    hsa00010 R00014   hsa00010 R00200   hsa00010 R00235   hsa00010 R00431   hsa00010 R00658
    (one value of 1 or -1 per node)
• pwres$results.all$r.p.value: P-values for a differential regulation of all reactions and pathways.
    hsa00010 R00014   hsa00010 R00200   hsa00010 R00235   hsa00010 R00431   hsa00010 R00658
    4.973641e-01      3.504558e-06      4.201003e-01      2.740424e-03      1.755629e-04
    ...
• pwres$results.all$data: Summary information about the expression data on which PathWave was run. Note: this is after mapping and grouping of expression profiles to metabolic reactions or signaling molecules; hence, the number of rows is the number of network nodes (metabolic reactions/signaling molecules) to which th
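The pipe-separated node lists described for the feature-to-reaction mapping can be split into individual nodes with base R. This is a minimal sketch on mock data: the feature names (`featA`, `featB`), the bare reaction IDs and the absence of spaces around the pipe symbol are assumptions for illustration, not actual PathWave output.

```r
# Mock vector for illustration; names and exact string layout are assumptions.
feat.reaction_list <- c(
  featA = "R00431|R00658|R01518",
  featB = "R00014|R00200|R00726"
)

# Split each pipe-separated string into a character vector of nodes.
# fixed = TRUE treats "|" literally instead of as a regex alternation.
nodes.per.feature <- strsplit(feat.reaction_list, "|", fixed = TRUE)

# Number of nodes involved in each feature.
n.nodes <- sapply(nodes.per.feature, length)
print(n.nodes)
```

If the real strings contain spaces around the pipe symbol, applying `trimws()` to the split result would remove them.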
4. [Example output, continued: remaining reaction IDs of pathway hsa00010 (entries 17 to 33 of $hsa00010$reaction), followed by $hsa00010$reaction.p.value, which lists one p-value per reaction ID (for example, 3.504558e-06 for hsa00010 R00200 and 2.740424e-03 for hsa00010 R00431).]
5. file.pathwaydir = <dir_with_KGML_files>, input.file.pathwayid2name = <file_mapping_pathway_ID_to_name>, output.file.matrixdir = <dir_where_to_store_adjacency_matrices>)

Alternatively, if parameters are specified in a configuration file, the function can be run with:
    pathWave.preprocessPathways(configfile = <my_conf_file>)

The use of input.file.pathwayid2name is optional. Each produced adjacency matrix will be written with the filename <pathway_ID>.matrix in the specified matrix directory. Additionally, an R data file named pwdata.pathways.<my_ID_tag>.rda will be written to the current working directory. This file contains the adjacency matrices (and optional mappings from pathway IDs to names) that are used for running PathWave on expression data (see Section 4).

Important notes:
• The chosen ID tag <my_ID_tag> should be unique. Avoid tags associated with the preprocessed pathway data already provided with the package (see Table 4.1).
• For pathways from KEGG, <my_ID_tag> should start with "KEGG". This makes it possible to later run PathWave specifically on metabolic KEGG pathways (excluding signaling etc.) by running the procedure with param.kegg.only_metabolism = TRUE (see Section 4.2.1).

See the online help for further information: ?pathWave.preprocessPathways

3.3 Computing optimized 2D grid arrangements

This is the most critical step in the preprocessing of
6. if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA, or see http://www.gnu.org/licenses/old-licenses/gpl-2.0.html

Publications:
• Schramm et al. (2010) PathWave: discovering patterns of differentially regulated enzymes in metabolic pathways. Bioinformatics 26(9):1225-1231.
• Piro et al. (2014) Network topology-based detection of differential gene regulation and regulatory switches in cell metabolism and signaling (submitted).

Manual authors: Rosario M. Piro, Stefan Wiesberg (DKFZ Heidelberg, Germany; University of Heidelberg, Germany)

Changes since PathWave v1.0 (as published in Schramm et al., 2010): (1) improved user interface; (2) faster algorithm for the identification of significantly dysregulated pathways; (3) preprocessed pathways for multiple species; (5) adaptation to the KEGG XML format KGML v0.7.1; (6) easy-to-use interface for building custom URLs to get colored pathway maps from the KEGG website; (7) minor bug fixes.

1 Installation

1.1 Required software

For installing and running PathWave on preprocessed pathway data sets (see Section 4.1 for a list), the following software must be installed:
(a) R version > 2.14.0 (available from http://www.r-project.org)
(b) the CRAN R packages XML, e1071, gtools, evd. On Unix/Linux systems, these packages can be installed from the R command line, for example: install.packages("XML"). On Windows systems they can al
7. optimally arranged 2D grid representations of the pathway networks that are used for running PathWave on expression data (see Section 4).

Important notes:
• The identification tag <my_ID_tag> must be the same as used in Section 3.2.
• The pathway directory input.file.pathwaydir must be the same as used in Section 3.2.
• The directory of optimal grid arrangements, input.file.optgriddir, is the directory where the output files produced by Grid Arranger are stored (see Section 3.3).

See the online help for further information: ?pathWave.preprocessOptGrids

4 Running PathWave

This section explains how to run the PathWave algorithm with already preprocessed metabolic/signaling pathways (that have been transformed into 2D grid representations, see Section 3) on gene expression data, for the purpose of identifying pathways whose regulation is significantly different between two sample sets (e.g., normal and tumor tissue). In contrast to other methods, PathWave takes the topology of metabolic networks into account (mapping them on optimally arranged 2D grids) and can identify interesting pathways even if only localized subnetworks show significant differences (e.g., metabolic switches). For more details, see Schramm et al. (2010) and Piro et al. (2014).

4.1 Preprocessed pathways provided with the package

Version 2.1 of PathWave comes with several preprocessed pathway data sets, such that for many applications the steps described in Section 3 ca
8. pathways (e.g., signaling, DNA repair, etc.) will be ignored. However, the parameter is only used if PathWave is run with preprocessed pathways from KEGG (recognized by having an <ID_tag> starting with "KEGG"; see preprocessed.tag above).
• param.ztransform = FALSE | TRUE: specifies whether expression data should be z-transformed after it has been mapped to metabolic reactions (via the involved enzymes). The default is TRUE for PathWave, but you may want to specify FALSE if your expression data is already z-transformed.
• param.numperm = <num_permutations>: the number of randomizations (permutations) to perform for testing the statistical significance of the differential expression of metabolic reactions or signaling genes. Default: 1000. Note: both the memory requirements and the accuracy of the P-values increase with the number of permutations. The default has been tested on a common PC with about 8 GB of RAM.
• param.pvalue.correction.method = "Bonferroni" | "BH" | ...: the correction method for multiple testing. For available methods, see the online help of p.adjust (?p.adjust). Default: "Bonferroni" (multiple testing correction according to the Bonferroni method).
• param.pvalue.threshold = <p_value_cutoff>: the p-value cutoff for reporting interesting pathways. Default: 0.05.
• param.filter.size = <num_genes_and_reactions>: an additional filter that removes all pathways for which less than <num_
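To illustrate how the choice of correction method interacts with the p-value cutoff, the following sketch applies base R's `p.adjust` to four made-up pathway p-values. The pathway labels and values are hypothetical, and the snippet does not reproduce PathWave's internal handling: base R spells the method name in lowercase ("bonferroni"), and how PathWave passes its own setting on is not shown in this manual excerpt. Whether PathWave compares with "<" or "<=" is likewise an assumption here.

```r
# Hypothetical uncorrected pathway p-values (labels are placeholders).
p.raw <- c(pathwayA = 1e-06, pathwayB = 0.004, pathwayC = 0.02, pathwayD = 0.3)

# Bonferroni multiplies each p-value by the number of tests (capped at 1).
p.bonf <- p.adjust(p.raw, method = "bonferroni")

# Benjamini-Hochberg controls the false discovery rate and is less strict.
p.bh <- p.adjust(p.raw, method = "BH")

# Pathways that pass a cutoff of 0.05 under each correction.
cutoff <- 0.05
sig.bonf <- names(p.bonf)[p.bonf < cutoff]
sig.bh   <- names(p.bh)[p.bh < cutoff]

print(rbind(raw = p.raw, bonferroni = p.bonf, BH = p.bh))
```

In this toy example the third pathway passes the cutoff only under BH, which shows why the reported list of significant pathways can shrink when the stricter Bonferroni default is kept.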
9. [Figure 5.1, continued: color-coded KEGG map of the pentose phosphate pathway (map 00030, 5/31/12, (c) Kanehisa Laboratories).]

Acknowledgements

We thank Zita Soons (Maastricht University) and Ashwini Kumar Sharma (DKFZ) for proofreading and testing, and the DKFZ Data Management team for the continuous support.

References
• Duarte et al. (2007) Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Nat Acad Sci USA 104(6):1777-1782.
• Elf et al. (2001) Branch-and-Cut Algorithms for Combinatorial Optimization and Thei
10. H", param.pvalue.threshold = <p_value_cutoff>, param.filter.size = <num_genes_and_reactions>, output.file.prefix = <my_out_prefix>, verbose = FALSE | TRUE)

Alternatively, if parameters are specified in a configuration file, the function can be run with:
    pwres <- pathWave.run(configfile = <my_conf_file>)

Note: for the remainder of the manual, we assume the function's return value to be stored in an object named pwres. The same object name is used when saving results via output.file.prefix (see below).

The function call returns a list object composed of three elements:
• pwres$results.all: results for all pathways (whether significant or not)
• pwres$results.filtered: only filtered significant pathways
• pwres$results.table: human-readable table with a summary of filtered significant results

The following sections describe mandatory and optional parameters in more detail.

4.2.1 Mandatory parameters to pathWave.run

The following parameters MUST be specified, either with the function call or in a configuration file:
• preprocessed.tag = <ID_tag>: identifies which preprocessed pathway information has to be used to map the expression data to metabolic networks. This can be either one of the pathway data sets provided with the package (see Table 4.1) or a custom pathway data set produced as described in Section 3. In the latter case, it is imperative to load the respe
11. PathWave maps the expression data to optimally arranged 2D grid representations of metabolic and signaling pathway networks. The preprocessing of pathway information to produce the 2D grid representations is only partly done with the PathWave R package itself, although the necessary external source code is provided along with it (see Section 3.3).

3.1 Translating SBML files to KGML-like pathway files

PathWave requires pathway descriptions from XML files that respect the Kyoto Encyclopedia of Genes and Genomes (KEGG) Markup Language (KGML), although not all features of KGML are used. For users who instead wish to extract information on metabolic pathways from SBML files, we provide an additional Perl script that translates an SBML file into a set of KGML-like XML files for use with PathWave.

Important: Please note that this has been tested only with the metabolic model of the human Recon 1 (Duarte et al., 2007) taken from the BiGG database (Schellenberger et al., 2010; see Section 4.1). Since the SBML standard seems to be applied in a rather arbitrary fashion, there is no guarantee that the Perl script, which was developed for use with Recon 1/BiGG, will also work with other SBML files.

To obtain the script, unpack the PathWave package file (.tar.gz or .zip) and take it from the subfolder PathWave/src.

Filename: rmp-extractReactionsFromBiGGreconsSBML.pl

Dependencies: The script depends on the Perl module XML::TreeBuilder (freely av
12. PathWave v2.1 User Manual

User Manual version 1.4, April 4, 2014

This manual describes the functions provided by the PathWave R package (version 2.1) and their usage. PathWave uses gene expression data to identify metabolic and signaling pathways whose regulation is significantly different between two sample sets (e.g., normal vs. tumor tissue). In contrast to other pathway analysis methods, PathWave takes the topology of the pathway networks into account (mapping them on optimally arranged 2D grids) and can identify interesting pathways even if only localized subnetworks show significant differences (e.g., metabolic switches). For more details on the method, please see Schramm et al. (2010). For the novelties of version 2.1, please see Piro et al. (2014).

PathWave authors: Gunnar Schramm, Rosario M. Piro, Stefan Wiesberg

Availability: http://www.ichip.de/software/pathwave.html

License: PathWave v2.1 is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program
13. [Example output, continued: remaining rows and columns of the adjacency matrix of pathway hsa00010, labelled with reaction IDs.]

An overall summary of the procedure can be obtained by passing the complete results for all pathways (pwres$results.all) or the filtered significant results (pwres$results.filtered) to the function pathWave.resultSummary.

Example:
    pathWave.resultSummary(pwres$results.all)

[Example output: the summary reports the pathway source (url: KEGG hsa 2011-04-14; version of 2012-04-26; 202 pathways with 8260 reactions), the expression data (23 samples; sample classes, including EC_affected; overlap between expression data and pathway reactions), the number of permutations (1000), the number of features generated (25470), the number of pathways with p-value < 0.01 and with p-value < 0.05, and the significance of the regulation patterns as uncorrected pathway p-values:]
    hsa05414   6.7e-12
    hsa04150   3.4e-11
    hsa04670   1.4e-10
    hsa04742   9.1e-10
    hsa00010   8.2e-08

Note: if applied to the complete results (pwres$results.all), the reported p-values are NOT corrected for multiple testing. For corrected p-values, see Sections 5.1.1 and 5.1.2.

See the online help for further i
14. Sections 5.1.1, 5.2 and 5.3 will be sufficient.

5.1 Returned results

5.1.1 Human-readable summary of significant pathways

The most important summary is the human-readable table pwres$results.table returned by pathWave.run (optionally written to the file <my_out_prefix>.pw.results_table.tsv, see above). The table reports the significant (and filtered) pathways, the number of up- and down-regulated reactions, and the sample class to which the regulation is referred (the other sample class is the control). Table 5.1 shows an example in which 18 metabolic reactions of the glycolysis/gluconeogenesis pathway are down-regulated in the sample class EC_affected with respect to the other sample class, while only one reaction is up-regulated in EC_affected; 14 reactions show no important changes.

Note: For signaling pathways the same terminology is used ("reactions.up", "reactions.down", etc.), although the up/down regulation actually regards signaling proteins/genes and not metabolic reactions.

Table 5.1: Example of a human-readable output of PathWave

pathway  | name                         | p-value  | reactions.up | reactions.nochange | reactions.down | regulation.direction
hsa00010 | Glycolysis / Gluconeogenesis | 1.20e-05 | 1            | 14                 | 18             | EC_affected
hsa00561 | Glycerolipid metabolism      | 2.96e-05 | 1            | 10                 | 4              | EC_affected
hsa00100 | Steroid biosynthesis         | 3.09e-05 | 2            | 7                  | 25             | EC_affected
hsa00532 | Glycosaminoglycan b          | 4.36e-05 | 1            | 2                  | 4              | EC_affected
15. ailable at www.cpan.org).

Run the script on your SBML file as in the following example (note: the description regards a Unix/Linux system; the procedure may differ for Windows):
    rmp-extractReactionsFromBiGGreconsSBML.pl -f <your_sbml_file> -E <file_with_metabolites_to_exclude> -5 -P <kgml_file_prefix> -O <species_tag>

The most important command-line arguments are:
• -f <file>: The input SBML file (mandatory).
• -E <file>: An additional file listing metabolites to be excluded (one per line). This makes it possible to ignore metabolites and chemical substances that do not provide meaningful information for inferring links between metabolic reactions (e.g., H2O). Metabolites to be ignored are specified in terms of the corresponding species ID (reactant/product) used in the SBML file (example: M_h2o_r).
• -5: The reaction-gene associations will be simplified. The BiGG Recon 1 SBML file uses modified Entrez gene IDs that are followed by a dot and a digit to indicate different isoforms (e.g., 1312.1 and 1312.2). Since KEGG uses plain Entrez gene IDs and expression data is mostly mapped to genes rather than their single isoforms, this option removes the isoform information and keeps only plain Entrez gene IDs (e.g., 1312).
• -P <prefix>: If specified, the subsystems found in the SBML file will be interpreted as pathways and written to a set of KGML-like XML fil
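The isoform simplification described above amounts to dropping a trailing dot-and-digit suffix from each gene ID. The script itself is written in Perl; the R sketch below only illustrates the transformation on the example IDs from the text and is not taken from the script.

```r
# Gene IDs as described for the BiGG Recon 1 SBML file: the first two carry
# an isoform suffix, the third is already a plain Entrez gene ID.
ids <- c("1312.1", "1312.2", "10000")

# Remove a trailing "." followed by one or more digits.
plain <- sub("\\.[0-9]+$", "", ids)

# Isoforms of the same gene collapse to a single plain ID.
unique.plain <- unique(plain)
print(unique.plain)
```

The anchored pattern (`$`) ensures that only a suffix at the end of the ID is removed, so plain IDs pass through unchanged.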
16. ay map can be found in its lower left corner (see Fig. 5.1).
(2) Attention: signaling pathways from KEGG can also be drawn but should be verified thoroughly, because they do not involve metabolic reactions with well-defined reaction IDs. Therefore, PathWave communicates to the KEGG web server which gene IDs should be colored. If such a gene is involved in multiple protein complexes (i.e., network nodes), all of them will be colored.
(3) The pathway that includes all single metabolic pathways (having ID 01100, e.g., hsa01100 for human; name: "Metabolic pathways") will be ignored, because drawing it with all up- and down-regulations is too complex a task.

Figure 5.1 shows an example of a downloaded color-coded pathway map obtained using the PathWave procedures and interfaces. To compare this pathway map to its original KEGG version, please check http://www.genome.jp/kegg-bin/show_pathway?hsa00670

See the online help for further information: ?pathWave.getColorKEGGMapURLs

Figure 5.1: Example of a color-coded KEGG pathway map. [Figure: KEGG map "PENTOSE PHOSPHATE PATHWAY".]
17. ctive R data files written by pathWave.preprocessPathways and pathWave.preprocessOptGrids (pwdata.pathways.<ID_tag>.rda and pwdata.optgrids.<ID_tag>.rda, see Sections 3.2 and 3.4) before using them with pathWave.run.
• input.exprdata = <my_expr_data>: the gene expression data set on which to run PathWave. This can be passed to pathWave.run as
  - a data frame containing a matrix (row names must be Entrez gene IDs, column names are sample IDs), or
  - a file name from which to load the expression data. Required file format: tab-separated values (TSV); the header line must contain only sample IDs, and data lines must have an additional preceding field containing the Entrez gene ID (hence, data lines contain one more field than the header line).
• input.sampleclasses = <my_sample_classes>: definition of exactly two sample classes for the expression data. This may be one of the following three:
  - a factor matching exactly the columns in the expression data,
  - a data frame containing a table with two columns (1: sample ID, 2: class), or
  - the name of a file containing a mapping from sample ID to class as TSV. File format (no header): one mapping per line as <sample_ID><tab><class>.
  Note: If the two classes (sample subsets) are specified as a data frame or as a file name, we have a precise sample ID and can therefore also specify only a subset (in an arbitrary order) of the full expression data, i.e., the f
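The required expression file layout (header with sample IDs only, data lines with one extra leading gene ID field) matches the convention base R uses to detect row names when reading a table. The sketch below writes a tiny made-up file and reads it back; the sample IDs, expression values and the class label "control" are hypothetical, and the snippet does not call PathWave.

```r
# Write a small TSV in the described layout: the header has 4 fields,
# each data line has 5 (gene ID first).
tsv <- tempfile(fileext = ".tsv")
writeLines(c(
  "S1\tS2\tS3\tS4",
  "10000\t7.1\t6.9\t8.4\t8.8",
  "1312\t5.2\t5.0\t4.1\t3.9"
), tsv)

# Because the header has one field fewer than the data lines, read.delim
# uses the first column as row names (the gene IDs).
expr <- read.delim(tsv, header = TRUE, check.names = FALSE)

# Sample classes as a factor that matches the columns exactly ...
classes <- factor(c("control", "control", "EC_affected", "EC_affected"))

# ... or as a two-column table (1: sample ID, 2: class).
class.table <- data.frame(
  sample = colnames(expr),
  class  = as.character(classes),
  stringsAsFactors = FALSE
)
print(expr)
print(class.table)
```

Reading the file this way before a run is a quick check that the gene IDs ended up as row names rather than as a data column.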
18. [Example output, continued: remaining per-reaction p-values of pathway hsa00010, followed by $hsa00010$reaction.regulation, which lists a value of 1, -1 or 0 for each reaction ID.]

For each significant pathway, the reaction IDs (pwres$...$reaction) are listed along with the information (pwres$...$reaction.regulation) on whether the reactions are up-regulated (1), down-regulated (-1), or not differentially regulated (0), and along with the respective p-values (pwres$...$reaction.p.value).

Results for single pathways can be accessed directly through, for example, pwres$results.filtered$pathway$hsa00010, or more specifically through pwres$results.filtered$pathway$hsa00010$reaction, pwres$results.filtered$pathway$hsa00010$reaction.regulation, and pwres$results.filtered$pathway$hsa00010$reaction.p.value.

• pwres$results.filtered$most.sign.pattern:
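The per-pathway access paths described above can be used to recount the up-, down- and unchanged reactions of a pathway. This is a minimal sketch on a mock object: the list is built by hand for illustration (a real `pwres` comes from a PathWave run), and the regulation vector is constructed to match the counts that the manual's summary table reports for hsa00010 (1 up, 14 unchanged, 18 down).

```r
# Mock object for illustration only; the nesting follows the access paths
# named in the text, the values are made up to match the reported counts.
pwres <- list(results.filtered = list(pathway = list(
  hsa00010 = list(
    reaction.regulation = c(rep(1, 1), rep(0, 14), rep(-1, 18))
  )
)))

# Extract the regulation codes: 1 = up, -1 = down, 0 = no change.
reg <- pwres$results.filtered$pathway$hsa00010$reaction.regulation

# Count reactions per category.
counts <- c(
  up       = sum(reg == 1),
  nochange = sum(reg == 0),
  down     = sum(reg == -1)
)
print(counts)
```

The same pattern applies to any pathway ID present in the filtered results; only the ID in the access path changes.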
19. e and as an independent package at http://www.ichip.de/software/pathwave.html. To extract it from the R package, unpack the R package and take the GridArranger package file (GridArranger-v1.0.tgz) from the subfolder PathWave/src. Alternatively, download the GridArranger package from the PathWave website.

To install the Grid Arranger, unpack GridArranger-v1.0.tgz and open the file Makefile in its main directory. Adjust the paths to ABACUS and CPLEX at the top of the file, then enter your preferred GNU compiler. You should use the same compiler that you used to compile ABACUS. On a Unix/Linux system, execute the following command from a shell:
    make

3.3.3 Running Grid Arranger on pathway adjacency matrices

Copy the adjacency matrix files produced by pathWave.preprocessPathways (see Section 3.2) from output.file.matrixdir to the folder "in", which is located in the main directory of the GridArranger. To start the computation on a Unix/Linux system, execute the following commands from a shell:
    cd <main directory of GridArranger>
    chmod u+x runGridArranger
    ./runGridArranger

The Grid Arranger will arrange all files found in the input folder "in" and store the results in the output folder "out". It will print status messages to your shell. After the calculation is finished, the file statistics.log is created in the output folder "out". It contains some information about the success of the run as well as possible error m
20. e expression data has been mapped, not the number of genes for which expression data was provided.
    $x$row       (number of rows, i.e., network nodes)
    $x$col       [1] 23
    $x$overlap   [1] 3152
    $y           (the two sample classes, one of them EC_affected)
• pwres$results.all$call: Lists the parameters used for processing the expression data.
    pw = pathWave(x = x, y = y, optimalM, optimalGrid, nperm = 1000, verbose = TRUE)
• pwres$results.all$nperm: Number of randomizations/permutations performed for statistical evaluation.
    1000
• pwres$results.all$oM: This is a complex list object containing all necessary data for the preprocessed pathways. It is composed of pwres$results.all$oM$version, pwres$results.all$oM$url, pwres$results.all$oM$pathways, pwres$results.all$oM$reactions, and pwres$results.all$oM$data. The latter contains the adjacency matrices of the pathways in terms of reactions, e.g., pwres$results.all$oM$data$hsa00010.
    [Example output: adjacency matrix of pathway hsa00010, with rows and columns labelled by reaction IDs such as hsa00010 R00431, hsa00010 R00658, hsa00010 R00200, hsa00010 R00726, hsa00010 R00014 and hsa00010 R00703.]

5.2 Overall summary
21. es (one per subsystem), named <prefix><subsystem_name>.xml. This argument is not mandatory for using the script (which also has another purpose), but it is required for PathWave.
• -O <tag>: A KEGG-like species tag. Default: hsa.

Note: The script will write summary information on the metabolic reactions found to the standard output. This information is not required for PathWave and can be ignored or redirected. The KEGG-like pathway files written using option -P can be used for the preprocessing steps described in the following sections.

See the online help on the shell command line for further information: rmp-extractReactionsFromBiGGreconsSBML.pl -h

3.2 Producing adjacency matrices

The first step when preprocessing pathway information is to produce adjacency matrices from the pathway descriptions. PathWave takes pathway descriptions from XML files that respect the Kyoto Encyclopedia of Genes and Genomes (KEGG) Markup Language (KGML). KGML versions 0.7.0 and 0.7.1 have been tested.

Procedure:
(1) Save the KGML files with filenames <pathway_ID>.xml in a directory. Each <pathway_ID> is an identifier for a pathway (e.g., hsa00280); alternatively, pathway names may be used instead of their IDs. No other files with the extension .xml should be in this directory.
(2) Run the following PathWave function in R:
    pathWave.preprocessPathways(preprocessed.tag = <my_ID_tag>, input
22. essages. In the unlikely case that no approximate solution could be found for one of the input files (e.g., if the adjacency matrix is very large), you can try the following steps:
(1) Rerun the solver. It contains several random elements, such that the results of different runs might differ from each other.
(2) Open the file .abacus in the folder bin and increase the parameter MaxCpuTime. By default, it is set to 30 minutes for every input file.

After a successful arrangement of the adjacency matrices into compact 2D lattice grids, the Grid Arranger output files can be used for the final preprocessing step, which is again executed using the R package, as described in the next section.

3.4 Preparation of 2D grid arrangements for PathWave

The third and last step of preprocessing is the preparation of the externally computed 2D grid arrangements (see Section 3.3) for use with PathWave. This is done in R with the following function call:
    pathWave.preprocessOptGrids(preprocessed.tag = <my_ID_tag>, input.file.pathwaydir = <dir_with_KGML_files>, input.file.optgriddir = <dir_where_2D_grid_arrangements_are_stored>)

Alternatively, if parameters are specified in a configuration file, the function can be run with:
    pathWave.preprocessOptGrids(configfile = <my_conf_file>)

Result: An R data file named pwdata.optgrids.<my_ID_tag>.rda will be written to the current working directory. This file contains the
23. genes_and_reactions> metabolic reactions, involving less than <num_genes_and_reactions> genes, are differentially expressed. This makes it possible to filter out cases in which, for example, a single enzyme is down-regulated but used several times in the metabolic network of a pathway (i.e., involved in several network nodes). The default minimum number of reactions and genes is 3.
• output.file.prefix = <my_out_prefix>: if specified, the three components of the returned list object will be saved in three files:
  - <my_out_prefix>.pw.results.rda: complete results in the PathWave object pwres, composed of:
      pwres$results.all: results for all pathways (no filtering and correction for multiple testing applied yet)
      pwres$results.filtered: only filtered significant pathways
      pwres$results.table: human-readable summary for filtered significant pathways
  - <my_out_prefix>.pw.results_table.tsv: summary of filtered significant results as a human-readable TSV table (corresponding to pwres$results.table)
• verbose = FALSE | TRUE: specifies whether to print verbose information about the running procedure to the screen. Default: TRUE.

See the online help for further information: ?pathWave.run

5 Analyzing the results

The following are a few hints on what results to expect from PathWave and how they can be mined and analyzed. Note: for most applications you will not need very detailed results, and the hints given in
24. he color gray will be used for other reactions that have been evaluated by PathWave (i.e., whose involved genes had available expression data) but showed no significant changes. Reactions/enzymes for which no data was available will have the original color used by KEGG pathway maps.

The color code used for drawing PathWave results can be personalized with the function argument col, as in the example above. Colors are specified as a string vector in the following order: (1) color for down-regulation, (2) color for no changes, (3) color for up-regulation. Default: col = c("green", "grey", "red").

Additionally, the font color of some nodes is changed to indicate which reactions/genes are involved in the most significant pathway feature(s)/pattern(s) that gave rise to the pathway's p-value. Colors are specified as a string vector in the following order: (1) font on down-regulated nodes, (2) font on nodes without significant differences, (3) font on up-regulated nodes. Default: col.sign.pattern = c("red", "red", "green").

Colors for both nodes and font can also be arbitrarily chosen and specified as hexadecimal RGB code (e.g., "#FF0000" for red).

Important:
(1) Keep in mind that the preprocessed pathway information used with PathWave may be older than the pathway map currently available from the KEGG web server; hence, the map may have been changed since the pathway information was downloaded/preprocessed. The date of last modification of the obtained pathw
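The ordering convention of the color vector (down, no change, up) can be made concrete by mapping regulation codes of -1, 0 and 1 to vector positions 1, 2 and 3. The sketch below is an illustration of that convention in base R, not PathWave's own code; the node names and regulation values are hypothetical.

```r
# Default node colors in the documented order: down, no change, up.
col <- c("green", "grey", "red")

# Hypothetical regulation codes for four nodes (-1 = down, 0 = none, 1 = up).
reg <- c(nodeA = -1, nodeB = 0, nodeC = 1, nodeD = -1)

# Adding 2 shifts the codes -1/0/1 to the vector indices 1/2/3.
node.col <- col[reg + 2]
names(node.col) <- names(reg)
print(node.col)

# Converting a named color to a hexadecimal RGB code with grDevices.
hex.red <- rgb(t(col2rgb("red")), maxColorValue = 255)
print(hex.red)
```

With the default vector, down-regulated nodes come out green and up-regulated nodes red; swapping the first and third entries reverses that scheme.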
25. iosynthesis chondroitin sulfate
hsa00360  Phenylalanine metabolism  6.59e-05  1  6  3  EC_affected

5.1.2 Complete results for significant pathways

More detailed results for significant (filtered) pathways are available in the returned pwres$results.filtered object (optionally written to the <my_out_prefix>.pw.results.rda file along with the complete results, see above). This object itself is a list composed of several elements, as shown in the following examples:

• pwres$results.filtered$r.reg.direction
  Sample class to which up/down-regulations are referred:
      "EC_affected"
  This means that an up-regulated reaction has a higher expression in the "EC_affected" sample class (the other sample class is the control).

• pwres$results.filtered$p.values
  P-values of significantly dysregulated pathways:
      [example output: named vector of p-values for hsa00010, hsa00561, hsa00100, hsa00532, hsa00360, hsa00830, hsa00980, hsa00480, hsa00670, hsa00052, hsa00562, hsa00590, ...; printed values illegible in the source]

• pwres$results.filtered$pathway
  Complete details for dysregulated pathways:
      [example output: reaction list for pathway hsa00010; printed values illegible in the source]
26. n be skipped. Table 4.1 lists all available pathway datasets and their associated ID tags (preprocessed.tag) to be specified when running PathWave.

Table 4.1: Available preprocessed metabolic pathway data

ID tag     Organism           Source                                                                      Gene IDs
BiGG.hsa   H. sapiens         Recon 1 (Duarte et al., 2007); BiGG database (Schellenberger et al., 2010)  Entrez gene IDs (e.g. 10000)
KEGG.hsa   H. sapiens         KEGG (Kanehisa et al., 2012), downloaded April 14, 2011                     Entrez gene IDs (e.g. 10000)
KEGG.mmu   M. musculus        KEGG (Kanehisa et al., 2012), downloaded April 14, 2011                     Entrez gene IDs (e.g. 11674)
KEGG.dme   D. melanogaster    KEGG (Kanehisa et al., 2012), downloaded April 14, 2011                     Gene symbols as used by KEGG (e.g. Dmel_CG11876 for CG11876); see Note 1
KEGG.dre   D. rerio           KEGG (Kanehisa et al., 2012), downloaded April 14, 2011                     Entrez gene IDs (e.g. 321664)
KEGG.cel   C. elegans         KEGG (Kanehisa et al., 2012), downloaded April 14, 2011                     Locus tags (e.g. LLC1.3); see Note 2
KEGG.eco   E. coli MG1655     KEGG (Kanehisa et al., 2012), downloaded April 14, 2011                     Gene IDs (ordered locus names, e.g. b2097)

Important: there is some inconsistency in the gene IDs used by KEGG, but we have opted for taking the IDs exactly as used by the database from which we derive the pathway information.

Notes:
1. This holds for >99% of all genes (mostly those having a CG ID as symbol). For the remainder, KEGG uses only the symb
27. nformation: ?pathWave.resultSummary

5.3 Obtaining colored pathway maps from KEGG

If the preprocessed pathway information has been derived from KEGG pathways (Kanehisa et al., 2012), the differentially regulated pathways obtained from the PathWave analysis can be drawn as metabolic networks with reactions/enzymes color-coded according to their regulatory status. This service is provided externally by the KEGG website and not by PathWave itself. PathWave, however, provides an easy-to-use function for building URLs that can be passed to any web browser to query the KEGG web server and retrieve custom pathway images with colored nodes (reactions or genes):

    keggurls <- pathWave.getColorKEGGMapURLs(pwres$results.filtered,
                                             preprocessed.tag = "KEGG.hsa",
                                             col = c("green", "grey", "red"),
                                             col.sign.pattern = c("red", "red", "green"))

This function call returns a vector containing URLs with which the colored networks can be requested from the KEGG web server. Example:

    keggurls
    [1] "http://www.kegg.jp/kegg-bin/show_pathway?hsa04140/64422%09red,green/10533%09red,black/9140%09grey,black/11337%09grey,black/30849%09grey,black/5289%09green,red"
    [2] "http://www.kegg.jp/kegg-bin/show_pathway?map00471/rn:R00243%09green,red/rn:R00248%09green,red/rn:R00256%09green,black/rn:R01579%09%23bfffbf,black"
    ...

By default, nodes of up-regulated reactions/enzymes will be depicted in red, nodes of down-regulated reactions in green, and t
28. ol without leading "Dmel_" (e.g. COX1 and CYTB).
2. This holds for >99% of all genes. For the remainder, KEGG uses the gene symbol (e.g. COX1 and CYTB).

4.2 The PathWave main procedure

Apart from the preprocessed pathway information (e.g. the 2D grid representations of metabolic networks), PathWave requires two inputs for the identification of significantly altered pathways:

(i) A gene expression data set composed of gene expression profiles that are associated with gene IDs. For each gene ID, only one profile must be present. The type of gene ID required is the same as used for the preprocessed pathway information (e.g. Entrez gene IDs for KEGG.hsa, see Table 4.1).

(ii) A mapping of the samples in the expression data set to two subgroups (classes) that need to be analyzed for differential expression of pathway components (e.g. normal and tumor, untreated and treated).

The exact format of the required input data is described in detail in Section 4.2.1 (see also the Usage Example/Quick Guide document for a practical example that illustrates the required input format).

To run PathWave, use the following R command:

    pwres <- pathWave.run(preprocessed.tag = <ID_tag>,
                          input.exprdata = <my_expr_data>,
                          input.sampleclasses = <my_sample_classes>,
                          param.kegg.only_metabolism = TRUE|FALSE,
                          param.ztransform = FALSE|TRUE,
                          param.numperm = <num_permutations>,
                          param.pvalue.correction.method = Bonferroni B
29. r Implementation in ABACUS. Lecture Notes in Computer Science 2241, 157-222.
• Jünger & Thienel (2000). The ABACUS system for branch-and-cut-and-price algorithms in integer programming and combinatorial optimization. Software: Practice and Experience 30, 1325-1352.
• Kanehisa et al. (2012). KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109-D114.
• Piro et al. (2014). Network topology-based detection of differential gene regulation and regulatory switches in cell metabolism and signaling (submitted).
• Schellenberger et al. (2010). BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics 11, 213.
• Schramm et al. (2010). PathWave: discovering patterns of differentially regulated enzymes in metabolic pathways. Bioinformatics 26(9), 1225-1231.
30. rs, as in the following example:

    preprocessed.tag = KEGG.hsa
    input.exprdata = my_expr_data_file.tsv
    input.sampleclasses = my_class_file.tsv
    param.kegg.only_metabolism = TRUE
    param.ztransform = TRUE
    param.numperm = 1000
    param.pvalue.correction.method = Bonferroni
    param.pvalue.threshold = 0.05
    output.file.prefix = my_output_file.prefix

The expressions used as keys (e.g. input.exprdata) are the argument names as defined for the respective PathWave functions (see Sections 3 and 4 for their names and meaning).

The name of the configuration file can be passed to the respective PathWave functions. Otherwise, PathWave will try to load configuration parameters from the following default configuration files in the current working directory:

• pathwave.preprocess.conf for preprocessing pathway data (see Section 3)
• pathwave.run.conf for running PathWave (see Section 4)

Important notes:

• If an argument is both specified in the configuration file and passed to the respective PathWave function, the value passed directly to the function takes precedence over the value specified in the configuration file.
• Mandatory arguments must be specified either by passing them directly in the function call or by specifying them in the configuration file.

3 Preprocessing pathways

Note: if you just want to apply PathWave with already provided preprocessed pathway data (see Section 4.1 and Table 4.1), you can skip this section.
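The precedence rule for configuration files described above can be illustrated with a short base-R sketch. This is not PathWave's own parser: the helper names read_conf and merge_args are hypothetical, and the sketch assumes a simple "key = value" layout with one pair per line. It only shows the idea that a value passed directly in the function call overrides the value found in the configuration file.

```r
# Hypothetical helper (not part of PathWave): parse "key = value" lines
# into a named list of character strings. The file layout is an assumption.
read_conf <- function(lines) {
  lines <- trimws(lines)
  # Drop empty lines and lines starting with "#" (comment syntax is assumed)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  keys <- trimws(sub("=.*$", "", lines))       # text before the first "="
  values <- trimws(sub("^[^=]*=", "", lines))  # text after the first "="
  as.list(setNames(values, keys))
}

# Hypothetical helper: directly passed arguments override configuration values.
merge_args <- function(conf, direct) {
  modifyList(conf, direct)
}

conf <- read_conf(c(
  "preprocessed.tag = KEGG.hsa",
  "param.numperm = 1000"
))
args <- merge_args(conf, list(param.numperm = "5000"))

print(args$param.numperm)     # the directly passed value wins
print(args$preprocessed.tag)  # the configuration file value is kept
```

Using modifyList keeps every key that appears only in the configuration file and replaces only the keys that are also passed directly, which matches the behavior stated in the important notes.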
31. so be installed by clicking on "Packages" > "Install package(s)", selecting a mirror and then the packages.

c) the Bioconductor R packages multtest, RCurl, genefilter
On both Unix/Linux and Windows systems, these packages can be installed from the R command line, as in the following example:

    source("http://bioconductor.org/biocLite.R")
    biocLite("multtest")

Additional external software that is needed only for preprocessing pathways from KEGG XML or BiGG SBML files will be mentioned where appropriate in Section 3. However, preprocessing pathways will not be necessary for most applications, since the R package already provides a number of preprocessed pathway sets for several organisms (see Section 4.1 and Table 4.1).

1.2 Installation of PathWave

To install PathWave, download the package from http://www.ichip.de/software/pathwave.html

On a Unix/Linux system, execute the following command from a shell:

    R CMD INSTALL PathWave_2.1.3.tar.gz

or from the R command line:

    install.packages("PathWave_2.1.3.tar.gz")

On a Windows system, click on "Packages" > "Install package(s) from local zip files" and select the file PathWave_2.1.3.zip.

1.3 Loading PathWave

To load the package within the R command line, simply type:

    library(PathWave)

2 Configuration file

With the new user interface, PathWave allows you to specify all necessary and optional parameters in an optional configuration file composed of key-value pai
32. the pathway data. This step is not done using the R functions provided with the PathWave package, but using additional C code (GridArranger) provided along with the R package.

3.3.1 Required software

GridArranger has the following two important dependencies:

1. ABACUS v2.4 alpha
   To run GridArranger, you will first need to install ABACUS v2.4 alpha, developed by the University of Cologne, Germany (Jünger and Thienel, 2000; Elf et al., 2001).
   Important: Since GridArranger is NOT compatible with later versions of ABACUS that can be obtained from the ABACUS website at http://www.informatik.uni-koeln.de/abacus, we provide the required version (v2.4 alpha), with kind permission of the original authors, on the PathWave website at http://www.ichip.de/software/pathwave.html
   Download the ABACUS package from the PathWave website and follow the instructions in the file INSTALL in the main directory of the package. Choose GCC 2.9 whenever ABACUS asks to specify a compiler.

2. Linear program solver
   ABACUS requires an external linear program solver. We recommend CPLEX, which is free for academic research and available at http://www.ibm.com/software/integration/optimization/cplex-optimizer
   The following linear program solvers are also compatible: Cbc, Clp, DyLP, GLPK, Gurobi, MOSEK, Soplex, SYMPHONY, Vol, XPRESS-MP.

3.3.2 Installation of GridArranger

GridArranger v1.0 is provided both within the PathWave 2.1 R packag
33. ull expression data set may contain further samples of other classes that will not be used in the procedure. A factor, instead, does not contain sample IDs and must therefore exactly match the number and order of samples contained in the expression data set.

Hint: PathWave will order sample class names alphanumerically and take the second as the control. Example: for the classes "normal" and "tumor", the "tumor" class would be taken as the control, and hence up-regulation would mean that a reaction has a higher expression in "normal" than in "tumor". Therefore, in this case it may be wiser to name the classes something like "1_tumor" and "2_normal" to make sure that up-regulation refers to a higher expression in the tumor class. In any case, PathWave will report to which of the two classes the notions "up-regulation" and "down-regulation" refer. In the above two examples, this would be "normal" (instead of "tumor") and "1_tumor" (instead of "2_normal"), respectively.

4.2.2 Optional parameters to pathWave.run

The following parameters are optional in most cases, because default values will be used if they are specified neither in the function call nor in a configuration file. Be sure you understand what the default values mean before running PathWave.

• param.kegg.only_metabolism = FALSE|TRUE: specifies whether only metabolic pathways from KEGG should be evaluated. If TRUE (default), all other KEGG
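The class-ordering hint in Section 4.2.1 can be checked on your own class labels before running an analysis. The following base-R sketch only mimics the alphanumeric ordering described in the hint; it does not call PathWave, the helper name control_class is hypothetical, and whether PathWave internally uses sort() is an assumption.

```r
# Hypothetical helper (not part of PathWave): sort the two class names
# alphanumerically and return the second one, which the hint above
# describes as the class taken as the control.
control_class <- function(classes) {
  lev <- sort(unique(as.character(classes)))
  stopifnot(length(lev) == 2)  # the procedure compares exactly two classes
  lev[2]
}

# With these labels the tumor class would become the control ...
print(control_class(c("normal", "tumor", "tumor", "normal")))

# ... while prefixed labels make the normal class the control.
print(control_class(c("1_tumor", "2_normal", "1_tumor")))
```

If the class returned is not the one you intend as the control, renaming the classes with numeric prefixes, as suggested in the hint, is the simplest fix.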
