
GenPlex Introduction

1. <Figure 2-42> Classification GEM Creating Window
   Output Path: Set the path of the Analysis file to be created.
   Analysis Information (Ref. 2.4.1).
   Class File Construction:
   - Select an attribute: Shows the Sample Attribute items that were set up when the data was input; the data is classified according to the user's selection. Click the Apply button, and only the selected data will be shown in the table on the right side of the window, colored by Class so that the classes are easy to distinguish.
   - Selected Attributes: Shows the list of selected attributes.
   Data Fraction: The Training Data and the Test Data are composed at random from the selected data, according to the ratio that the user has set.
   Duplication Mode (Ref. 2.4.1): Shows only the data corresponding to the selected Sample Attribute.
   Click the OK button to export the selected data; the Classification module will be activated automatically.
   Reference
   [2.1] E. Hubbell, W. Liu, R. Mei (2002) Robust estimators for expression analysis. Bioinformatics 18(12), 1585-1592.
   [2.2] Affymetrix (2001) Statistical algorithms reference guide. Technical report, Affymetrix.
   [2.3] Y. H. Yang et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systema
2. <Figure 4-1> Analysis Creating Window
   Name: Input the name of the Analysis File to be created.
   Location: Click the button to select the location where the Analysis File will be created.
   Description: Input supplementary information about the Analysis File (can be skipped).
   Probe ID Type: Select the ID type of the input data.
   - Commercial Product Probe ID: Affymetrix GeneChip Probe ID, Agilent Probe ID (One dye), Agilent Probe ID (Two dye), Applied Biosystems 1700 Probe ID, CodeLink Probe ID, Illumina Probe ID, Operon Probe ID
   - Public DataBase ID: IMAGE Clone ID, NCBI Clone ID, NCBI GenBank Accession, NCBI GeneID (LocusLink), NCBI UniGene ID
   - Others: If the ID is unknown
   Species: Select the species of the input data.
   Click the Create button; the Analysis File will be created and the data selecting window will appear (go to 4.1.7).
   4.1.2 Open Analysis
   On the menu bar select File > Open Analysis, or click on the second icon, to open a saved Analysis File.
   4.1.3 Recent Analysis
   On the menu bar select File > Recent Analysis to open a recently analyzed Analysis File. This list can be cleared using the Clear History menu.
   4.1.4 Save Analysis
   On the menu bar select File > Save Analysis, or click on the third icon, to save the working Analysis File.
   4.1.5 Save Analysis As
   On the menu bar select File > Save Analysis As or click on
3. Output Path: Set the path of the Analysis file to be created.
   Analysis Information (Ref. 2.4.1).
   Sample Selection:
   - Select an attribute: Shows the Sample Attribute items set up when the data was input. Click the Apply button, and only the Sample Attribute selected by the user will be shown in the table on the right side of the window.
   - Selected Attributes: Shows the list of selected attributes.
   Duplication Mode (Ref. 2.4.1): Shows only the data corresponding to the selected Sample Attribute.
   Click the OK button to export only the selected data; the Clustering module will be activated automatically. If the Clustering module is already active, the created GEM is simply added to the working Analysis File.
   2.4.9 Classification
   It is possible to export data to the Classification module, which is used for diagnosis, prognosis, and prediction. On the menu bar select Analysis Data > Classification, or click the eighteenth icon, to export the input data to the Classification module.
   [Screenshot: Generate Gene Expression Matrix & Analysis Data window (tabs: DEG Finding, Clustering, Classification) showing Output Path, Analysis Information, Sample Name, Class File Construction, Selected Attributes, and Data Fraction]
4. 7.3 Volcano Plot
   The name comes from the plot's resemblance to an erupting volcano. It is a useful visualization for seeing, in one view, the distribution of the genes extracted by the Fold Change method and the T-Test method. For example, among DEGs with more than 2-fold change, the statistically significant genes (small P-value) are the ones of interest. To select these genes, look at the genes in the upper corners of the figure (the grey part of the figure below).
   [Figure: volcano plot; x-axis: Average log_2 Fold Change, y-axis: -Log P_value with base 10]
   7.4 Analysis of Variance (ANOVA)
   The experimental design for DEG finding does not always have exactly 2 groups to compare. With 2 groups, the Fold Change or T-Test method can be applied without difficulty, but what should be done when there are 3 or more groups? There are 2 ways to solve this problem: first, apply the T-Test to all possible pairs; second, apply ANOVA to all groups at once. For example, if there are 7 groups to compare, the first way requires 21 pairs to be analyzed, and a separate DEG list comes out of each pair. But if the goal is to find DEGs that differ with statistical significance among the 7 groups, this is not the appropriate way. Even if the statistical significance level is set to p = 0.05 for each 2
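As a rough illustration of the two points above (how many pairwise comparisons 7 groups require, and how genes in the upper corners of a volcano plot can be picked), here is a minimal Python sketch. The function names, the cutoffs (2-fold, p < 0.05), and the example gene values are illustrative assumptions, not GenPlex settings or output.

```python
from itertools import combinations
from math import log10


def count_pairs(n_groups):
    """Number of distinct group pairs a pairwise T-Test would have to cover."""
    return len(list(combinations(range(n_groups), 2)))


def volcano_select(genes, min_abs_log2_fc=1.0, max_p=0.05):
    """Keep genes lying in the upper-left or upper-right corner of a volcano plot.

    genes: iterable of (name, average log2 fold change, p-value) tuples.
    Returns (name, log2 fold change, -log10 p) for each selected gene.
    """
    selected = []
    for name, log2_fc, p_value in genes:
        # |log2 FC| >= 1 corresponds to at least a 2-fold change (assumed cutoff).
        if abs(log2_fc) >= min_abs_log2_fc and p_value < max_p:
            selected.append((name, log2_fc, -log10(p_value)))
    return selected


# Hypothetical example values, for illustration only.
example_genes = [
    ("geneA", 2.5, 0.0001),   # large change, significant
    ("geneB", 0.3, 0.0001),   # significant, but small change
    ("geneC", -1.8, 0.2),     # large change, not significant
    ("geneD", -3.0, 0.001),   # large change, significant
]
picked = volcano_select(example_genes)
pairs_for_seven_groups = count_pairs(7)
```

With 7 groups, `count_pairs` returns the 21 pairs mentioned in the text; only genes that pass both cutoffs are kept by `volcano_select`.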
5. [Screenshot: annotation result table for Affymetrix rat expression array probes (Rattus norvegicus), with columns such as probe ID, gene ID, gene symbol, UniGene ID, gene title, and chromosomal location; rows shown include 1368113_at (Tff2), 1368172_a_at (Lox), and 1368303_at (Per2)]
   <Figure 3-10> Annotation Result Window
   Clustering: Export only the selected genes to the Clustering module.
   Classification: Export only the selected genes to the Classification module.
   Pathway Analysis: Export only the selected genes to the Pathway Analysis module.
   Plot
   - Correlation Scatter Plot: This visual shows the relation between selected Differentially Expressed Ge
6. into a picture, in table form, the Annotation
   - Pathway Analysis: Export the genes of the corresponding Cluster to the Pathway Analysis module.
   - Annotation: Show the Annotation information of the genes of the corresponding Cluster.
   Profiling Graph Range: Shows the profiling graph of the corresponding Cluster. Within the Cluster, the gene with the maximum significant value is marked in green, the median in red, and the minimum in blue.
   Heatmap Range: Shows the heatmap of the genes in the corresponding Cluster.
   Data Range
   4.4 Validation
   4.4.1 GDI
   For GDI (the Generalized Dunn's Index) [4.5], select Validation > GDI on the menu bar or click on the eleventh icon to open the set up window.
   [Screenshot: Statistical Clustering Validation, The Generalized Dunn's Index (GDI) dialog with Clustering Type (Gene, Experiment), InterCluster Measure (Single Linkage), All Clusters and Selected Clusters lists of K-Means results, Distance Measure, and Validation Name]
   <Figure 4-24> The Generalized Dunn's Index (GDI) Set Up Window
   Clustering Type: Select the basis for comparing the clustering results.
   - Gene: Use the genes as the basis.
   - Experiment: Use the samples as the basis.
   InterCluster Measure: Select the method of calculating the linkage: Single Linkage, Complete Linkage, Average Lin
7. Case Sensitive: Search distinguishing capital letters from small letters.
   Image width: Change the width of the Heat Map shown in the Result window. Input the new width and press Enter.
   Select Heat Map: Change the color of the Heat Map (Red Green Blue Yellow).
   Annotation: View the Annotation information of the selected genes (Ref).
   Clustering: Export only the selected gene data to the Clustering module.
   Pathway Analysis: Export only the selected gene data to the Pathway Analysis module.
   Result Graph: The Gene Selection Result is shown in a graph. The X axis is the Rank; the Y axis is the Result Value of the calculation. Drag the mouse pointer along the graph to read the rank and the result value.
   Visualization: Check visually, in three dimensions, whether the Gene Selection Result separates the Training Data.
   - 3 Gene based: Shows a 3D view using only the 3 highest-ranked genes of the Gene Selection Result.
   - PCA: Shows the PCA result in 3D using all of the Gene Selection Result. Each ball represents the Training Data or Test Data of one sample; when the user selects a ball, the Sample Information is shown in the table below.
   PCA visualization: Toggle Information Panel, Toggle Perspective Mode, Toggle Axis, Toggle Grid, Toggle Projection to XY, Toggle Ball Type, Toggle Ball Size, Set Background Color, Set Rule Color, Save
8. <Figure 2-31> Statistics Result Window
   (1) Shows basic statistics of the selected data in table format. The items are as follows:
   - Max: Greatest value of the selected data
   - Min: Smallest value of the selected data
   - Median: Median value of the selected data
   - Mean: Average value of the selected data
   - Stdev: Standard deviation of the selected data
   - 3Q: 3rd Quartile of the selected data
   - 1Q: 1st Quartile of the selected data
   (2) If a Flag value exists, the following item is shown additionally:
   - No. of Flags: Number (percentage) of false Flag values in the selected data
   In the case of Affymetrix Gene Chip Data, the following items are shown additionally:
   - No. of Present Call: Number (percentage) of P calls of each sample
   - No. of Marginal Call: Number (percentage) of M calls of each sample
   - No. of Absent Call: Number (percentage) of A calls of each sample
   2.3.2 Box Plot
   On the menu bar select Statistics Plot > Box plot or just click on the tenth icon.
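The items in the statistics table can be reproduced with Python's standard library, as in the sketch below. Two choices here are assumptions, since the manual does not say which GenPlex uses: the sample standard deviation (`statistics.stdev`) and the "inclusive" quartile method. The function name is hypothetical.

```python
import statistics


def basic_statistics(values):
    """Return the summary items listed for the Statistics Result window."""
    # Quartile convention is an assumption; other tools may interpolate differently.
    first_q, _second_q, third_q = statistics.quantiles(
        values, n=4, method="inclusive"
    )
    return {
        "Max": max(values),
        "Min": min(values),
        "Median": statistics.median(values),
        "Mean": statistics.fmean(values),
        # Sample standard deviation (n - 1 denominator) is an assumption.
        "Stdev": statistics.stdev(values),
        "3Q": third_q,
        "1Q": first_q,
    }


summary = basic_statistics([1, 2, 3, 4, 5])
```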
9. - U-matrix Topographic Profiling
   - Statistical Clustering Validation
   - K Value Prediction for K-means Clustering
   - Dendrogram with various graphical options for publication
   1.2.5 Classification
   This analysis method is mostly used for diagnosis, prognosis, and estimation, and provides various statistical methods for finding marker genes. Generalization Error Estimation of the classification helps avoid over-fitting the data, and Whole Computation makes the analysis more accurate and easier.
   - Feature selection: finding marker genes for diagnosis
   - Classification: classifying samples into pre-defined classes
   - Error Estimation: estimating generalized misclassification error rate
   - Whole Computation: all-in-one approach for optimal classification
   - Sample PCA: powerful visualization with various graphical options for publication
   1.2.6 Pathway Analysis
   This module analyzes the biological relationships among the genes obtained from DEG Analysis, Clustering Analysis, Classification Analysis, and so on. It looks up the genes related to common pathways using the biological pathway information of the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. By mapping the DNA chip expression results, changes in expression level under the experimental conditions can be understood at the pathway level.
   - Pathway Search: given gene lists, all related KEGG pathways explored
   - Pathway Mapping
10. individuals, using the correlation coefficient of the two individuals calculated from the actual signal values.
   - Absolute Pearson: Uses the absolute value of the Pearson correlation coefficient.
   Initialization Method: Select the initialization method.
   - Pseudo Random: Generates similar random numbers in every repetition.
   - Totally Random: Generates arbitrary random numbers in every repetition.
   Number of Cluster: Input the range of the predicted number of clusters (K).
   Max Iteration: Input the maximum number of repetitions (Basic value: 50).
   Input the name of the Prediction Result and click the Prediction button to view the result.
   [Screenshot: K Value Prediction result with a table of K against FOM, a table of FOM(i+1) - FOM(i) slopes, and a plot of FOM (adjusted figure of merit) against K for K = 1 to 10]
   <Figure 4-27> K Value Prediction Result Window
   (1) It is possible to verify the result o
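The slope column of the K Value Prediction result appears to be the difference FOM(i+1) - FOM(i) between consecutive K values. The sketch below computes it under that assumption. The FOM values are transcribed from the example shown in Figure 4-27 and should be treated as illustrative; the function name is hypothetical.

```python
# Adjusted figure of merit (FOM) for K = 1..10, as read from the Figure 4-27 example.
fom_by_k = {
    1: 39.449, 2: 35.378, 3: 35.994, 4: 33.495, 5: 32.163,
    6: 33.738, 7: 32.946, 8: 32.633, 9: 31.821, 10: 31.803,
}


def fom_slopes(fom):
    """Return {i: FOM(i+1) - FOM(i)} for consecutive K values (assumed definition)."""
    ks = sorted(fom)
    return {k: round(fom[k_next] - fom[k], 3) for k, k_next in zip(ks, ks[1:])}


slopes = fom_slopes(fom_by_k)
```

A strongly negative slope means that adding one more cluster still lowers the FOM noticeably, while slopes near zero mean that additional clusters change it very little.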
11. <Figure 3-2> Data Selecting Window
   (1) Click the Add button to select the file to be input; it will be added to the list on the left side of the window. Use the Remove and Remove All buttons to delete items.
   (2) GEM Format
   - Basic GEM: ID, (Gene description), Intensities
     Intensity Start column: For Basic GEM, the user can set the column where the intensity columns start. The Description information can be used if a Description column exists between the ID column and the intensity columns.
   - Basic GCOS output: Signal, Detection, Detection p-value
   - Basic GCOS output: Signal, Detection
   (3) Data condition: Select the log-transformation state of the data to be input.
   - The data was not log transformed
   - The data was log transformed with base 2
   - The data was log transformed with base 10
   - The data was log transformed with base e
   (4) Click the OK button to input the data.
   Illumina BeadStudio Result
   An Illumina file from BeadStudio can be input into the DEG Finding module, and a Detection column can be created by setting the threshold of the Detection P-value within the file.
   [Screenshot: Import Gene Expression Matrix (GEM) dialog, Illumina BeadStudio Result tab, with the Import BeadStudio Exported Expression Files list, Remove and Remove All buttons, and Detection (Present Call) Threshold]
   <Figure 3-3> Illumina Data Selecting Window
   Click the Add button to select the file to be input; it will be added to the list on
12. <Figure 5-18> Error Estimation Result Summarizing Window
   Reference
   [5.1] S. Dudoit et al. (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J. Amer. Stat. Association 97, 77-87.
   [5.2] R. Tibshirani et al. (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. USA 99, 6567-6572.
   [5.3] C. Ambroise and G. J. McLachlan (2002) Selection bias in gene extraction on the basis of microarray gene expression data. Proc. Natl. Acad. Sci. USA 99, 6562-6566.
   [5.4] R. L. Somorjai, B. Dolenko, R. Baumgartner (2003) Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions. Bioinformatics 19(12), 1484-1491.
   Pathway Analysis
   6 Pathway Analysis
   Pathway Analysis is a method for surveying the pathway information of the input data (a list of genes with significant values). The Pathway Analysis Result supports easy biological interpretation through various visualization and editing functions.
   6.1 File
   6.1.1 New Analysis
   If the data was exported from another module, the Analysis File is created automatically, so this procedure does not apply (go to 6.2). On the menu bar select File > New Analysis or click on the first icon; the Analysis creating window then appears.
13. <Figure 5-14> Error Estimation Result Window (K-Fold)
   <Figure 5-15> Error Estimation Result Window (Bootstrap)
   5.5.4 Whole Computation
   On the menu bar select Error Estimation > Whole Computation or click on the seventeenth icon; the Whole Computation Set Up window then appears.
   [Screenshot: Whole Computation Set Up window. Options shown include Gene Selection (Two sample t-test (2 class only), BSS/WSS, Kruskal-Wallis H Test, Regularized t-test), Classifier (Weighted K-Nearest Neighbor, Prototype Matching with indeterminacy parameters, Multi FLDA), Error Estimation (Incomplete LOOCV, Complete LOOCV), and Ranked Gene Number (2-100 genes: per one gene; 101-500 genes: per 20 genes)]
   <Figure 5-16> Whole Computation Set Up Window
   Select each Algo
14. <Figure 2-24> Affymetrix Gene Chip Data Normalization Result Window
   (1) The Normalization Result will be added to the browse window.
   (2) Double-click each data name to confirm the Signal of each spot and related items after Normalization. Right-click on the After Normalization folder and click the Save Data As Text menu to save each data set to a text file. Click the Save Gene Expression Matrix As Text menu to save the data to a GEM format text file.
   2.2.3.2 One Dye Chip Data Normalization
   [Screenshot: Normalization dialog with the options Global Shift, Lowess Normalization, and Quartile Normalization]
   <Figure 2-25> One Dye Chip Data Normalization Set up Window
   (1) Global Shift: Mean or Median.
   (2) Lowess Normalization: In an MA plot, the Lowess line tends to bend where the intensity range is low or high. Lowess Normalization straightens the bent part of the Lowess line using a local regression technique [2.3].
   Data Fraction: Possible to set up the data ratio used for the calculatio
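The manual does not spell out how Global Shift is computed. A common reading, assumed here, is that each array's log-scale signals are shifted by a constant so that every array ends up with the same mean or median. The sketch below follows that assumption; the function name and the choice of target (the average of the per-array centers) are illustrative, not GenPlex's documented behavior.

```python
from statistics import fmean, median


def global_shift(arrays, center=median):
    """Shift each array so that all arrays share the same center.

    arrays: dict mapping array name -> list of log-scale signal values.
    center: function computing the center of one array (median or fmean).
    """
    centers = {name: center(values) for name, values in arrays.items()}
    # Common target: the average of the per-array centers (an assumption).
    target = fmean(list(centers.values()))
    return {
        name: [x - centers[name] + target for x in values]
        for name, values in arrays.items()
    }


# Hypothetical log2 signals for two arrays.
raw = {"array_1": [1.0, 2.0, 3.0], "array_2": [4.0, 5.0, 6.0]}
shifted = global_shift(raw, center=median)
```

Pass `center=fmean` to shift by the mean instead of the median.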
15. Algorithm [5.3]
   - LOOCV: The K-fold method with K = n, where n is the number of samples.
   - K-Fold: Divide the data into K folds. Use K-1 folds as the training set and the remaining fold as the test set, repeat the error estimation K times, and then calculate the misclassification rate.
   - Bootstrap: Calculate the misclassification rate through bootstrap sampling.
   5.5.2 Set Parameter(s)
   The parameters of the Error Estimation Algorithm that the user has selected can be set. If they are not set, the Basic Values are used. On the menu bar select Error Estimation > Set Parameter(s) or just click on the fifteenth icon.
   - LOOCV: Basic Value: Incomplete
   - K-Fold: Basic Value: Incomplete, Fold Number 10, Iteration Number 100
   - Bootstrap: Basic Value: B = 50
   5.5.3 Run
   On the menu bar select Error Estimation > Run or click on the sixteenth icon to view the Error Estimation Result. The result window shows the Error Estimation Algorithm, the parameter settings, and detailed information about the result.
   [Screenshot: Error Estimation Result window (LOOCV, Incomplete) listing the number of incorrectly classified samples for each class]
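The K-Fold procedure described above, and LOOCV as its K = n special case, can be sketched as follows. The one-dimensional nearest-neighbor classifier is only a stand-in so that the example runs; it is not one of GenPlex's classifiers, and all function names and data values are hypothetical.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the K folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def nearest_neighbor_predict(train_x, train_y, x):
    """Stand-in classifier: label of the closest training value."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def misclassification_rate(xs, ys, k, seed=0):
    """Train on K-1 folds, test on the held-out fold, and pool the errors."""
    errors = 0
    for train, test in k_fold_splits(len(xs), k, seed):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            if nearest_neighbor_predict(train_x, train_y, xs[i]) != ys[i]:
                errors += 1
    return errors / len(xs)


# Hypothetical, well-separated data: two classes of three samples each.
xs = [0.1, 0.2, 0.3, 5.1, 5.2, 5.3]
ys = ["a", "a", "a", "b", "b", "b"]
loocv_error = misclassification_rate(xs, ys, k=len(xs))  # LOOCV: K = n
three_fold_error = misclassification_rate(xs, ys, k=3)
```

Every sample is tested exactly once, so the pooled error count divided by the number of samples gives the misclassification rate.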
16. <Figure 2-27> One Dye Chip Data Normalization Result Window
   (1) The Normalization Result will be added to the browse window.
   (2) Double-click each data name to confirm the Signal of each spot and related items after Normalization. Right-click on the After Normalization folder and click the Save Data As Text menu to save each data set to a text file. Click the Save Gene Expression Matrix As Text menu to save the data to a GEM format text file.
   2.2.3.3 Two Dye Chip Data Normalization
   [Screenshot: Normalization Method & Parameter Setting dialog with Array-wise Centering, Block-wise Centering (Print-tip Lowess Normalization), Data Fraction, and iteration settings]
17. <Figure 2-21> Miscellaneous Spot Removal Input Window
   - Input the list of IDs that are unnecessary for further analysis, or select the empty-ID deleting option, and click the Apply button. The number of spots before and after application can then be confirmed in the table on the right side of the window.
   - Click the < Back button to go to the previous step. Click the Finish button to complete the Filtering Error Spot procedure, or the Cancel button to cancel it.
   (5) Filtering Result
   [Screenshot: Filtering Result window showing the browse tree (Filtered Data, Normalized Data), the Filtering Parameter summary, and the table of filtered spots]
18. <Figure 2-28> Two Dye Chip Data Normalization Set up Window
   (1) Array-wise Centering: Method to correct the central value of each array.
   - Global: Method to correct the Mean or Median value.
   - Intensity dependent (Global Lowess Normalization): Method to correct using the Lowess function [2.3].
   (2) Block-wise Centering (Print-tip Lowess Normalization): Method to correct each block of the slide using the Lowess function [2.3].
   - Block-wise Scaling: Method to correct the scale of each block of the slide using the MAD (Median Absolute Deviation) value.
   (3) Multi-array Scaling: Method to correct the scale of each slide using the MAD value [2.3].
   (4) Click the Start button to run Normalization.
   [Screenshot: Normalization Result window showing the browse tree (Filtered Data, Normalized Data), the Normalization Parameter summary, and the table of normalized spots]
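The MAD-based scaling mentioned above can be illustrated with a short sketch. The scaling rule used here (rescale each slide so that its MAD equals the geometric mean of all slides' MADs) follows the general approach of reference [2.3], but whether GenPlex uses exactly this rule is an assumption. The function names and log-ratio values are hypothetical, and every slide is assumed to have a nonzero MAD.

```python
from statistics import geometric_mean, median


def mad(values):
    """Median Absolute Deviation: median of |x - median(x)|."""
    values = list(values)
    center = median(values)
    return median([abs(x - center) for x in values])


def multi_array_scale(slides):
    """Rescale centered log-ratios so that every slide has the same MAD.

    slides: dict mapping slide name -> list of centered log-ratios.
    """
    mads = {name: mad(values) for name, values in slides.items()}
    # Common target scale: geometric mean of the per-slide MADs (assumed rule).
    target = geometric_mean(list(mads.values()))
    return {
        name: [x * target / mads[name] for x in values]
        for name, values in slides.items()
    }


# Hypothetical centered log-ratios: slide_2 is four times as spread out.
slides = {
    "slide_1": [-2.0, -1.0, 0.0, 1.0, 2.0],
    "slide_2": [-8.0, -4.0, 0.0, 4.0, 8.0],
}
scaled = multi_array_scale(slides)
```

The same `mad` function applied per block instead of per slide gives the block-wise variant.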
19. <Figure 5-12> Classification Result Window
   - Shows the Classification Algorithm and the parameter settings.
   - The classification result for each sample of the Test Data can easily be checked in table form.
   - Detailed information on the classification result is provided in tree format.
   5.5 Error Estimation
   The error can be estimated using the Gene Selection and Classification settings that the user has selected. Error Estimation is possible only if there are more than two Training Data.
   5.5.1 Select Algorithm
   On the menu bar select Error Estimation > Select Algorithm or just click on the fourteenth icon (Basic value: LOOCV).
   Error Estimation
20. Step 2: Data Selecting Window
   - Click the Add button to select the file to be input; it will be added to the list on the left side of the window. Click the Remove or Remove All button to delete items.
   - Use the buttons to arrange the files in order.
   - Library Path: Select the path where the library files are saved (the .cdf file for the Chip type in the case of 3' IVT; the .clf, .bgp, and .pgf files for the Chip type in the case of ST array). If there is no library file for the corresponding Chip type, click the Library Download button to download it.
   - Click the Next button; the preprocessing method window will then appear.
   Library Download Window
   As seen in <Figure 2-3>, the list of files needed to recognize the corresponding Chip Type is shown, and there are 3 ways to download:
   - Automatic Download from http://affymetrix.com
   - Automatic Download from http://genplex.co.kr
   - Manual Download from http://affymetrix.com
   [Screenshot: Library Download dialog with the Select Annotation Server option and the list of library files (.clf, .pgf, .bgp) for the chip type]
   <Figure 2-3> Library Download Window
   Step 3: 3' IVT Expression Chip
   [Screenshot: Expression Array Processor, Step 3]
   Quantification Methods: RMA (Robust Multichip Analysis), PLIER (Probe Logarithmic Intensity Error estimation), MAS5 (Microar
21. and treatment sample of each gene, to see how strongly the gene is expressed in the treatment sample relative to the control sample. Generally, fold change refers to the ratio itself, but the Log2-transformed value is sometimes also called fold change; for convenience, fold change will hereafter mean the Log-transformed value. To obtain DEGs with Fold Change, a threshold on the ratio has to be set. 2-fold is the usual standard, but it can be lowered to 1.5-fold or raised to 4-fold or more depending on the data. However, applying one cutoff across the board can be a problem. For example, when 2-fold is applied, relatively many genes in the low-expression region satisfy the condition, whereas it is hard to satisfy the 2-fold condition in the high-expression region. Also, fold change does not consider the statistical significance of the variance of the gene expression values within each group when comparing between groups. For example, given 3 control class samples and 3 experiment class samples, the fold change method averages the 9 pairwise fold changes (3 x 3) of each gene to find DEGs. But it is hard to say that this average represents all 9 cases without stating how much the statistic can be trusted, because it can be distorted by even one or two outliers among the 9 cases. For these reasons we can figure out that the fo
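The 9-case average described above, and its sensitivity to a single outlier, can be seen in a small sketch. The function name and the signal values are hypothetical and serve only to illustrate the arithmetic.

```python
from itertools import product
from math import log2
from statistics import fmean


def pairwise_log2_fold_changes(control, treatment):
    """log2(treatment / control) for every control-treatment sample pair."""
    return [log2(t / c) for c, t in product(control, treatment)]


control = [100.0, 100.0, 100.0]

# Consistent 2-fold increase: all 9 pairwise values are 1.0 on the log2 scale.
clean = pairwise_log2_fold_changes(control, [200.0, 200.0, 200.0])
clean_average = fmean(clean)

# One outlying treatment sample (32-fold) pulls the average well above 1.0.
distorted = pairwise_log2_fold_changes(control, [200.0, 200.0, 3200.0])
distorted_average = fmean(distorted)
```

In the second case the average rises from 1.0 to about 2.33 on the log2 scale, even though two of the three treatment samples are unchanged, which is why the average alone says little about how far it can be trusted.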
22. as below and designate Reference and Target Class.
   [Screenshot: Statistics Plot dialog, MA Plot tab, with Reference Class and Target Class sample lists, the All Comb, Add, Remove, and Remove All buttons, Data Type (Filtered Data, Normalized Data), and Mode (Separate Pane by Data Type)]
   <Figure 2-34> One Dye Chip Data MA Plot Set up Window
   <Figure 2-35> MA plot Result Window
   (1) The MA Plot of the selected data can be confirmed in each tab, and the plots before and after Normalization can be shown in the same window or in separate windows according to the set up.
   2.3.5 QQ Plot
   On the men
23. <Figure 2-32> Box Plot Result Window
   (1) The selected data is shown in a Box Plot; for Two Dye Chip Data, a Box Plot for each Block is additionally supported.
   2.3.3 Histogram
   On the menu bar select Statistics Plot > Histogram or just click on the eleventh icon.
   <Figure 2-33> Histogram Result Window
   (1) The Histogram of the selected data can be confirmed for each Array. The plots before and after Normalization can be shown in the same window or in separate windows according to the set up.
   2.3.4 MA Plot
   On the menu bar select Statistics Plot > MA plot or just click on the twelfth icon. In the case of One Dye Chip Data, the set up window will be shown
24. click the Save Data As Text menu to save each data set to a text file. Click the Save Gene Expression Matrix As Text menu to save the data to a GEM format text file.
   2.2.4 Set Detection
   This applies only to One Dye Chip Data and allows the Detection Threshold to be set or changed. To check the threshold setting, double-click the Analysis file name at the top of the browse window, or select Analysis > Properties on the menu bar.
   2.2.5 Log Transform
   This applies only to One Dye Chip Data and transforms the values to Log values. However, the data cannot be transformed if it has already been log transformed or if the Signal contains negative numbers.
   2.3 Statistics Plot
   Confirm the statistics of the input data before and after Preprocessing through various plots. The module provides Statistics, Box Plot, Histogram, MA Plot, QQ Plot, Correlation Scatter Plot, and Correlation Matrix Plot. All set up windows look like the figure below: first select the corresponding tab, then select the data and the data format, and click the OK button to confirm the result.
   [Screenshot: Statistics Plot dialog with the tabs Statistics, Box Plot, Histogram, MA Plot, QQ Plot, Correlation Scatter Plot, and Correlation Matrix Plot, and a Data Name list of arrays with checkboxes]
25. <Figure 2-29> Two Dye Chip Data Normalization Result Window
   (1) The Normalization Result will be added to the browse window.
   (2) Click each data name to confirm the Intensity of each spot and related items after Normalization. Right-click on the After Normalization folder and
26. of "Student's" problem when several different population variances are involved. Biometrika 34, 28-35.
3 3 J. G. Thomas, J. M. Olson, S. J. Tapscott, L. P. Zhao (2001) An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res 11, 1227-1236.
3 4 X. Cui & G. A. Churchill (2003) Statistical tests for differential expression in cDNA microarray experiments. Genome Biol 4, 210.
4 Clustering
Clustering groups genes or samples according to similar significant patterns. Gene Clustering is used to search for gene function, and Sample Clustering is mostly studied for diagnosis, prognosis and prediction in the medical field. The Clustering module provides various clustering methods and visualizations. The Clustering result can also be verified statistically.
4 1 File
4 1 1 New Analysis If data is exported from the Preprocessing module, the Analysis File is created automatically, so this procedure can be skipped (go to 4 2). On the menu bar select File > New Analysis or click on the first icon; the Analysis creating window then appears.
[New Analysis window: Analysis Name & Location (Name, Location) and Analysis Information (Description, ID Type, Species)]
27. <Figure 3 8> Fold Change Result Window
(1) Shows the DEG Finding Algorithm and the Parameter set up that the user has selected.
Annotation Shows all Annotation Information in one table.
Annotation Search Tree
• Top Assignment: Search only the top assigned information. This is the default setting.
• All Assignment: Search all assigned information.
The Chip Type and Species can be identified in the search window. Click Search to search the Annotation. For a Commercial Platform the result includes all Annotation information provided by that Platform, with KEGG and Uniprot information added.
[Annotation Search window: Annotation Search Tree with Top Assignment and All Assignment options, platform Affymetrix Rat230_2 (Rattus norvegicus), and selectable fields such as Probe ID, GeneChip Array, Species Scientific Name, Annotation Date, Sequence Type, Sequence Source, Transcript ID, Target Description, Representative Public ID, UniGene ID, Gene Title, Gene Symbol, Chromosomal Location, Unigene Cluster Type and Ensembl]
<Figure 3 9> Annotation Search Window
[Annotation result window for Control vs Treat_FoldChange2_Filtered_2: Platform Affymetrix, Total No of selected columns 30]
28. <Figure 2 17> One Dye Chip Data Filtering Result Window
(1) The input data names are shown in the browse window. Double click a data name to confirm each spot's signal and related items after Filtering has been applied.
• Probe ID: The probe's own ID
• Signal: Signal Intensity value of each probe
• S/N: Signal to Noise value of each probe
• Flags: Flag data of each probe (True: Valid Spot, False: Filtered Spot)
(2) Select the Before Normalization folder, right click on the mouse and select Save Data As Text menu to save the data to a text file.
2 2 2 2 Two Dye Chip Data
(1) Background Correction Compare the Background Intensity with the Foreground Intensity of each Spot. This procedure excludes any Spot whose Background Intensity is higher. This step must be applied; otherwise it is not possible to proceed to the following step.
[Preprocessing Wizard Step 1 of 4, Background Correction: Spot intensity = foreground - background]
<Figure 2 18> Background Correction Set up Window
• Click Apply button to confirm the number of Spots before and after applying, shown in the table on the right side of the window.
• Click Next > button to proceed to the following step and use Finish a
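The Background Correction rule on this page (spot intensity is foreground minus background, and spots whose background is higher are excluded) can be sketched as follows. The function name, the tuple layout and the treatment of a spot whose background exactly equals its foreground are assumptions for illustration only, not GenPlex's actual implementation.

```python
def background_correct(spots):
    """Apply background correction to two dye spots.

    spots: list of (spot_id, foreground, background) tuples.
    Returns (kept, excluded), where kept maps spot_id to
    foreground - background and excluded lists the ids of spots
    whose background intensity is higher than the foreground.
    """
    kept, excluded = {}, []
    for spot_id, foreground, background in spots:
        if background > foreground:
            excluded.append(spot_id)
        else:
            # Assumption: a spot with background == foreground is kept at 0.
            kept[spot_id] = foreground - background
    return kept, excluded


if __name__ == "__main__":
    example = [("s1", 500, 100), ("s2", 80, 120), ("s3", 300, 300)]
    print(background_correct(example))
```

Comparing `len(spots)` with `len(kept)` gives the before and after Spot counts that the Apply button reports.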
29. the table The result with a higher score than the other Clustering results is marked with a red cell in the table.
(2) It shows the GDI result on the right side in graph format.
4 4 2 K value Prediction For K value Prediction (4 6), select Validation > K value Prediction on the menu bar or click on the twelfth icon to see the set up window.
[K value Prediction window: Select Gene Expression Matrix, Objects (Gene or Experiment), Number of Cluster, Max iteration, Distance Measure (Euclidean distance), Initialization Method]
<Figure 4 26> K value Prediction Set Up Window
(1) Select Gene Expression Matrix Select the GEM to which Clustering is applied.
(2) Objects Select the basis on which Clustering is applied.
• Gene: Cluster on the basis of genes
• Experiment: Cluster on the basis of Samples
(3) Distance Measure Select the method used to calculate the distance (Ref 8 1).
• Euclidean Distance: Geometrical distance between two individuals
• Manhattan Distance: The distance between two individuals considering the weight carried by each variable
• Pearson Correlation (centered): Measures the degree of similarity of two individuals using the correlation coefficient after transforming each individual to average 0 and variance 1
• Pearson Correlation (uncentered): Measures the degree of similarity of two
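The Distance Measure options listed on this page can be illustrated with the textbook definitions below. This is a sketch under stated assumptions: the function names are hypothetical, the formulas are the standard ones rather than anything confirmed about GenPlex's internals, and the uncentered Pearson correlation is taken to be the correlation computed without subtracting the means (the cosine of the angle between the two profiles).

```python
import math


def euclidean(x, y):
    """Straight line (geometrical) distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute differences over all variables."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_uncentered(x, y):
    """Correlation without mean subtraction (cosine similarity)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den


def pearson_centered(x, y):
    """Ordinary Pearson correlation: center each profile first."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return pearson_uncentered([a - mean_x for a in x],
                              [b - mean_y for b in y])


if __name__ == "__main__":
    g1, g2 = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
    print(euclidean(g1, g2), manhattan(g1, g2))
    print(pearson_centered(g1, g2), pearson_uncentered(g1, g2))
```

The two correlation functions return a similarity; a clustering routine would typically convert it to a distance, for example as one minus the similarity.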
30. value is marked in yellow.
4 1 8 Analysis Properties On the menu bar select File > Analysis Properties to adjust the information of the working Analysis File.
4 1 9 Exit On the menu bar select File > Exit to exit the program.
[Input data window: gene count, sample count, a search box with Case Sensitive and Exact match options, and a table of signal values for samples such as GC 01 001 and GC 01 007]
4 2 Preprocessing
4 2 1 Experimental Information This is the procedure to input the experiment information of the input data. Select Preprocessing > Experimental Information on the menu bar.
[Experimental Information window: Sample Attributes (type, time, dose etc.) with Fill Down control]
31. <Figure 6 2> Input Data Verifying Window
(1) Double click the name of the input data in the browse window to verify the input data.
(2) Click the buttons on the upper side of the Data Verifying Window to save the data to a text file or to verify the Annotation Information of the corresponding genes.
6 1 8 Analysis Properties On the menu bar select File > Analysis Properties to adjust the information of the working Analysis File.
6 1 9 Exit On the menu bar select File > Exit to close the program.
6 2 Pathway List
(1) In the Pathway List folder, right click on the mouse and select the Pathway Search Pathway P Value menu; it is then possible to verify the KEGG Pathways corresponding to the input data and the P Value of each Pathway.
[Pathway List window for the analysis TEST_Volcano Plot_test con: Gene List and a list of pathways such as Regulation of actin cytoskeleton, Focal adhesion, Cytokine cytokine receptor interaction, MAPK signaling pathway, Cell adhesion molecules and Metabolism of xenobiotics]
32. [Result table: Detection P value column for each probe]
<Figure 2 5> 3' IVT Expression Chip Preprocessing Result Window
Use Affymetrix Power Tools (APT) for RMA, Plier and other calculations. http://www.affymetrix.com/support/developer/powertools/index.affx
Gene ST array
[Gene ST Array Processor Step 3: chip HuGene 1_0 st v1 with Quantification Methods, Normalization Methods, PM Intensity Adjustment and CHP Type options]
<Figure 2 6> Step 3 Gene ST array Preprocessing Method Selecting Window
• Quantification Methods: RMA (Robust Multichip Analysis), Plier (Probe Logarithmic Intensity Error Estimate)
• Normalization Methods: Global Median, Quantile, Sketch Quantile
• PM Intensity Adjustment: PM only, PM GCBG
• CHP Type: Select the item Save CHP files in AGCC format; the chp file will be created in the folder where the cel file is saved.
• Click Finish button to run preprocessing (go to 2 2).
Use Affymetrix Po
33. 1 T Test, approximately 21 × 0.05 ≈ 1 Test result can be expected to be a false positive. Accordingly, when there are 3 or more groups to compare, the ANOVA method is a useful way of analyzing them all at once while guaranteeing the statistical significance level that the user has set.
8 Clustering Algorithm
This method clusters genes or samples that follow similar significant patterns; the former is called gene clustering and the latter sample clustering. Gene clustering is used to search for gene function, and sample clustering is used for diagnosis, prognosis and prediction of disease in the clinical field.
8 1 Hierarchical Clustering (HC)
Hierarchical Clustering is a classical and general Clustering Algorithm used in statistics. As a gene clustering method it has been used broadly since the paper by Eisen et al., which studied the molecular genetic response of yeast to external stimuli using DNA Chips. Hierarchical Clustering can be divided into the Divisive Approach and the Agglomerative Approach, but the Agglomerative Approach is generally used. The Divisive Approach is called the top down method because it proceeds from the bigger group to detailed groups, and the Agglomerative Approach is called the bottom up method because it builds clusters from the nearest individuals up to the bigger group. The following is the gene clustering method using Hierarchical Clustering. For example, suppose there are 1,000 genes.
(1) First Step: Algorithm activates co
34. [Statistics Plot set up window: Data Type options Before Normalization and After Normalization]
<Figure 2 30> Statistics Plot Setup Window
2 3 1 Statistics It is possible to confirm the basic statistic values of the input data. On the menu bar select Statistics Plot > Statistics.
[Statistics result table: basic statistic values for each input data set]
35. <Figure 3 15> Related Database URL Linked Window
• Average log2 Fold Change: Shows the Log2 Fold Change value of each gene.
• Average Fold change: Shows the Fold Change value of each gene.
• Regulation: Marked UP when the Average Log2 fold change value is bigger than the cut off and DOWN when it is smaller than the cut off, meaning that the gene is Up regulated or Down regulated respectively.
• Heat Map: Shows the Heat Map of the extracted expressed genes, so the intensity information of each gene can be verified easily in one view.
3 3 2 2 Class Paired Test
[2 Class Paired Test window, Algorithm & Parameter Setting: Paired T Test and Wilcoxon Signed Rank Test tabs, Result Name, Significance Level, Number of Genes, Class specific Number of Genes, Statistical Significance Computation (Asymptotic Distribution), Multiple Test Correction, and Matching Pairs (Reference Class, Target Class, Paired Arrays In Given Order, Set Pairs, Remove, Remove All)]
<Figure 3 16> Paired Test Parameter Set Up Window
3 3 2 1 Paired T test For the Paired T Test (3 1), select DEG finding > 2 Class Paired Test > Paired T t
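The Regulation column described on this page can be sketched as below. The manual only says UP for values bigger than the cut off and DOWN for values smaller than the cut off; this sketch assumes the cut off is applied symmetrically (DOWN below the negative cut off) and that the Average Fold change is 2 raised to the Average log2 Fold Change. Both are assumptions, and the function names are hypothetical.

```python
def regulation(avg_log2_fc, cutoff=1.0):
    """Label a gene UP or DOWN from its average log2 fold change.

    Assumption: the cut off is symmetric, so DOWN means the value
    is below -cutoff. Genes between the two limits get no label.
    """
    if avg_log2_fc > cutoff:
        return "UP"
    if avg_log2_fc < -cutoff:
        return "DOWN"
    return None


def fold_change(avg_log2_fc):
    """Convert a log2 fold change back to a plain ratio (assumed 2**x)."""
    return 2.0 ** avg_log2_fc


if __name__ == "__main__":
    for value in (1.5, -2.0, 0.3):
        print(value, regulation(value), fold_change(value))
```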
36. [Result table: Probe IDs with T Score values and a search box with Case sensitive and Exact Match options]
<Figure 3 18> Welch's T test Result Window
3 3 3 3 Z test For the Z Test (3 3), select DEG Finding > 2 Class Unpaired Test > Z Test from the menu bar. The Parameters are identical to those of Welch's T Test (Ref 3 3 3 2).
3 3 3 4 Mann Whitney Test For the Mann Whitney Test (3 1), select DEG finding > 2 Class Unpaired Test > Mann Whitney Test on the menu bar. The Parameters are identical to those of Welch's T Test (Ref 3 3 3 2).
3 3 4 Multi Class Test This is used to compare 3 or more classes statistically.
3 3 4 1 One Way ANOVA For One Way ANOVA (3 1), select DEG Finding > Multi Class Test > One Way ANOVA on the menu bar. The Parameters are similar to those of Welch's T test (Ref 3 3 3 2).
3 3 4 2 Kruskal Wallis H test For the Kruskal Wallis H Test (3 1), select DEG Finding > Multi Class Test > Kruskal Wallis H Test on the menu bar. The Parameters are similar to those of Welch's T Test (Ref 3 3 3 2).
3 3 5 Combine Results The genes of DEG Finding Results can be combined in various ways. Select DEG Finding > Combine Results on the menu bar, then the set up window appears. Select the DEG Finding Results to be combined from the list, then click Combine but
37. [Cluster result table: gene IDs with signal values per sample]
<Figure 4 23> Result Window of each Cluster
(1) Menu Bar on the Top
• Click Save button to save the information of the corresponding Cluster.
Matrix: Saves the gene data (ID and Signal intensity) of the corresponding Cluster to a text file.
Profile: Saves the Profiling graph of the corresponding Cluster to a picture file.
Heatmap: Saves the Heat Map image of the corresponding Cluster to a file.
• Click Full Profile button to see the profiles of all genes in the Profiling graph of the corresponding Cluster. The button then changes to Simple Profile, so the graph can be switched between the Simple Profile and the Full Profile.
• Click Annotation button to verify the Information of the genes in the corresponding Cluster
38. [Filtering result table: Block, ID, R, G, A, M and Flags for each spot]
<Figure 2 22> Two Dye Chip Data Filtering Result Window
(1) The names of the input data are shown in the browse window. Double click a data name to confirm each spot's Intensity and related items after Filtering has been applied.
• Block: Block number of the Spot
• ID: ID of the Spot
• R: Intensity Value of the Red (treatment) Dye, log2 transformed value
• G: Intensity Value of the Green (control) Dye, log2 transformed value
• A: Average of the R and G items
• M: Ratio of the R and G items
• Flags: Flag Information (true: valid spot, false: filtered spot)
(2) Select the Before Normalization folder, then right click on the mouse and select Save Data As Text menu to save the data to a text file.
2 2 3 Normalization On the menu bar select Preprocessing > Normalization or
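The A and M items listed on this page can be illustrated with the conventional MA definitions for two dye data. This is a sketch under an assumption: the manual only states that A is the average of R and G and that M is their ratio, so the code takes R and G as already log2 transformed and expresses the ratio as a difference of logs. Whether GenPlex computes M exactly this way is not confirmed by the manual.

```python
def ma_values(r_log2, g_log2):
    """Return (M, A) for one spot from log2 red and green intensities.

    A is the average of the two log2 intensities.
    M is the log2 ratio of red to green, i.e. their difference
    on the log scale (assumed conventional definition).
    """
    a = (r_log2 + g_log2) / 2.0
    m = r_log2 - g_log2
    return m, a


if __name__ == "__main__":
    print(ma_values(11.0, 9.0))
```

An M of 2.0 on this scale corresponds to a four fold higher red (treatment) intensity than green (control).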
39. [Filter Missing Data window: table of genes with missing values and the Missing Entries options Number and Percentage]
<Figure 5 4> Filter Missing Data Input Window
(1) Missing Entries Select the standard and click Select button to select the genes to be deleted.
• Number: Select genes to be deleted based on the Total column of Missing Values.
• Percentage: Select genes to be deleted based on the ratio of the Total column of Missing Values to the Total Number of samples.
(2) Click Remove button to delete all selected missing entries from the Training Data and Test Data.
5 2 3 Impute Data On the menu bar select Preprocessing > Impute Data to fill in the missing values according to a regular rule.
5 3 Gene Selection Select Marker Genes from the input data using a statistical method. Gene Selection is possible only if there are two or more Training Data.
5 3 1 Select Algorithm On the menu bar select Gene Selection > Select Algorithm or click on the seventh icon to select the wanted Algorithm.
[Gene Selection Algorithm window]
• Null: Select all genes (default)
• Two sample t test: Can be selected only when there are two Training data.
• BSS/WSS: Uses the ratio of the error between clusters to the error within clusters (5 1).
• Kruskal Wal
40. Table of Contents (continued)
Export to Pathway Analysis Module
Save Result
3 4 4 Correlation Scatter Plot
3 4 5 Correlation Matrix Plot
3 4 7 Volcano Plot
4 1 8 Analysis Properties
4 2 1 Experimental Information
4 2 4 Missing Data Filtering
4 2 6 Column Editing
4 3 2 K means Clustering
41. Click OK button to see the result window.
3 3 1 2 Fold change Two Dye On the menu bar select DEG finding > Fold change > Fold Change Two Dye, then the set up window appears.
[Algorithm & Parameter Setting window, Fold Change Two Dye tab: Result Name, log2 Fold Change (absolute value), Separate Up Down Results, Single Class Analysis options and Two Class Comparison]
<Figure 3 7> Fold Change Two Dye Set Up Window
(1) Result Name Input the name of the Result to be created.
(2) Select the cut off to be applied.
• log2 Fold Change: The cut off value for the Log2 transformed Fold Change (default 1).
• Separate Up Down Results: Separate lists of up regulated genes and down regulated genes are created from the Fold Change result.
(3) Single Class Analysis Applied when significant genes are selected from each single class.
• Common DEGs across all samples: Genes whose intensity is above the cut off in all samples of each class are selected.
• More than [ratio] common across all samples: Genes that are over the cut off in at least the ratio of samples set by the user are selected.
• Separate DEGs for each sample: The genes over each sample's cut
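The Single Class Analysis options on this page can be sketched as a single selection rule. The function name `common_degs`, the dictionary layout and the use of "greater than or equal" comparisons are assumptions for illustration; the manual does not say whether the cut off and the ratio are inclusive.

```python
def common_degs(log2_fc, cutoff=1.0, min_fraction=1.0):
    """Select genes whose |log2 fold change| passes the cut off
    in at least min_fraction of the samples of one class.

    log2_fc: dict mapping gene id to a list of log2 fold changes,
             one value per sample.
    min_fraction=1.0 corresponds to "Common DEGs across all samples";
    a smaller value corresponds to the "More than ... common" option.
    """
    selected = []
    for gene, values in log2_fc.items():
        if not values:
            continue
        hits = sum(1 for v in values if abs(v) >= cutoff)
        if hits / len(values) >= min_fraction:
            selected.append(gene)
    return selected


if __name__ == "__main__":
    data = {
        "g1": [1.5, 2.0, 1.2],
        "g2": [1.5, 0.2, -1.8],
        "g3": [0.1, 0.2, 0.3],
    }
    print(common_degs(data))
    print(common_degs(data, min_fraction=0.6))
```

Running the same function once per sample with a single value per gene would give the "Separate DEGs for each sample" behaviour.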
42. [Result table: Detection and Detection P value columns for each probe]
<Figure 2 7> Gene ST array Preprocessing Result Window
2 1 2 Import One dye Chip Data On the menu bar select File > Import One Dye Chip Data or select the second icon, then the Analysis Input Window appears.
[Import One Dye Data window: Analysis Name, Directory, Description, File Format, Species]
<Figure 2 8> One Dye Chip Data Analysis Information Input Window
(1) Analysis Name Input the name of the Analysis File to be created.
(2) Directory Click the button to select the location where the Analysis File will be created.
(3) Description Input information on the Analysis File (can be skipped).
(4) File Format Select the file format of the input data.
• ABI chip data
• Illumina chip data: BeadStudio Exported Gene / Probe Profile Data
• Agilent chip data
• GenePix Results format (gpr)
(5) Species Select the species of the input data. Different Species are provided according to the Chip kind.
• Others: If there is no corresponding species
• All: If the species is unknown; all species are browsed when the annotation function is used afterwards
Click Next > button; the Analysis is then created and the window below appears. Click Add button to select the file to be added, the
43. <Figure 3 4> Input Data Verifying Window
• To confirm the input data, double click on the name of the data in the browse window. Missing Values are marked in yellow.
3 1 8 Analysis Properties On the menu bar select File > Analysis Properties to adjust the information of the Analysis under operation.
3 1 9 Exit On the menu bar select File > Exit to close the program.
3 2 Preprocessing
3 2 1 Check & Match Data On the menu bar select Preprocessing > Check & Match Data to confirm whether the number of genes of the input data matches with the ID. If it does not match, the Analysis cannot be processed.
3 2 2 Filter Missing Data On the menu bar select Preprocessing > Filter Missing Data; the window shown below appears, and Missing Entries can be deleted from the input data.
[Filter Missing Data window: table with ID, Total, Class_1 and Class_2 columns and the Missing Entries selection]
44. Editing to edit or combine the samples of the input data in various ways.
(1) Delete Creates a GEM that excludes the selected samples.
[Column Editing window, Delete tab]
<Figure 4 6> Column Editing Delete Set Up Window
(2) Average Creates a GEM with an added column holding the average value of the selected samples. Double click on each item of the new column to change the name.
[Column Editing window, Average tab]
<Figure 4 7> Column Editing Average Set Up Window
(3) Operation Creates a GEM with an added column holding the value obtained by adding the reference sample to, or subtracting it from, the selected sample. Double click on each item of the new column to change the name.
[Column Editing window, Operation tab: Select Gene Expression Matrix]
45. [Volcano Plot window: Fold Change Cutoff, Significance of Statistical Test (P value 0.05), Toggle Fold Change Threshold Line, Toggle P Value Cutoff Line, Mapping Gene List, View Gene List Left / Right, x axis Average log2 Fold Change, Save Image As, and a result table with Heat Map, P Value, Average log2 Fold Change and Average Fold Change columns]
<Figure 3 26> Volcano Plot Result Window
(2) Control the Fold Change Threshold and P value to have them reflected on the Plot on the right side. An already known gene list or a wanted gene list can be added to verify its distribution in the Plot.
(3) Double click on the wanted range in the Plot on the right side to verify the information of the genes in the selected range. Click the button to add the corresponding gene list, as a DEG Finding result, to the gene list on the left side of the window.
3 0 Reference
3 1 J. H. Zar, Biostatistical Analysis, 4th Edition, Prentice Hall Inc.
3 2 B. L. Welch (1947) The generalization
46. GenPlex Introduction <Version 3.0>
Istech holds all rights to this manual and this product. You may not reprint, copy or distribute this manual or this product without prior permission from Istech Corp. Anyone who has installed and is using our product is considered to have agreed to this policy.
Istech Inc
Copyright 2008 ISTECH Inc
Table of Contents
DEG (Differentially Expressed Gene) Finding
1 2 6 Pathway Analysis
Biological Annotation & Data Mining
1 3 Recommended Computer Requirements
Import Affymetrix Gene Chip Data
Import One dye Chip Data
Import Two dye Chip Data
2 1 9 Analysis Properties
47. [Gene Expression Matrix Filtering window: Select Gene Expression Matrix; Filtering by Standard Deviation (SD), Coefficient of Variation (CV) and Max value - Min value (MM), each with a Number of Genes or Threshold setting; Combining Operation AND / OR; 270 passed out of 270]
<Figure 4 5> Gene Filtering Set Up Window
(1) Select Gene Expression Matrix Select the GEM to which Filtering will be applied.
(2) Filtering Option
• Standard Deviation (SD): Selects the number of genes set by the user, in descending order of SD.
• Coefficient of Variation (CV): Selects the number of genes set by the user, in descending order of CV.
• Max value - Min Value (MM): Selects the number of genes set by the user, in descending order of the difference between the Maximum Value and the Minimum Value.
(3) Click Filtering button; the number of selected genes is shown below. Click OK button to add the filtered GEM.
4 2 4 Missing Data Filtering On the menu bar select Preprocessing > Missing Data Filtering to optionally delete missing entries from the input data (Ref 3 2 2).
4 2 5 Imputation On the menu bar select Preprocessing > Imputation to fill in missing values according to a regular rule.
4 2 6 Column Editing On the menu bar select Preprocessing > Column
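The Gene Filtering options on this page (SD, CV, MM and the AND / OR Combining Operation) can be sketched as below. This is a minimal illustration, assuming the sample standard deviation, CV defined as SD divided by the mean, and only the Number of Genes mode (the Threshold mode is not shown). The function names and the dictionary based GEM are hypothetical.

```python
import statistics


def gene_stats(values):
    """Return the SD, CV and Max - Min statistics of one gene."""
    sd = statistics.stdev(values)          # sample standard deviation
    mean = statistics.mean(values)
    cv = sd / mean if mean else float("inf")
    return {"SD": sd, "CV": cv, "MM": max(values) - min(values)}


def top_n(gem, key, n):
    """Ids of the n genes with the greatest value of the chosen statistic."""
    ranked = sorted(gem, key=lambda g: gene_stats(gem[g])[key], reverse=True)
    return set(ranked[:n])


def filter_genes(gem, criteria, combine="AND"):
    """criteria maps "SD", "CV" or "MM" to the number of genes to keep."""
    sets = [top_n(gem, key, n) for key, n in criteria.items()]
    if not sets:
        return set(gem)
    if combine == "AND":
        return set.intersection(*sets)
    return set.union(*sets)


if __name__ == "__main__":
    gem = {
        "a": [1, 1, 1, 1],
        "b": [1, 5, 1, 5],
        "c": [10, 11, 10, 11],
        "d": [2, 9, 2, 2],
    }
    print(filter_genes(gem, {"SD": 1, "MM": 2}, combine="AND"))
    print(filter_genes(gem, {"SD": 1, "MM": 2}, combine="OR"))
```

With AND, a gene must rank in the top list of every selected statistic; with OR, ranking in any one of them is enough.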
48. <Figure 3 5> Missing Data Delete Window
(1) Shows the information of genes with Missing values in table format.
• Line No: Order of the genes in the input data
• ID: ID of each gene
• Total: Total number of missing values over all Classes
• Name of each Class: Number of missing values in that Class
(2) Missing Entries Select the standard and click Select button to select the genes to be deleted.
• Number: Select genes to be deleted based on the Total Number column of missing values.
• Total Percentage: Select genes to be deleted based on the ratio of the Total Number column of missing values to the Total Number of Samples.
• Class specific Percentage: Select genes to be deleted based on the ratio of missing values in each class.
(3) Click Remove button to delete the Missing Entries.
3 2 3 Impute Data On the menu bar select Preprocessing > Impute Data to fill in Missing Values according to a regular rule.
3 2 4 Log Transform On the menu bar select Preprocessing > Log Transform to transform the input data into Log values.
3 3 DEG Finding
3 3 1 Fold Change
3 3 1 1 Fold Change One Dye On the menu bar select DEG Finding > Fold Change > Fold Change One Dye, then the set up window appears.
[Algorithm & Parameter Setting window, Fold Change One Dye tab]
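The Number and Total Percentage standards for Missing Entries described on this page can be sketched as follows. The function name, the `mode` strings and the inclusive comparison against the limit are assumptions for illustration; the Class specific Percentage option would apply the same ratio per class and is not shown.

```python
def select_missing(total_missing, n_samples, mode="number", limit=1):
    """Select the genes to be deleted because of missing values.

    total_missing: dict mapping gene id to its total number of
                   missing values over all samples.
    mode "number": compare the count itself with limit.
    mode "total_percentage": compare 100 * count / n_samples with limit.
    Assumption: a gene is selected when its value is >= limit.
    """
    selected = []
    for gene_id, count in total_missing.items():
        if mode == "number":
            value = count
        elif mode == "total_percentage":
            value = 100.0 * count / n_samples
        else:
            raise ValueError("unknown mode: " + mode)
        if value >= limit:
            selected.append(gene_id)
    return selected


if __name__ == "__main__":
    counts = {"g1": 0, "g2": 1, "g3": 5}
    print(select_missing(counts, n_samples=10))
    print(select_missing(counts, n_samples=10, mode="total_percentage", limit=30))
```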
49. [Preprocessing result window: Method MAS5 (Microarray Suite 5), "The data was not log transformed", Global Scale Normalization; table with Probe_ID, Signal, Detection and Detection P value columns for probes such as AFFX BioB 5_at]
50. [Input data table: gene IDs with signal values per sample]
<Figure 5 3> Input Data Verifying Window
(1) Double click the input data name in the browse window to verify the input data. Missing Values are marked in yellow.
5 1 8 Load Test Data File(s) On the menu bar select File > Load Test Data File(s) or click on the sixth icon; the data selecting window then appears (Ref 5 1 7).
5 1 9 Transpose Data On the
51. [Table of contents excerpt, sections 2.2 through 3.4. Legible entries: 2.2 Preprocessing; 2.4 Analysis Data; 2.4.3 Classification; 3.1.8 Analysis Properties.]
52. <Figure 4-4> Experimental Information Input Window
(1) Select Gene Expression Matrix: Select the GEM whose attributes are to be input or changed.
(2) Attr. Name: Input the name of the attribute. Type, Time, and Dose are provided as basic attributes. These attributes can be edited, and the Add and Remove buttons add or delete attributes.
(3) Var. Type: The type of the attribute, either Categorical or Continuous. Only the Categorical type is currently supported.
Double-click a cell to input an attribute value; the Fill Down, Copy, and Paste buttons make input easier. Click the OK button to apply the input attributes.
4.2.2 Log Transform: On the menu bar, select Preprocessing > Log Transform to transform the input data into log values.
4.2.3 Gene Filtering: On the menu bar, select Preprocessing > Gene Filtering, or click the seventh icon, to select high-ranking genes from the input data using a statistical method. Gene Filtering
53. <Figure 3-19> Basic Statistics Result Window
ID: ID of the gene. Maximum: Greatest value of the classified gene. Minimum: Smallest value of the classified gene. Median: Median value of the classified gene. Mean: Average (mean) of the classified gene. Standard Deviation: Standard deviation of the classified gene. Coefficient of Variation: CV value of the classified gene.
3.4.2 Sample Correlation Matrix: This shows the correlation between samples. On the menu bar, select Statistics Plot > Sample Correlation. The result for each class is shown in a separate tab, where the user can verify it. <Figure 3-20> Sample Correlation Matrix Result Window
3.4.3 Box Plot: On the menu bar, select Statistics Plot > Box Plot, or click the eighth icon, to see the box plot for each class.
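The statistics listed for the Basic Statistics Result Window can be illustrated for a single gene with a short Python sketch. The function name `basic_statistics` is hypothetical, and two details are assumptions because the manual does not state them: the standard deviation uses the sample form (n - 1), and the coefficient of variation is taken as the standard deviation divided by the mean.

```python
import statistics


def basic_statistics(values):
    """Summary statistics for one gene's values within one class.

    Assumptions (not confirmed by the manual): sample standard
    deviation (n - 1) and CV = standard deviation / mean.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {
        "maximum": max(values),
        "minimum": min(values),
        "median": statistics.median(values),
        "mean": mean,
        "standard_deviation": sd,
        # The CV is undefined when the mean is zero.
        "cv": sd / mean if mean != 0 else float("nan"),
    }


stats = basic_statistics([2.0, 4.0, 4.0, 6.0])
```

Running the function once per gene and per class would give a table with the same columns as the result window.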
54. [Diagrams: distance between cluster r and cluster s under each linkage method.]
Single Linkage: $d(r,s) = \min_{i,j} \mathrm{dist}(x_{ri}, x_{sj})$
Complete Linkage: $d(r,s) = \max_{i,j} \mathrm{dist}(x_{ri}, x_{sj})$
Average Linkage: $d(r,s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} \mathrm{dist}(x_{ri}, x_{sj})$
The available distance measures are the Euclidean, Minkowski, and Mahalanobis distances, given by the formulas below. The distance between gene i and gene j is written $d_{ij}$. Let X be the gene expression data with p genes and n samples, and write two arbitrary genes as $x_i = (x_{i1}, \ldots, x_{in})$ and $x_j = (x_{j1}, \ldots, x_{jn})$.
■ Euclidean Distance: $d_{ij}^2 = (x_i - x_j)(x_i - x_j)^{\top}$
The Euclidean distance is the actual geometric distance and is the most generally used.
■ Minkowski Distance: $d_{ij} = \left( \sum_{k=1}^{n} |x_{ik} - x_{jk}|^{m} \right)^{1/m}$
This distance takes into account the dimension information belonging to each individual.
■ Mahalanobis Distance: $d_{ij}^2 = (x_i - x_j) S^{-1} (x_i - x_j)^{\top}$
This is the statistical distance between two genes, where S is the covariance matrix. It reduces to the Euclidean distance when S is the identity matrix.
■ Correlation Coefficient: $r_{ij} = \frac{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)^2 \sum_{k=1}^{n} (x_{jk} - \bar{x}_j)^2}}$
The strength of hierarchical clustering is that the result can be visualized and that no parameter value has to be input directly when clustering. That is to say, there is no need to input the estimated number of
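The distance measures and linkage rules of section 8.1 can be sketched in plain Python. This is an illustration of the standard definitions, not GenPlex code: the function names are hypothetical, `m` is the Minkowski exponent, and `s_inv` is assumed to be the already inverted covariance matrix supplied by the caller.

```python
import math


def euclidean(x, y):
    """Geometric distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def minkowski(x, y, m):
    """Minkowski distance with exponent m (m = 2 gives Euclidean)."""
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)


def mahalanobis(x, y, s_inv):
    """Mahalanobis distance; s_inv is the inverse covariance matrix."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    # Row vector d times S^-1, then times d transposed.
    left = [sum(d[i] * s_inv[i][j] for i in range(n)) for j in range(n)]
    return math.sqrt(sum(left[j] * d[j] for j in range(n)))


def linkage_distance(cluster_r, cluster_s, method="average", dist=euclidean):
    """Distance between two clusters under single, complete or average linkage."""
    pairwise = [dist(a, b) for a in cluster_r for b in cluster_s]
    if method == "single":
        return min(pairwise)
    if method == "complete":
        return max(pairwise)
    if method == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

With the identity matrix as `s_inv`, `mahalanobis(x, y, identity)` gives the same value as `euclidean(x, y)`, which matches the remark that the Mahalanobis distance becomes the Euclidean distance when S is the identity matrix.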
55. <Figure 4-8> Column Editing Operation Set Up Window
(4) Sequence: Create a GEM with the column order rearranged. <Figure 4-9> Column Editing Sequence Set Up Window
(5) Rename: Change the name of the selected sample.
4.3 Clustering
4.3.1 Hierarchical Clustering: For Hierarchical Clustering [4-1], select Clustering > Hierarchical Clustering on the menu bar, or click the eighth icon, to open the set up window.
56. Save Analysis: On the menu bar, select File > Save Analysis, or click the third icon, to save the working Analysis File.
6.1.5 Save Analysis As: On the menu bar, select File > Save Analysis As, or click the fourth icon, to save the working Analysis File under a different name.
6.1.6 Close Analysis: On the menu bar, select File > Close Analysis to close the working Analysis File.
6.1.7 Import Data: On the menu bar, select File > Import Data, or click the fifth icon; the data selecting window then appears.
■ Data Input Result [Screenshot: GenPlex v2.4 Pathway Analysis Module showing the imported gene expression matrix, with a Probe ID column and one Signal column per sample.]
57. [Table of contents excerpt. Legible entries: 6.1 File: 6.1.1 New Analysis, 6.1.2 Open Analysis, 6.1.3 Recent Analysis, 6.1.4 Save Analysis, 6.1.5 Save Analysis As, 6.1.6 Close Analysis, 6.1.7 Import Data, 6.1.8 Analysis Properties; 6.2.1 Pathway Image; Analysis of Variance (ANOVA); 8.1 Hierarchical Clustering; 8.2 K-means; 8.3 Self-Organizing Map (SOM).]
Introduction
1 GenPlex Introduction
1.1 Summary
Microarray (DNA Chip) that is able to monitor the intensity of thousands and millions of gene information at the same t
58. Data and Two-dye Chip Data.
2.2.2.1 One-Dye Chip Data
(1) Flagged Data Removal: This procedure excludes the spots whose flag value is greater than the Flag value set by the user. This step must be applied; otherwise it is not possible to proceed to the next step. Click the Apply button to see the number of spots before and after applying, shown on the right side of the table, and click the Next > button to go to the next step. <Figure 2-15> Flagged Data Removal Set up Window
(2) Miscellaneous Spot Removal: Input the list of spot IDs that are unnecessary for further analysis, or select the option for the spots to be deleted, then click the Apply button; the number of spots before and after applying is shown in the list on the right side of the window. Click the < Back button to return to the previous step, and click the Finish or Cancel button to complete or cancel the Filtering Error Spot procedure. <Figure 2-16> Miscellaneous Spot Removal Set up Window
(3) Filtering Result
59. [Screenshot: KEGG pathway list with the number of related genes per pathway, and the menus Pathway Search, Pathway P-value, Sort by Gene counts, Save Lists, and Exit.] <Figure 6-3> KEGG Pathway Search Result
(2) If you select the Sort by Gene counts menu, the pathway list is shown in descending order of the number of related genes in the array. If you select the Save List menu, the pathway list is saved as a text file.
6.2.1 Pathway Image
(1) Double-click the name of a pathway in the tree on the left side to see the KEGG pathway image.
(2) Genes related to the corresponding pathway are marked with a red box.
(3) Right-click to open the popup menu.
- Click Hide Gene to hide the marked related genes.
- Click Show Heat Map to see the heat map of the expression information for the related genes.
- Click Heatmap color up/down to adjust the color of the gene expression values.
- Click Save Image to save the image as a picture file.
(4) Below it p
60. [Screenshot: gene selection result, showing Gene Selection Algorithm: BSS/WSS, Gene Selection Number: 50, and a result table with Probe ID and BSS/WSS columns for the training data classes.] <Figure 5-6> Gene Selection Result Window
(1) Browse Engine and Heat Map Set Up
- Search: Search by ID or Line No. Select the search type, input the ID or Line No. in the text box, and click the Search button; the corresponding gene is then highlighted in the result table. Click
61. proceed with the following Duplication Setting procedure, as shown below.
■ Duplication Setting: If duplicate experiment data exist, set them up here. If there are no duplicate experiment data, or in the case of Affymetrix GeneChip data, this procedure does not apply and can be skipped. <Figure 2-14> Duplication Setting Window
(1) Select the duplicate experiment data from the list on the left side of the window (use the Ctrl or Shift key) and click the Set Dup >> button to register them as duplicate experiment data; they are then added to the list on the right side of the window. Data that have been set up are shown in the same color in the right-side list. To cancel duplicate experiment data, use the Remove or Remove All button.
2.2.2 Filtering Error Spot: On the menu bar, select Preprocessing > Filtering Error Spot, or click the eighth icon. This applies to One-dye Chip D
62. click on the ninth icon; the set up window then appears.
2.2.3.1 Affymetrix Gene Chip Data
When Affymetrix chip data are imported (see 2.1.1), normalization is performed at the same time, so this procedure can be skipped. It can still be used to change the normalization method; the new method is then applied in place of the existing one. <Figure 2-23> Affymetrix Gene Chip Data Normalization Setup Window
(1) Global Scale Normalization: This method adjusts the average signal value of each array according to one of the options below [2-2]. It can be selected only when the Probe Level Analysis result has not been log-transformed.
- Scale to all probe sets
- Scale to selected probe sets
- Defined scaling factor
(2) Lowess Normalization: The Lowess line tends to bend in the regions of the MA plot where the intensity is low or high. Lowess Normalization straightens the bent part of the Lowess line using a local regression technique [2-3]. This can be selected only when the Probe Level Analysis result is transformed into lo
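The three Global Scale Normalization options can be illustrated with a minimal Python sketch. It is not the GenPlex implementation: the function name `global_scale`, the default `target` of 500, and the use of a plain (untrimmed) mean are assumptions, since the manual states only that the average signal of each array is adjusted.

```python
def global_scale(signals, target=500.0, selected=None, scaling_factor=None):
    """Scale one array's signals (dict: probe ID -> signal).

    - scaling_factor given: "Defined scaling factor"
    - selected given:       "Scale to selected probe sets"
    - neither given:        "Scale to all probe sets"
    The target value and the plain mean are illustrative assumptions.
    """
    if scaling_factor is None:
        probe_ids = selected if selected is not None else list(signals)
        mean_signal = sum(signals[p] for p in probe_ids) / len(probe_ids)
        scaling_factor = target / mean_signal
    return {probe: value * scaling_factor for probe, value in signals.items()}


array = {"probe_a": 100.0, "probe_b": 300.0}
scaled_all = global_scale(array)
scaled_selected = global_scale(array, selected=["probe_a"])
scaled_fixed = global_scale(array, scaling_factor=2.0)
```

Applying the same target to every array brings their average signals to a common level, which is the purpose the manual describes for this method.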
63. cluster in advance, as K-means or SOM requires. In addition, the dendrogram has the advantage that the user can set the size and number of clusters as desired. On the other hand, the weak point of hierarchical clustering is that, once items have been clustered at a given step, they stay as they are in later steps without going through any refinement procedure, so each cluster can be less tight than with the K-means method. The clustering result may therefore be less satisfactory than that of other methods.
8.2 K-means
This method finds the optimum K clusters through an iterative calculation procedure. It repeats the procedure until a certain level is reached, judged by how closely the constituents of each cluster (genes, in the case of gene clustering) are gathered around the center of the cluster (the centroid, which mathematically is the average vector). The strength of K-means is that, because it performs mathematical optimization through iteration, the resulting clusters are relatively tight. However, the user has to input the number of clusters K in advance, and the result can differ depending on how the K initial centroids are given.
8.3 Self-Organizing Map (SOM)
SOM is a method developed relatively recently in the field of computer science and is used broadly in other fields. It has come into general use in DNA chip analysis since the publications of Tamayo et al. and Golub et al. The greatest strength of SOM is that we can con
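The K-means procedure described in section 8.2 can be sketched as follows. This is a textbook version (squared Euclidean distance, random initial centroids drawn from the data), given only as an illustration; the function name, the `seed` parameter, and the stopping rule (no change in assignments) are assumptions rather than documented GenPlex behavior.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (lists of floats) into k groups.

    Returns (assignment, centroids). The seed fixes the initial
    centroids, so different seeds can give different results.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the average vector of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
assignment, centroids = kmeans(points, k=2)
```

The two requirements named in the text are visible in the signature: `k` must be chosen in advance, and the initial centroids (here controlled by `seed`) influence the outcome on less clearly separated data.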
64. column width of each column. <Figure 4-14> <Figure 4-13> Select Node Change Color Set Up Window (Left) and Set Up Result (Right)
Branch-cut Value: Divides the clusters at the distance value that the user inputs. <Figure 4-15> Cutting-value Clustering Set Up Window
Create Cluster: Fixes the clusters that the user has defined by moving the green Moving Bar or by inputting the Branch-cut Value. Based on this, the user can create each cluster and verify the result.
Save Sub-Tree Matrix: Saves the GEM data of the node selected in the dendrogram to a text file.
4.3.2 K-means Clustering: For K-means Clustering [4-2, 4-3], select Clustering > K-means Clustering on the menu bar, or click the ninth icon; the set up window then appears. <Figure 4-16> K-means Clustering Set Up Window
Select Gene Expression Matrix: Select the GEM to which clustering is applied.
Objects: Select the basis on which clustering is applied.
- Gene: Cluster by gen
65. <Figure 2-40> DEG Finding GEM Creating Window
(1) Output Path: Set the path of the Analysis file to be created.
Analysis Information
- Name: Input the name of the Analysis file to be created.
- Note: Input information relating to the input data (can be skipped).
- GEM Format:
  Basic GEM: The basic GEM format; it includes the signal intensity of each array.
  Basic GCOS output: Applies only to Affymetrix Gene Chip data; the GEM is created including Signal, Detection, and Detection P-value.
  GCOS output (Signal, Detection): Applies only to Affymetrix Gene Chip data; the GEM is created including Signal and Detection.
Class File Construction
- For Single Class: Applies only to Two-Dye data; the selected data are created as one class.
- Select an attribute: The Sample Attribute items set up when the data were input are shown, and the data are classified according to the user's choice. Click the Apply button; only the selected data are then shown in the table on the right side of the window, in a different color for each class so that they are easy to verify.
- Selected Attributes: Shows the list of selected attributes.
- Duplication Mode: Not in case of Affymetr
66. [Screenshot: error estimation result listing the number of incorrectly classified samples for each class.] <Figure 5-13> Error Estimation Result Window (LOOCV)
[Screenshot: error estimation result for K-fold estimation with Iteration Number: 100, Distance: Euclidean Distance (Ordinary), Classification Algorithm: Weighted K-Nearest Neighbor, and Gene Selection Algorithm: BSS/WSS. The table reports, for each iteration, the accuracy per class and the average accuracy.]
67. [Screenshot: Basic Statistics result table for the training data classes, with columns including Mean, Standard Deviation, and Coefficient of Variation.]
68. e
- Experiment: Cluster by sample.
Distance Measure: Select the method used to calculate the clustering distance (Ref. 8.1).
- Euclidean Distance: Geometric distance between two individuals.
- Manhattan Distance: Distance between two individuals that takes into account the weight that each variable occupies.
- Pearson Correlation (centered): Measures the similarity of two individuals using the correlation coefficient after transforming each individual to mean 0 and variance 1.
- Pearson Correlation (uncentered): Measures the similarity of two individuals using the correlation coefficient calculated from their actual signal values.
- Absolute Pearson: Uses the absolute value of the Pearson correlation coefficient.
Initialization Method: Select the initialization method.
- Pseudo Random: Generates similar random numbers on every repetition.
- Totally Random: Generates arbitrary random numbers on every repetition.
Number of Cluster: Input the number of clusters. Click the Prediction button to search for the most suitable K value first, then continue the operation (Ref. 4.4.2).
Max Iteration: Input the maximum number of iterations (default: 100).
Input the name of the clustering result, then click the Clustering button to verify the clustering result.
4.3.3 Self-Organizing Map: For Self-Organizing Map (SOM) [4-4], select Clustering > Self-Organizing Map on the menu bar, or click the tenth icon; then the se
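The difference between the centered, uncentered, and absolute Pearson measures listed above can be sketched in Python. The formulas follow the common convention in expression clustering software; the manual does not give them explicitly, so the uncentered form (no mean subtraction) and the conversion to a distance as 1 - r are assumptions, and all function names are hypothetical.

```python
import math


def pearson_centered(x, y):
    """Correlation after subtracting each profile's own mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def pearson_uncentered(x, y):
    """Correlation computed on the actual signal values (no centering)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


def correlation_distance(r, absolute=False):
    """Turn a correlation into a distance (assumed form: 1 - r or 1 - |r|)."""
    return 1.0 - (abs(r) if absolute else r)


up = [1.0, 2.0, 3.0]
down = [3.0, 2.0, 1.0]
r_centered = pearson_centered(up, down)
r_uncentered = pearson_uncentered(up, down)
```

For these two profiles the centered coefficient is -1 (opposite trends), so the centered distance is large while the Absolute Pearson distance is 0; this is why Absolute Pearson groups anti-correlated genes together.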
69. ed
ID Type: Select the ID type of the input data.
- Commercial Product Probe ID: Agilent Probe ID (Two-dye), CodeLink Probe ID, Illumina Probe ID, Operon Probe ID
- Public Database ID: IMAGE Clone ID, NCBI Clone ID, NCBI GenBank Accession, NCBI Gene ID (LocusLink), NCBI UniGene ID
- Others: If the ID is unknown
Species: Select the species of the input data. Different species are provided for each ID Type.
- C. elegans, Human, Mouse, Rat
- Others: If there is no corresponding species
- All: If the species is unknown; all species are browsed when the annotation function is used
File Format: Select the file format of the input data.
- GenePix Results format (.gpr)
- ImaGene Data
Click the Next > button; the Analysis File is then created and the window for selecting data appears, as shown in the figure below. [Screenshot: Import Two Dye Data window listing the selected sample files.]
70. [Screenshot: gene annotation table for an Affymetrix rat expression array (Rattus norvegicus). Each row is one probe set (for example 1367871_at Cyp2e1, 1367896_at Ca3, 1367930_at Gap43, 1367973_at Ccl2), with columns for probe ID, gene ID, gene symbol, array name, species, UniGene cluster, gene description, chromosomal location, and Ensembl and protein accessions.]
71. Test on the menu bar; the parameter set up window then appears.
(1) Parameter Setting
- Significance Level: Selects the significant genes below the level that the user has set.
- Number of Genes: Selects the number of genes that the user has set, in descending order of score.
- Class-specific Number of Genes: Selects genes in descending order of score for each class.
- Statistical Significance Computation: Select the method used to obtain the P-value.
  Asymptotic Distribution: Used when the data are assumed to follow a normal distribution.
  Permutation Test: Used when no assumption is made about the distribution of the data.
- Multiple Test Correction: Select the method used to correct the P-value: None, Bonferroni, Holm's procedure, or Benjamini-Hochberg FDR.
(2) Matching Pairs
- Select the Reference class and the Target class, then set up the sample pairs.
- In Given Order >>: Sets up the pairs in the given order.
- Set Pairs >>: Sets up the sample pair that the user has selected.
- << Remove, << Remove All: Cancel pairs that have been set up.
(3) Click the OK button to see the result.
3.3.2.2 Wilcoxon Signed Rank Test: For the Wilcoxon Signed Rank Test [3-1], select DEG finding > 2 Class Paired Test > Wilcoxon Signed Rank Test on the menu bar; the set up window then appears. The parameters are identical to those of the Paired T-Test (Ref. 3.3.2.1).
3.3.3 2 Class Unpaired Test
Algorithm & Parameter Setting: Kruskal-Wallis H Test, Student T-Test, Welch's T-Test, Z-Test, Mann-Whit
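The three P-value correction methods named under Parameter Setting (Bonferroni, Holm's procedure, Benjamini-Hochberg FDR) can be sketched with their standard textbook definitions of adjusted P-values. This is an illustration only; the function names are hypothetical, and the manual does not say whether GenPlex reports adjusted P-values in exactly this form.

```python
def bonferroni(pvalues):
    """Multiply every P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm's step-down procedure (adjusted P-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        value = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up procedure (FDR-adjusted P-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # from the largest P-value down
        idx = order[rank]
        value = pvalues[idx] * m / (rank + 1)
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adj_bonferroni = bonferroni(raw)
adj_holm = holm(raw)
adj_bh = benjamini_hochberg(raw)
```

Genes would then be selected by comparing the adjusted values with the Significance Level; Bonferroni is the most conservative of the three and Benjamini-Hochberg the least.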
72. Set up the parameters of the Classification Algorithm that the user has selected; if they are not set up, the default values are used. If Multi-FLDA is selected as the Classification Algorithm, the corresponding menu is deactivated because it has no parameters. On the menu bar, select Classification > Set Parameter(s), or click the twelfth icon.
■ Weighted K-Nearest Neighbor (KNN): Select the K value and whether to use weights (default: K = 5, weighted). <Figure 5-10> Weighted KNN Parameter Input Window
■ Prototype Matching with indeterminacy parameters: If the calculation result is below the designated C value, the sample is determined as indeterminate (default: C = 0.1). <Figure 5-11> Prototype Matching Parameter Input Window
5.4.4 Classify Test Data: On the menu bar, select Classification > Classify Test Data, or click the thirteenth icon, to verify the classification result. [Screenshot: classification result information, showing Distance: Euclidean Distance (Ordinary) and Classification Algorithm: Weighted K-Nearest Neighbor.]
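The effect of the two Weighted KNN parameters (the K value and the weight setting) can be illustrated with a small Python sketch. The manual does not describe the weighting scheme, so inverse-distance weighting is an assumption here, as are the function name and the toy data.

```python
import math
from collections import defaultdict


def weighted_knn(train, labels, sample, k=5, weighted=True):
    """Classify one sample from its k nearest training samples.

    weighted=True uses inverse-distance votes (an assumed scheme);
    weighted=False gives every neighbor one vote.
    """
    neighbors = sorted(
        (math.dist(x, sample), label) for x, label in zip(train, labels)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        # The small constant avoids division by zero for identical samples.
        votes[label] += 1.0 / (distance + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)


train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]]
labels = ["normal", "normal", "normal", "tumor", "tumor"]
sample = [4.5, 5.0]
with_weight = weighted_knn(train, labels, sample, k=5, weighted=True)
without_weight = weighted_knn(train, labels, sample, k=5, weighted=False)
```

With K = 5 every training sample votes. Without weights the three distant "normal" samples outvote the two nearby "tumor" samples, while with weights the nearby samples dominate.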
73. of FOM according to the K value in the table on the left side. (2) It shows the result in graph format on the right side.
4.9 Reference
[4-1] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-14868.
[4-2] J. A. Hartigan & M. A. Wong (1979). A k-means clustering algorithm. Appl. Statist. 28, 100-108.
[4-3] S. Tavazoie et al. (1999). Systematic determination of genetic network architecture. Nat. Genet. 22, 281-285.
[4-4] P. Tamayo et al. (1999). Interpreting patterns of gene expression with SOMs: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96, 2907-2912.
[4-5] F. Azuaje (2002). A cluster validity framework for genome expression data. Bioinformatics 18, 319-320.
[4-6] K. Y. Yeung et al. (2001). Validating clustering for gene expression data. Bioinformatics 17, 309-318.
Classification
5 Classification
5.1 File
5.1.1 New Analysis: If data are exported from the Preprocessing module, the Analysis file is created automatically, so this procedure can be skipped (go to 5.2). On the menu bar, select File > New Analysis, or click the first icon; the Analysis creating window then appears. [Screenshot: New Analysis window with Name, Directory, and Description fields.]
74. fifteenth icon. <Figure 2-39> Correlation Matrix Plot Result Window
(1) The data selected in the Correlation Matrix Plot can be confirmed.
2.4 Analysis Data
This procedure creates a GEM for analyzing the preprocessed data; the created GEM can be exported to the DEG Finding, Clustering, or Classification module.
2.4.1 DEG Finding
Data can be exported to the DEG Finding module, where the user can find DEGs (Differentially Expressed Genes). On the menu bar, select Analysis Data > DEG finding, or click the sixteenth icon, to export the input data to the DEG Finding module. [Screenshot: Generate Gene Expression Matrix window with Output Path, Analysis Information (Name, GEM Format), and Class File Construction settings.]
75. [Screenshot: Hierarchical Clustering set up window with Objects (Gene, Experiment), Linkage (Single Linkage, Complete Linkage, Average Linkage, Ward's Method), and Distance Measure options.] <Figure 4-10> Hierarchical Clustering Set Up Window
(1) Select Gene Expression Matrix: Select the GEM to which clustering is applied.
(2) Objects: Select the basis on which clustering is applied.
- Gene: Cluster by gene.
- Experiment: Cluster by sample.
(3) Distance Measure: Used to calculate the distance between two individuals (Ref. 8.1).
- Euclidean Distance: Geometric distance between two individuals.
- Manhattan Distance: Distance between two individuals that takes into account the importance of each variable.
- Pearson Correlation (centered): Measures the similarity of two individuals using the correlation coefficient after transforming each individual to mean 0 and variance 1.
- Pearson Correlation (uncentered): Measures the similarity of two individuals using the correlation coefficient calculated from their actual signal values.
- Absolute Pearson: Uses the absolute value of the Pearson correlation coefficient.
(4) Linkage: The method used to calculate the distance between clusters.
- Average Linkage: This method takes, as the similarity between clusters, the average of the similarities between all individuals in the two clusters composing a
76. gs value.
- Data Fraction: Sets the fraction of the data used for the calculation.
- Iteration No: Sets the number of iterations.
- Reference Array: Pseudo Median valued array
- Reference Array: Pseudo Mean valued array
- Selection of Reference Array: Sets the Reference Array.
Quantile Normalization: This method makes the distributions of all arrays identical (Ref 2.4).
(4) Click the Start button to run Normalization.
(Screenshot: Normalization Result window showing the Normalization Parameter and the normalized data table, including the AFFX control probe sets.)
77. gure 3-24> Venn Diagram Result Window
(1) Select 2 or 3 DEG Finding Results from the table on the right side of the window and click the Apply button; the result is shown in the Venn Diagram on the left side. Click the button on a result below the table to add the corresponding gene list to the tree on the left side as a DEG Finding result.
3.4.7 Volcano Plot
For the Volcano Plot (Ref 3.4), select Statistics Plot > Volcano Plot on the menu bar, or just click the twelfth icon.
<Figure 3-25> Volcano Plot Set Up Window (fields: X axis Fold Change, Reference Class, Target Class, Y axis Statistical Test, Multiple Test Correction)
(1) Add the Reference and Target Classes to the list on the right side of the window, select a Statistical Test, then click the OK button to see the Volcano Plot Result Window.
(Screenshot: Volcano Plot Result Window with the DEG Finding result tree and the Reference Class, Target Class, Statistical Test, Multiple Test Correction, and FC Threshold settings.)
78. ication Setting
<Figure 2-13> Sample Attributes Input Window
(Screenshot: Sample Attributes table with the columns Sample Name, Type, Time, and Dose, each with Var Type categorical; the example rows are normal and tumor samples at 0hr, 6hr, and 12hr. Buttons: Add, Remove, Fill Down, Clear, Copy, Paste.)
(1) Attribute Name: Input the attribute name. Type, Time, and Dose are provided as defaults. Use the Add and Remove buttons to add or delete attributes.
(2) Var Type: Select either Categorical or Continuous as the attribute type. Currently only the Categorical type is supported.
(3) Double-click a cell to input attribute values, and use the Fill Down, Copy, and Paste buttons for easier input.
(4) If there is no duplicate experiment data, click the OK button to finish. If duplicate experiment data exists, pro
79. image as, Save 3D coordinates value as, Sample identifier, Class
<Figure 5-7> Visualization Result Window
(1) Shows the Gene Selection Algorithm and Parameter settings that the user has selected.
(2) Shows the result of Gene Selection.
- Line No: Shows the rank of each gene in the input data. Click the hyperlink to see the gene profile graph. The gene profile graph shows the average of the training data as a dotted line and the expression pattern of the corresponding gene as a line graph.
<Figure 5-8> Gene Profile Result Window
- ID: Shows the ID of each gene. Click the hyperlink to view detailed information on the gene in the linked database. To get correct information, the exact ID Type must be selected when the Analysis is created. If the ID Type is set to Other or All, no link is made.
- Score: The calculated value for each gene. The result graph shows how this value changes, so it can be used to identify visually the genes whose values change significantly.
(3) Shows the Heat Map of the extracted significant genes. The user can easily check the expression information of each gene by eye.
5.3.4 Combine Results
On the menu bar, select Gene Selection > Combine Results, or click the tenth icon, to combine several Gene Selection Resu
80. ime has become the main tool in the biotechnology research field in the 21st century. The use of Microarray allows us to examine the expression status of the genes inside the cell at the gene level, and through this expression information we can comprehensively understand the relationships between these genes. However, because of the complexity of the resulting data, Microarray analysis requires up-to-date algorithms and bioinformatics methods drawn from Mathematics, Statistics, Computer Science, and so on. GenPlex is Microarray analysis software that helps scientists analyze data in the way that suits them, provides various visualizations of the results, and supports analysis of experiment data through various statistical algorithms.
1.2 Main Function
1.2.1 Easy Data Importing
Data input is easy because the program automatically recognizes various types of raw data.
Supporting Format: Affymetrix Gene Chip Data (CEL), ABI Chip Data, Illumina Chip Data (BeadStudio output), GenePix Result, ImaGene Data
1.2.2 Preprocessing
Provides information on raw data quality through statistics and various plots, and provides Preprocessing functions for the data that will be used in later analysis.
Box Plot, Histogram, MA Plot, QQ Plot, Sample Correlation Scatter/Matrix Plot (showing the relationship between replications), Global centering/scaling, Global Print
81. is
<Figure 6-1> Analysis Creating Window (fields: Name, Location, Description, ID Type, Species; Create button)
(1) Name: Input the name of the Analysis File to be created.
(2) Location: Click the button to select the location where the Analysis File will be created.
(3) Description: Input additional information about the Analysis File (can be skipped).
(4) ID Type: Select the ID Type of the input data.
- Commercial Product Probe ID: Affymetrix GeneChip Probe ID, Agilent Probe ID (One dye), Agilent Probe ID (Two dye), Applied Biosystems 1700 Probe ID, CodeLink Probe ID, Illumina Probe ID, Operon Probe ID
- Public DataBase ID: IMAGE Clone ID, NCBI Clone ID, NCBI GenBank Accession, NCBI GeneID (LocusLink), NCBI UniGene ID
- Others: If the ID is not known
(5) Species: Select the species of the input data.
Click the Create button; the Analysis File is created and the data selecting window appears (go to 6.1.7).
6.1.2 Open Analysis
On the menu bar, select File > Open Analysis, or click the second icon, to open a saved Analysis File.
6.1.3 Recent Analysis
On the menu bar, select File > Recent Analysis to open a recently analyzed Analysis File. This list can be cleared using the Clear History menu.
6.1.4 Save An
82. ix Gene Chip Data
- Array: Select Mean Merge to analyze the average value of the duplicate experiment data. If the user has not set up duplicate experiment data (Ref 2.2.1), Mean Merge is inactive.
- Spot: Select Mean Merge to analyze the average value of Spots with the same ID within the Array.
Only the data corresponding to the selected Sample Attribute is shown. Click the OK button; only the selected data is exported and the DEG Finding module starts automatically.
2.4.2 Clustering
Exports the input data to the Clustering module for Clustering analysis. On the menu bar, select Analysis Data > Clustering, or click the seventeenth icon, to export the input data to the Clustering module.
<Figure 2-41> Clustering GEM Creating Window (fields: Output Path, Analysis Information (Name, Note, GEM Format), Sample Selection, Select an attribute, Selected Attributes, Duplication Mode)
83. k on the mouse in the Dendrogram to see the pop-up menu.
- Heatmap Color Scale (around row average): Sets the Up/Down colors relative to the average of each gene.
- Heatmap Color Yellow/Blue (up/down): Changes the Heat Map colors to Yellow and Blue.
- Heatmap Color Brightness Scale: Controls the brightness of the Heat Map.
- Dendrogram Shape (Sample Tree): Shows only the Sample Tree of the Dendrogram.
<Figure 4-12> Dendrogram Shape (Sample Tree) Result Window
- Dendrogram Branch Coloring: After selecting a Node in the Dendrogram, click this menu to set the name and color of the Node.
<Figure 4-13> Select Node / Change Color Set Up Window (Left) and Set Up Result (Right)
- Reset Branch Coloring: Resets the color of the Node.
- Dendrogram Color scale bar: Shows the color scale bar.
- Retrieve Annotation Data: Shows the Annotation Information of the genes corresponding to the selected Node.
- Heatmap Annotation: Shows the Annotation Information in one view on the right side of the Dendrogram (not available for Illumina Probe ID).
(Screenshot: Heatmap Annotation window with 1st, 2nd, and 3rd column selectors such as ID, Gene Symbol, and Gene Title.) Each column size should be limited. Write the maximum
84. kage, Centroid Linkage, Average to Centroids, Hausdorff, All Linkage
Click the Add button to select, from the list on the left side of the window, the Clustering Results to compare.
Distance Measure: Select the method for calculating the distance between Clusters (Ref 8.1).
- Euclidean Distance: Geometric (straight-line) distance between two individuals.
- Manhattan Distance: Distance between two individuals taken as the sum of the absolute differences over all variables.
- Pearson Correlation (centered): Measures the similarity of two individuals using the correlation coefficient after transforming each individual to mean 0 and variance 1.
- Pearson Correlation (uncentered): Measures the similarity of two individuals using the correlation coefficient calculated from their actual signal values.
- Absolute Pearson: Uses the absolute value of the Pearson correlation coefficient.
(5) GDI: Input the name of the GDI result and click the Validation button to view the result.
<Figure 4-25> GDI Result Window
(1) The detailed GDI result can be checked in the table on the left side, and the name of the best result is shown at the bottom of
85. (Screenshot: gene expression data table listing probe set IDs with their signal values for each array.)
86. latest analyzed Analysis File. This list can be cleared by clicking the Clear History menu.
2.1.6 Save Analysis
On the menu bar, select File > Save Analysis, or click the fifth icon, to save the Analysis File currently being worked on.
2.1.7 Save Analysis As
On the menu bar, select File > Save Analysis As, or click the sixth icon, to save the Analysis File under a different name.
2.1.8 Close Analysis
On the menu bar, select File > Close Analysis to close the Analysis File.
2.1.9 Analysis Properties
On the menu bar, select File > Analysis Properties to change the properties of the Analysis File currently being worked on.
2.1.10 Configure
On the menu bar, select File > Configure to adjust the screen settings.
2.1.11 Exit
On the menu bar, select File > Exit to close the program.
2.2 Preprocessing
2.2.1 Experimental Information
This is the process of inputting the experiment information for the data. On the menu bar, select Preprocessing > Experimental Information, or click the seventh icon.
Sample Attributes: Input the Sample Attributes needed for further analysis. Based on the values entered, the user can classify the data by attribute or select only the data with the attributes needed. If no attributes are entered, the analysis may not proceed to the next step.
Experimental Information: Sample Attributes, Dupl
87. ld change is more an empirical method than a statistical one.
7.2 Two sample unpaired t test
This method is widely used together with Fold Change to obtain DEGs, but unlike Fold Change it provides statistical significance. A careful look at the t statistic nevertheless shows its connection with Fold Change: the numerator of the statistic is the difference in mean expression between the groups being compared, which is what Fold Change represents, and this is divided by a term based on the variance within the groups. The absolute value of the T Score therefore becomes larger when the variation within each group is smaller and the difference in mean expression between the two groups is larger. The larger the absolute value of the T Score, the stronger the statistical significance. This significance is expressed through the P-value, and the way of obtaining the P-value depends on the assumption made about the data. The Welch approximation is used when the data are assumed to follow a normal distribution, and a permutation test is generally used when no probability distribution is assumed. Keep in mind that, to obtain good results from the T Test, at least 5 to 6 replications are generally needed in each group.
$$T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$$
T is approximately t-distributed with d.o.f.
$$\nu = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$
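The Welch approximation mentioned above can be worked through numerically. The sketch below is not GenPlex code: it uses the standard Welch statistic and Welch-Satterthwaite degrees of freedom, the function name `welch_t` is hypothetical, and it stops short of the P-value because the t distribution is not in the Python standard library.

```python
import math
from statistics import mean, variance


def welch_t(group1, group2):
    """Return (t, dof) for a two-sample unpaired t test with unequal variances.

    group1 and group2 are lists of expression values for one gene,
    each with at least two replicates.
    """
    n1, n2 = len(group1), len(group2)
    # Per-group squared standard errors, S^2 / n (sample variance, n - 1 divisor).
    se1 = variance(group1) / n1
    se2 = variance(group2) / n2
    # Numerator: difference in mean expression (the Fold Change-like part).
    t = (mean(group1) - mean(group2)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, dof


if __name__ == "__main__":
    t, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(round(t, 4), round(dof, 3))
```

The example makes the point of the paragraph concrete: with the same mean difference, shrinking the within-group variance shrinks the denominator and so raises |t|.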
88. les into training and test subsets of 38 and 34 samples, respectively. TE: test error. (Fig. 2, continued:) random splits of the 62 colon tissue samples into training and test subsets of 31 samples each. TE: test error.
89. lis H Test: Method for comparing the distributions of three or more clusters.
Regularized t test
User defined: The user can directly specify the significant genes.
<Figure 5-5> User Defined Gene Selection Communication Window (Case sensitive option; list of Line No and ID entries; OK and Cancel buttons)
5.3.2 Set Parameter(s)
Sets the parameter (number of genes or p-value) of the Gene Selection Algorithm that the user has selected; if it is not set, the default value is used. If Null or User defined is selected as the Gene Selection Algorithm, this menu is inactive because there is no parameter. On the menu bar, select Gene Selection > Set Parameter(s), or just click the eighth icon.
5.3.3 Run
On the menu bar, select Gene Selection > Run, or click the ninth icon, to see the Gene Selection Result for the Algorithm and Parameter that the user has selected and set. The result of the last run is set as the default Gene Selection Result and is used to classify the Test data.
(Screenshot: Gene Selection result window for BSSWSS_1.)
90. lso confirm the reliability of the Differentially Expressed Genes. Samples in the same group are shown in the same color. Click a ball to see the name of the sample in the Sample identifier.
<Figure 3-14> Sample PCA Result Window
DEG Filtering
- Minimum Signal Intensity: From the selected gene list, the user can view the result after removing the genes that include a Signal value smaller than the value the user has set.
- Detection Call: For One Dye Chip Data, the result can be viewed after applying Filtering based on the Detection (P/M/A) Call of the selected genes.
  - Remove the Probe IDs with Present Calls for less than [ ] arrays: Genes whose number of Present Calls is less than the number of Arrays entered in the box are deleted.
  - Remove the genes with A or No Call grade at all arrays: Deletes all genes whose Detection is A or No Call in every Array.
  - Select the genes with P grade at all arrays: Keeps only the genes whose Detection is P in every Array.
- Difference Between Averages (2 class only): The result can be viewed after genes are removed from the selected gene list based on the difference between the average signals of the two classes.
Browse tool and Heat Map Set Up
- Search: Search by ID or Line No. Select the Search type and input the ID or Line No in the text wind
91. lt using the AND/OR operation. If a Marker gene is already known, or if certain genes with known biological information are needed for useful classification, use User Defined to select those genes directly; they can then be combined with an existing Gene Selection result.
<Figure 5-9> Combined Gene Selection Communication Window (Operation: AND / OR; Gene Selection Result Names such as BSSWSS_1, Kruskal-Wallis H Test_2, Regularized t-test_3, Null_4)
5.3.5 Set As Active Gene Selection
On the menu bar, select Gene Selection > Set As Active Gene Selection to make the selected result the Gene Selection Result used to classify the Test data.
5.3.6 Export to Clustering
On the menu bar, select Gene Selection > Export to Clustering Module to export several Gene Selection results to the Clustering module.
5.3.7 Export to Pathway Analysis Module
On the menu bar, select Gene Selection > Export to Pathway Analysis Module to export a Gene Selection result to the Pathway Analysis module.
5.3.8 Save Result(s)
On the menu bar, select Gene Selection > Save Result(s) to save the selected Gene Selection result to a text file.
5.4 Classification
It is possible to classify the Te
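The AND/OR combination of Gene Selection results described above amounts to set intersection and union over gene ID lists. A minimal sketch, assuming each result is simply a list of gene IDs; the function name and the example lists are hypothetical, not GenPlex's own API or data.

```python
def combine_gene_selections(results, operation="AND"):
    """Combine several gene ID lists.

    AND keeps genes present in every result; OR keeps genes present in any.
    """
    gene_sets = [set(result) for result in results]
    if not gene_sets:
        return set()
    if operation == "AND":
        return set.intersection(*gene_sets)
    if operation == "OR":
        return set.union(*gene_sets)
    raise ValueError("operation must be 'AND' or 'OR'")


if __name__ == "__main__":
    # Hypothetical results: one from a statistical test, one user-defined marker list.
    test_result = ["AA233738", "N69107", "R39662"]
    user_defined = ["N69107", "R39662", "H18971"]
    print(sorted(combine_gene_selections([test_result, user_defined], "AND")))
    print(sorted(combine_gene_selections([test_result, user_defined], "OR")))
```

AND is the conservative choice when several algorithms should agree on a marker; OR is the natural choice when adding known marker genes through User Defined.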
92. lts format (gpr)
- Two Dye Chip Data (go to 2.1.3): GenePix Result, ImaGene Data
2.1.1 Import Affymetrix Gene Chip Data
On the menu bar, click File > Import Affymetrix Gene Chip Data, or click the first icon; the Analysis information input window appears. GenPlex can analyze two kinds of data: 3' IVT Expression Chip and Gene ST array. In the preprocessing procedures for the 3' Expression array and the Gene ST array, Step 1 and Step 2 are identical; only Step 3 differs.
(1) Step 1
<Figure 2-1> Step 1 Analysis information input window
- Analysis Name: Input the file name of the Analysis file to be created.
- Directory: Click the button to select the location of the new Analysis file.
- Description: Input information about the Analysis (can be omitted).
Click the Next > button; the Analysis is created and the data selecting window (Step 2) appears.
(2) Step 2
This is the process of loading data into the program. Click the Add button, select the files, and click Finish to load the data.
<Figure 2-2>
93. lysis is to obtain the accuracy achieved with the selected genes and the classifier. Because classification analysis is applied mainly in the medical field (diagnosis, prognosis, and prediction), calculating the Error Estimation is very important. When methods reported in papers as highly accurate on DNA Chip data are applied to other independent data with similar characteristics, the accuracy is often lower than the reported figure. Because DNA Chip experiments have only a small number of samples, it is not easy to obtain a reliable Error Estimation from them, so an adequate Error Estimation method is required. The graphs below show that the accuracy can be inflated if an adequate Error Estimation method is not applied. Ambroise et al. pointed out the difference between external validation and internal validation in the commonly applied Leave-One-Out Cross-Validation (LOOCV) and, to complement it, compared methods such as Bootstrap and 10-fold CV (see the graphs below).
(Two plots of error rate against log2(number of genes).)
Fig. 2: Error rates of the SVM rule with RFE procedure, averaged over 50
Fig. 1: Error rates of the SVM rule with RFE procedure, averaged over 50 random splits of the 72 leukemia tissue samp
94. <Figure 2-37> Correlation Scatter Plot Result Window 1
<Figure 2-38> Correlation Scatter Plot Result Window 2
(1) Shows the Correlation Scatter Plot.
(2) Depending on the settings, the plots before and after Normalization can be shown in the same window or in separate windows, and each Plot can also be shown in one Result Window.
2 3 Correlation Matrix Plot
On the menu bar, select Statistics Plot > Correlation Matrix Plot, or just click on the fi
95. mapping genes onto pathways
- Up/Down regulation display with heatmap
1.2.7 Biological Annotation & Data Mining
For gene groups obtained from statistical analysis results such as DEG Finding and Clustering, all kinds of biological information, such as GO Annotation and KEGG Pathway, can be retrieved. The biological relationship of each group can also be analyzed statistically using Gene Ontology.
- Basic Information: NCBI Gene ID, UniGene ID, Gene Symbol, Gene Title, Chromosome Location
- Protein Information: InterPro, Pfam, Prosite, EC Number, Uniprot, PANTHER Category, PANTHER Family Name, PANTHER Subfamily Name, PANTHER Function, PANTHER Process
- Gene Ontology: GO Molecular Function, GO Biological Process, GO Cellular Component
- Pathway: KEGG Pathway
- ID Conversion: Public ID
Recommended Computer Requirements
- System: Microsoft Windows 2000/XP
- CPU: Pentium 4, 2.4 GHz or higher
- RAM: minimum 1 GB
2 Preprocessing
Preprocessing is the processing of the raw data obtained by scanning the image file of the Microarray experiment results.
2.1 File
First input the raw data using the Import Data menu. The Import Data menu is divided into three kinds, and the supported data formats are as follows.
- Affymetrix Gene Chip Data (go to 2.1.1): CEL File, CHP File
- One Dye Chip Data (go to 2.1.2): ABI Chip Data, Illumina Chip Data (BeadStudio Exported Gene/Probe Profile Data), Agilent Chip Data (GenePix Resu
96. menu bar, select File > Transpose Data to swap the rows and columns of the input data.
5.1.10 Analysis Properties
On the menu bar, select File > Analysis Properties to edit the information of the Analysis File currently being worked on.
5.1.11 Exit
On the menu bar, select File > Exit to close the program.
5.2 Preprocessing
In the Preprocessing menu, you can verify that the input data formats match and delete Missing Data. To continue the analysis, the input data formats must match and there must be no Missing Data.
5.2.1 Check & Match Data
On the menu bar, select Preprocessing > Check & Match Data to verify that the number of genes and the IDs are identical across the input data. If the number of genes or the IDs do not match, the data can be matched based on gene ID.
5.2.2 Filter Missing Data
On the menu bar, select Preprocessing > Filter Missing Data; the window shown below appears, and missing entries can be deleted from the input data.
(Screenshot: Filter Missing Data window listing Line No, ID, and the number of missing values in total and per class.)
97. n
- Iteration No: Sets the number of iterations.
- Reference Array: Pseudo Median valued array
- Reference Array: Pseudo Mean valued array
- Selection of Reference Array: Sets the Reference Array.
(3) Quantile Normalization: This method makes the distributions of all arrays equal (Ref 2.4).
(4) Click the Next > button to go to the next step.
Signal to Noise Filtering: Excludes spots whose Signal to Noise (S/N) value is smaller than the value the user has set. This procedure is applied after Normalization is finished.
<Figure 2-26> One Dye Chip Data Signal to Noise Set up Window
(1) Input the threshold for the Signal to Noise (S/N) item and click the Apply button to check, in the table on the right side of the window, the number of spots before and after applying. Click the < Back button to return to the previous step, and click the Finish button to run Normalization and Filtering.
(Screenshot: Normalization Result window showing the Normalization Parameter and the normalized data table.)
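Quantile Normalization, described above as making all array distributions equal, can be illustrated with the textbook procedure: sort each array, average the values at each rank across arrays, and give every value the average for its rank. This is a sketch under that assumption; the manual does not say how GenPlex handles ties or missing values, and the function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of signals per array).

    Ties are broken by position, which is a simplification.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean signal at each rank across all arrays: the shared target distribution.
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest signal in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        result = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            result[probe_index] = rank_means[rank]
        normalized.append(result)
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization every array contains exactly the same set of values, only in a different probe order, which is what "equal distributions" means here.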
98. n the file will be added to the list on the left side of the window; use the Remove and Remove All buttons to delete items.
Detection (Present Call) Threshold: This appears only when Illumina chip data is selected and sets the range of the Present Call. If you select 0.05, all Probe IDs whose Detection P-value is under 0.05 are treated as Present Call and the remaining Probe IDs are treated as Absent Call.
Click the Finish button to load the selected data (go to 2.2).
<Figure 2-9> One Dye Chip Data Illumina Data Selecting Window
2.1.3 Import Two dye Chip Data
On the menu bar, select File > Import Two Dye Chip Data, or click the third icon; the Analysis Information Input Window appears.
<Figure 2-10> Two Dye Chip Data Analysis Data Input Window (fields: Analysis Name, Directory, Description, ID Type, Species, File Format)
- Analysis Name: Input the name of the Analysis File to be created.
- Directory: Click the button to select the location where the Analysis file will be created.
- Description: Input information related to the Analysis File (can be skipp
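The Detection (Present Call) Threshold rule above is a simple cutoff on the Detection P-value. A minimal sketch, assuming the P-values are available as a mapping from Probe ID to value; the function name and the probe IDs are hypothetical.

```python
def detection_calls(detection_p_values, threshold=0.05):
    """Map each Probe ID to "P" (Present) or "A" (Absent).

    A probe is Present when its Detection P-value is strictly under the threshold.
    """
    return {
        probe_id: ("P" if p_value < threshold else "A")
        for probe_id, p_value in detection_p_values.items()
    }


if __name__ == "__main__":
    # Hypothetical Probe IDs and Detection P-values.
    p_values = {"probe_1": 0.001, "probe_2": 0.05, "probe_3": 0.30}
    print(detection_calls(p_values))
```

A P-value exactly equal to the threshold is treated as Absent here, following the "under 0.05" wording.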
99. nalysis Processes (2 classes: ALL, AML; from Golub et al. 1999, Science 286, 531-537)
- Analysis Process 1: Basic method (KNN, all genes)
- Analysis Process 2: Golub's method (50 genes)
- Analysis Process 3: Improved method (6 genes)
The ultimate goal of Classification is to obtain a more accurate classification result with fewer genes. This requires selecting the genes that show the characteristics of each cluster (gene selection procedure), classifying the samples (classifier selection procedure), and finally estimating the error to establish confidence (Generalization Error Estimation procedure), which is the most important step.
9.1 Gene Selection
Gene Selection is a method for finding the genes that distinguish the clusters and characterize each cluster. A DNA Chip generally gives expression values for thousands to tens of thousands of genes. The aim of this procedure is to find, among these genes, tens to hundreds of marker genes, or even just a few. Gene selection methods can be divided into two: the Uni-Variate Approach and the Multi-Variate Approach. The first calculates the discriminating power of each gene individually and then selects the genes with the highest values. The second selects several genes at once, taking the correlation between genes into account. According to the short time of calculation and e
100. ncel button to finish or cancel the Filtering Error Spot procedure.
Flagged Data Removal: Excludes the Spots that were Flagged by the Image Scanner.
<Figure 2-20> Flagged Data Removal Input Window (Preprocessing Wizard, Step 3 of 4)
- Click the Apply button to check, in the table on the right side of the window, the number of Spots before and after applying.
- Click the < Back and Next > buttons to go to the previous or the following step, and click the Finish and Cancel buttons to finish or cancel the Filtering Error Spot procedure.
Miscellaneous Spot Removal: Excludes from the input data the Spots that are unnecessary for further analysis.
(Screenshot: Preprocessing Wizard, Step 4 of 4, Miscellaneous Spot Removal, with the Dataset Name list, the Case Sensitive option, and the Apply button.) Remove
101. nd Cancel buttons to finish or cancel the Filtering Error Spot procedure.
Intensity Range: Set the smallest and the greatest value of Spot intensity and exclude the spots outside this range.
<Figure 2-19> Intensity Range Set up Window (Preprocessing Wizard, Step 2 of 4)
- Input the smallest and the greatest value of the Intensity, then click the Apply button to exclude the spots outside this range and check, in the table on the right side of the window, the number of spots before and after applying. (Default: the greatest value of the GenePix Scanner is 65,535.)
- Click the < Back and Next > buttons to go to the previous or the following step, and click the Finish and Ca
102. nes (plot: X axis UT_1_Signal (UT), Y axis C_3h_1_Signal (C_3h), DEGs: All) <Figure 3 11> Correlation Scatter Plot Result Window. Correlation Matrix Plot: this view shows the correlation between samples for the selected Differentially Expressed Genes. <Figure 3 12> Correlation Matrix Plot Result Window (sample correlation shown on a color scale). Scatter Plot (p vs SD): this plot uses two axes, Standard Deviation and Log P value, for the Differentially Expressed Genes list of a statistical method. <Figure 3 13> Scatter Plot Result Window (X axis: Standard Deviation, Y axis: Log P value). Sample PCA: check, in a 3D view, the recurrence between samples, that is, whether the selected genes show the difference between groups and can a
103. new Cluster. Complete Linkage: the similarity between two clusters is set to the lowest similarity value among all pairs of individuals taken from the two clusters that compose a new cluster. Single Linkage: the similarity between two clusters is set to the highest similarity value among all pairs of individuals taken from the two clusters that compose a new cluster. Ward's Method: the average of each cluster is calculated over all variables, the sum of squared deviations of each individual from its cluster average is calculated, and clusters are merged so that this within-group sum of squares is minimized. 5 Enter the name of the Clustering Result and click the Clustering button to view the Clustering Result (Dendrogram). <Figure 4 11> Hierarchical Clustering Result Window (Dendrogram). Click the Matrix button to save the GEM to a text file. Click the Image button to save the Dendrogram to a picture file. Click the Initialize button to reset the cell size of the Dendrogram to the default size or to fit the whole screen. Click the X and Y buttons to control the width and height of the cells. Dendrogram Pop Up Menu: Right clic
104. ney Test, One Way ANOVA. <Figure 3 17> 2 Class Unpaired Test Parameter Set Up Window (fields shown: Result Name, for example Student T Test_1; Significance Level; Number of Genes; Class specific Number of Genes; Statistical Significance Computation; Multiple Test Correction; Multi Class Case). 3.3.3.1 Student T test: on the menu bar, select DEG finding > 2 Class Unpaired Test > Student T Test. The parameters are identical to those of Welch's T test, described next. 3.3.3.2 Welch's T test: on the menu bar, select DEG finding > 2 Class Unpaired Test > Welch's T test; the set up window then appears. 1 Parameter Setting. Significance Level: selects the genes whose P value is below the significance level set by the user. Number of Genes: selects the number of genes set by the user, in descending order of score. Class specific Number of Genes: selects genes in descending order of score for each class. Statistical Significance Computation: selects the method used to compute the P value. Asymptotic Distribution: used when the data are assumed to follow a normal distribution. Permutation Test: used when no distribution is assumed for the data. Multiple Test Correction: selects the method used to correct the P value: None, Bonferroni, Holm's procedure, Be
105. njamini Hochberg FDR. 2 Multiple Class Case: if there are more than two classes, two of them can be selected and Welch's T Test applied to that pair. Select two classes from the list on the left side of the window (use the Ctrl or Shift key), then click the pairs >> button to set them up. 3 Click the OK button to see the result window (Ref 3.3.1.3). (Screenshot: GenPlex v3.0 beta DEG Finding Module result window. Result Information: DEG Finding Algorithm: Welch's T Test; Significance Level: 0.05; Statistical Significance Computation: Asymptotic Distribution; Multiple Test Correction: None; DEG Filtering: None. The result table lists Line No, Probe ID and T Score, with Case sensitive and Exact Match search options.)
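To make the options in items 104 and 105 concrete, here is a minimal sketch of the Welch statistic and of the three P value corrections named there (Bonferroni, Holm's procedure, Benjamini Hochberg FDR). It uses the standard textbook definitions; whether GenPlex computes them in exactly this way is not stated in the manual, and all function names are hypothetical. Converting the t statistic into a P value needs a t distribution and is left out.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bonferroni(pvals):
    """Multiply every P value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm's step-down procedure, returned as adjusted P values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending P values
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvals[i])  # keep monotone
        adjusted[i] = min(1.0, running)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR, returned as adjusted P values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted, running = [0.0] * m, 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this P value in ascending order
        running = min(running, pvals[i] * m / rank)  # keep monotone
        adjusted[i] = running
    return adjusted


t, df = welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
pvals = [0.01, 0.04, 0.03, 0.005]
print(t, df)
print(bonferroni(pvals), holm(pvals), benjamini_hochberg(pvals))
```

A gene is then kept when its adjusted P value is below the chosen Significance Level; Bonferroni is the most conservative of the three and Benjamini Hochberg the least.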
106. nsidering each gene in one cluster. Second Step: among the 1,000 clusters, find the two clusters with the most similar pattern and merge them into one. This leaves 999 clusters. Third Step: recalculate the similarity values, find the two most similar clusters among the 999 clusters and merge them into one. This leaves 998 clusters. Repeat this procedure up to the 999th step; finally one cluster is left. The result of this clustering is shown as a Dendrogram, a tree format figure (below). One point in the above algorithm needs attention: how near are two clusters? In other words, how are similarity and dissimilarity defined? This definition fixes the linkage type and the distance measure between two clusters. Among the linkage methods, the Single Linkage method sets the similarity between the new cluster and any other cluster to the higher of the similarity values that the two merged clusters had with that other cluster. The Complete Linkage method sets it to the lower of those similarity values. The Average Linkage method sets it to the average of the similarity values that the two merged clusters had with that other cluster. Single Linkage Complete Linkage A
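The merge loop described in item 106 can be sketched as follows. This is a naive illustration rather than the GenPlex implementation: it works with Euclidean distance instead of similarity (so the highest similarity of Single Linkage becomes the smallest distance), and the names `linkage_distance` and `agglomerate` are hypothetical.

```python
import math


def linkage_distance(c1, c2, dist, method):
    """Distance between two clusters (lists of point indices)."""
    pairs = [dist(a, b) for a in c1 for b in c2]
    if method == "single":      # nearest pair = highest similarity
        return min(pairs)
    if method == "complete":    # farthest pair = lowest similarity
        return max(pairs)
    if method == "average":     # mean over all pairs
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, method="average"):
    """Merge the two closest clusters until one is left.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram is drawn from.
    """
    def dist(i, j):
        return math.dist(points[i], points[j])

    clusters = [[i] for i in range(len(points))]  # every item starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage_distance(clusters[x], clusters[y], dist, method)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
for method in ("single", "complete", "average"):
    print(method, [round(d, 2) for _, _, d in agglomerate(points, method)])
```

With n items the loop always performs n - 1 merges, which matches the 999 steps the text describes for 1,000 genes. The linkage choice only changes the distance at which later merges happen.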
107. obe ID Type: Select ID type. Species: Select Species. <Figure 5 1> Analysis Creating Window. 1 Analysis Name: enter the name of the Analysis File. 2 Directory: click the button to select the location where the Analysis File will be created. 3 Description: enter additional information related to the Analysis (can be skipped). 4 Probe ID Type: select the ID type of the input data. Commercial Product Probe ID: Affymetrix GeneChip Probe ID, Agilent Probe ID (One dye), Agilent Probe ID (Two dye), Applied Biosystems 1700 Probe ID, CodeLink Probe ID, Illumina Probe ID, Operon Probe ID. Public DataBase ID: IMAGE Clone ID, NCBI Clone ID, NCBI GenBank Accession, NCBI GeneID (LocusLink), NCBI UniGene ID. Others: in case the ID is unknown. 5 Species: select the species of the input data. Click the OK button to create the Analysis File; the data selecting window then appears (go to 5.1.7). 5.1.2 Open Analysis: on the menu bar, select File > Open Analysis, or click the second icon, to open a saved Analysis File. 5.1.3 Recent Analysis: on the menu bar, select File > Recent Analysis to open a recently analyzed Analysis File. This list can be cleared using the Clear History menu. 5.1.4 Save Analysis: on the menu bar, select File > Save Analysis, or click the third icon, to save the working Analysis File. 5.1.5 Save Analysis As: on the menu bar, selec
108. <Figure 3 21> Box Plot Result Window. 3.4.4 Correlation Scatter Plot: on the menu bar, select Statistics Plot > Correlation plot, or click the ninth icon, to see the Correlation Plot for each Class. <Figure 3 22> Correlation Scatter Plot Result Window. 3.4.5 Correlation Matrix Plot: on the menu bar, select Statistics Plot > Correlation Matrix Plot, or click the tenth icon, to see the Correlation Matrix Plot for each Class. <Figure 3 23> Correlation Matrix Plot Result Window. 3.4.6 Venn Diagram: on the menu bar, select Statistics Plot > Venn Diagram, or click the eleventh icon, to view the Venn Diagram and the gene list of each combination for 2 or 3 DEG Finding Results. (Screenshot: GenPlex DEG Finding Module showing a Venn Diagram.) <Fi
109. off are selected. If this option is selected, the Separate Up Down Results function described above is not activated. 4 Two Class Comparison: applied when comparing two classes. First select the reference class and the target class of the data, then select the method to be applied (Ref 3.3.1.1). Common DEGs across all combinations: the genes whose Fold Change exceeds the cut off in all combinations are selected. 5 Click the OK button; the result window then appears. 3.3.1.3 Fold Change. (Screenshot: GenPlex v3.0 beta DEG Finding Module result window. Result Information: DEG Finding Algorithm: Fold Change (One Dye); DEG Finding Condition: the data was log transformed with base 2; Fold Change Cutoff: 2.0; Averaged over all combinations; Reference Class: Class_A; Target Class: Class_B; DEG Filtering: None. The result table lists Line No, Probe ID and the average fold change values, with Case sensitive and Exact Match search options.)
110. old Change (One Dye). <Figure 3 6> Fold Change (One Dye) Set Up Window (fields shown: Fold Change Cutoff; log2 Fold Change (absolute value); Separate Up Down Results; Reference Class; Target Class; Averaged over all combinations; Show Fold Change Values for All Genes; Satisfying the threshold over a set ratio of all combinations). 1 Result Name: enter the name of the result to be created. 2 Select the cut off to be applied. Fold Change Cutoff: the Fold Change value that is not transformed to a log value (default 2). log2 Fold Change: the Fold Change value that is transformed to a log2 value (default 1). Separate Up Down Results: separate lists of up regulated genes and down regulated genes are created from the Fold Change result. 3 Select the reference class and the target class of the data. 4 Select the method to be applied. Averaged over all combinations: the genes whose Fold Change, averaged over all combinations of the reference class and the target class, exceeds the cut off are selected. Show Fold Change Values for All Genes: shows the Fold Change value of all genes. If the Separate Up Down Results option above is selected, this function is not activated. Satisfying the threshold over a set ratio of all combinations: Genes with more than the ratio that the user has set up from all Fold Change Combination
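The two selection methods in item 110 can be illustrated with a short sketch. It is written under several assumptions that the manual does not confirm: intensities are positive and not yet log transformed, "all combinations" means every reference sample paired with every target sample, the average is taken on the log2 scale, and the comparison is "greater than or equal to". The function names and the `min_fraction` parameter are hypothetical.

```python
import math
from itertools import product


def log2_fold_changes(reference, target):
    """log2(target / reference) for every reference x target sample pair."""
    return [math.log2(t / r) for r, t in product(reference, target)]


def is_deg_average(reference, target, log2_cutoff=1.0):
    """'Averaged over all combinations': mean |log2 FC| reaches the cutoff."""
    fcs = log2_fold_changes(reference, target)
    return abs(sum(fcs) / len(fcs)) >= log2_cutoff


def is_deg_fraction(reference, target, log2_cutoff=1.0, min_fraction=0.8):
    """'Satisfying the threshold over a set ratio of all combinations'."""
    fcs = log2_fold_changes(reference, target)
    up = sum(fc >= log2_cutoff for fc in fcs)
    down = sum(fc <= -log2_cutoff for fc in fcs)
    return max(up, down) / len(fcs) >= min_fraction


# One gene, two reference samples and two target samples (made-up values).
reference, target = [100.0, 100.0], [400.0, 150.0]
print(is_deg_average(reference, target))          # average rule
print(is_deg_fraction(reference, target))         # 80% of pairs required
print(is_deg_fraction(reference, target, min_fraction=0.5))
```

The default `log2_cutoff=1.0` corresponds to the untransformed Fold Change Cutoff of 2 given in the text, since log2(2) = 1. The average rule can pass a gene that only a few strong pairs support, which is the case the ratio rule is designed to catch.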
111. olumns. Species: Rattus norvegicus. Total No. of Probe ids: 1832 probe ids. Type of chip: Rat230_2. Total No. of Searched Probe ids: 1832 probe ids. Annotation result rows (cells are truncated as in the screenshot):
1367555_at | 24186 | Alb | Rat Expressi | Rattus norve | Rn 34353 1 | Rn 202968 | 2004 B | albumin | chr14p22 | ENSRNOGOO | P02770
1367564_at | 24602 | Nppa | Rat Expressi | Rattus norve | Rn 2004 1 | Rn 2004 | 2004 B | natriuretic pe | chr5q36 | ENSRNOGOO | P01161
1367566_at | 25575 | Scgb1a1 | Rat Expressi | Rattus norve | Rn 2206 1 | Rn 2206 | 2004 B | secretoglobi | chr1q43 | ENSRNOGOO | P17559
1367594_at | 25181 | Bgn | Rat Expressi | Rattus norve | Rn 783 1 | Rn 783 | 2004 B | biglycan | chrxq37 2 | P47853
1367600_at | 64362 | Des | Rat Expressi | Rattus norve | Rn 1657 1 | Rn 39196 | 2004 B | desmin | chr9q33 | ENSRNOGOO | P48675
1367691_at | 85332 | Prkcdbp | Rat Expressi | Rattus norve | Rn 12281 1 | Rn 12281 | 2004 B | protein kinas | chr1q33 | ENSRNOGOO | P97585
1367749_at | 81682 | Lum | Rat Expressi | Rattus norve | Rn 3087 1 | Rn 3087 | 2004 B | lumican | chr7q13 | ENSRNOGOO | P51886
1367762_at | 24797 | Sst | Rat Expressi | Rattus norve | Rn 34416 1 | Rn 34416 | 2004 B | somatostatin | chr11q23 | ENSRNOGOO | P0042
1367810_at | 50690 | Slcbad | Rat Expressi | Rattus norve | Rn 10336 1 | Rn 10336 | 2004 B | solute carrie | chrxq37 | ENSRNOGOO | P28570
1367812_at | 29211 | Spnb3 | Rat Expressi | Rattus norve | Rn 20389 1 | Rn 20389 | 2004 B | spectrin beta 3 | chr1q42 | ENSRNOGOO | Q9QVYNS
1367825_at | 29622 | Ralgds | Rat Expr
112. ow, then click the Search button to see the result table with the related genes highlighted. Select the Case Sensitive option to distinguish capital letters from small letters when searching. Image width: changes the width of the Heat Map shown in the result window. Enter the new width and press the Enter key. Select Heat Map: changes the color of the Heat Map (Red Green, Blue Yellow). List of DEG Finding result. Line No: shows the order of the genes in the input data. ID: shows the ID of each gene. Click the hyperlink to open the related database URL and see the detailed information of the corresponding gene. To get exact information, the exact ID Type must be selected when the Analysis is created. (Screenshot: NCBI Nucleotide search result page opened from a hyperlink, listing two records for Homo sapiens cDNA clone IMAGE 1472336.)
113. plied Biosystems 1700 Probe ID, CodeLink Probe ID, Illumina Probe ID, Operon Probe ID. Public DataBase ID: IMAGE Clone ID, NCBI Clone ID, NCBI GenBank Accession, NCBI GeneID (LocusLink), NCBI UniGene ID. Others: if the ID is unknown. Species: select the species of the data to be input. Click the OK button to create the Analysis; the Data Selecting Window then appears (go to 3.1.7). 3.1.2 Open Analysis: on the menu bar, select File > Open Analysis, or click the second icon, to open a saved Analysis. 3.1.3 Recent Analysis: on the menu bar, select File > Recent Analysis to open a recently analyzed Analysis. This list can be cleared using the Clear History menu. 3.1.4 Save Analysis: on the menu bar, select File > Save Analysis, or click the third icon, to save the Analysis in use. 3.1.5 Save Analysis As: on the menu bar, select File > Save Analysis As, or click the fourth icon, to save the Analysis in use under a different name. 3.1.6 Close Analysis: on the menu bar, select File > Close Analysis to close the Analysis in use. 3.1.7 Import Data: on the menu bar, select File > Import Data, or click the fifth icon; the Data Selecting Window then appears. (Screenshot: Import Gene Expression Matrix dialog with GEM Matrix and Illumina BeadStudio tabs, a Remove All button and an Intensity Start Column field.)
114. r 1 to Cluster 16 (profiling graph panels, each labeled with its number of genes). <Figure 4 21> Self Organizing Map Profiling Matrix. 1 Save Image: the Profiling Matrix can be saved to a picture file. 2 Complete Display: shows the profiles of all genes in the graph of each Cluster. By default the graph shows only the Maximum, Median and Minimum. Entire Cluster Result Window Cluster 1 Cluste
115. r 2 (screenshot: panels for Cluster 2 to Cluster 6 with their data tables, and the Heatmap pop up menu with the items Heatmap Annotation, Color Scale around Zero, Color Yellow Blue (up/down), Zoom Out, Color Brightness Scale, Copy Image to Clipboard and Save Image). <Figure 4 22> Entire Cluster Result Window. 1 Profiling Graph Range: shows only the Maximum, Median and Minimum profile of each Cluster. 2 Heatmap Range: shows the Heat Map of the data used for Clustering. Heatmap Pop Up Menu. Heatmap Annotation: the Annotation Information can be seen in one screen on the right side of the Heatmap (Illumina Probe IDs are excluded). Color Scale around row Average: the Up/Down colors can be controlled based on the ave
116. rage of each gene. Color Yellow Blue (up/down): the colors of the Heatmap can be changed to Yellow and Blue. Color Brightness Scale: the brightness of the Heatmap can be controlled. Copy image to Clipboard: the Heatmap image can be copied to the clipboard. Save Image: the Heatmap can be saved to a picture file. 3 Data Range: shows the Cluster order and the Signal Intensity values of each gene in table form. Click the hyperlink of an ID to open the database URL of the corresponding gene and view its detailed information. Result Window of each Cluster. (Screenshot: Hierarchical Clustering result window for MAQC_4_platform_One Way ANOVA_61, Hierarchical_Default_2, showing Cluster 1 with Number of Genes 350, an ID Search box with Case Sensitive and Exact Match options, and a table with the columns Probe ID, AF_3h, AF_6h, AF_12h, AF_24h, AF_48h, AF_72h, AB_3h, AB_6h, AB_12h, AB_24h.)
117. ray Suite 5. <Figure 2 4> Step 3 3 IVT Expression Chip Preprocessing Method Selecting Window. Quantification Methods: RMA (Robust Multichip Analysis), PLIER (Probe Logarithmic Intensity Error), MAS5 (Microarray Suite 5). Normalization Methods: Global Median, Quantile, Sketch Quantile. PM Intensity Adjustment: PM only, PM-MM. CHP Type: if Save CHP files in GCOS format is selected, a new file (.chp) is created in the folder where the .cel file is saved. Click the Finish button to run the preprocessing (go to 2.2). 3 IVT Expression Chip Preprocessing Result. (Screenshot: GenPlex v3 Preprocessing Module with Before Normalization and After Normalization trees, each containing the Cancer and Normal classes and their samples.) Gene Level Analysis After
118. reprocessing (sample file list in the screenshot). <Figure 2 11> Two Dye Chip Data Data Selecting Window. (Screenshot: Import ImaGene Data dialog listing Target (Cy5) and Reference (Cy3) sample file pairs, with a choice of Median or Mean as the representative value of foreground and background intensity.) <Figure 2 12> Two Dye Chip Data Data Selecting Window (ImaGene Data). Click the Add button to select the files to input; the selected files are added to the list on the left side. To delete files, use the Remove and Remove All buttons. For ImaGene Data, designate the Cy5 and Cy3 files in pairs. Click the Finish button to input the selected data (go to 2.2). If there is an error in the data format, you can retry after correcting the error with a text editor. 2.1.4 Open Analysis: on the menu bar, select File > Open Analysis, or click the fourth icon, to open a saved Analysis File. 2.1.5 Recent Analysis: on the menu bar, select File > Recent Analysis then you can open the
119. rithm of Gene Selection, Classification and Error Estimation, and click the Run button to check, all at once, the Error Estimation for each number of genes. In the Whole Computation Result Window, drag the mouse pointer across the graph to check each number of genes and its Error Estimation. The result graph can also be saved to a picture file. <Figure 5 17> Whole Computation Result Window (X axis: Gene Number). 5.6 View. 5.6.1 Show Sample 3D View: on the menu bar, select View > Show Sample 3D View to check visually, in 3 dimensions, whether the Training Data are distinct for the 3 genes that the user has designated. 5.6.2 Show Summary View: on the menu bar, select View > Show Summary View to check the summarized information of the Error Estimation and to save it to a text file. (Screenshot: Summary window listing Total Class Number and Total Sample Number and, for the entries LOOCV_1 and K Fold_2, the Distance, Classification Algorithm, K Value, Weight setting, Gene Selection Algorithm, Gene Selection Number, Permutation Test, Average Accuracy, Fold Number and Iteration Number.)
120. rovides related gene Signal and Annotation Information. 6.2.2 Pathway XML: simple editing is possible in the Pathway XML tab. (Screenshot: KEGG pathway hsa04530 Tight junction, with the KEGG View, a Gene Counts table of Probe IDs and SNUD signal columns, and the tabs Expression, Annotation, Basic Information, Protein Information, Gene Ontology and ID Conversion.) <Figure 6 4> Pathway Map Figure. Algorithms. 7 DEG Finding Algorithm: DEG (Differentially Expressed Gene) Finding is the method of finding the genes whose expression differs statistically between analysis groups (for example, between a control group and a treatment group). 7.1 Fold Change: this method was mainly used in the early days of DNA chip analysis because of its strengths: it is simple to apply and its results are easy to interpret. It is still in general use today. Calculate the significant figure between control sample (reference sample)
121. rsity 1. Pearson Correlation (uncentered): measures the similarity of two individuals using the correlation coefficient calculated from the actual signal values of the two individuals. Absolute Pearson: uses the absolute value of the Pearson correlation coefficient. Initializing Method: Linear, Random. Topology: Hexagonal (defines the neighborhood radius in hexagonal form), Rectangular (defines the neighborhood radius in rectangular form). Enter the name of the Clustering Result and click the Clustering button to view the Clustering Result. Self Organizing Map Result: the similarity between Clusters is easy to distinguish by color, and various options are available from the buttons at the top of the Result Window. <Figure 4 18> Self Organizing Map Result Window (U Matrix). Distance View: as in the figure below, the similarity is easy to distinguish. <Figure 4 19> U Matrix Distance View. Show Cluster Information: shows the information of each Cluster (Cluster order, number of genes included in the Cluster). Show Similarity: shows the similarity between Clusters. <Figure 4 20> U Matrix (Show cluster Information, Show Similarity). Save Image: the U matrix can be saved to a picture file. Profiling Matrix. (Screenshot: Self Organizing Maps Profiling Matrix window, SOM_Default_2, with Save Image and Complete Display buttons.) Cluste
122. s 101 (table of contents; most entry titles are illegible in this extract, and only the recoverable entries and page numbers are listed) 4.4.1 ... 101; 4.4.2 ... 102; Load Training Data File(s) 109; 5.1.9 ... 110; 5.2.1 ... 111; Export to Pathway Analysis Module; Save Results 118; 5.4.1 Select ... 119; Error Estimation 122; 5.5.4 Whole Computation 124
123. sider this as high dimensional data transformed into a low dimension (generally 2 dimensions) so that it can be seen visually. This characteristic has helped in analyzing high dimensional DNA Chip data. SOM can also be understood as a generalized form of K means: the user can control the parameter values to obtain the desired result format. This flexibility, however, can be a burden to biologists. The strength of SOM, its visualization, is that similar cluster patterns are shown in neighboring positions. 9 Classification Algorithm. Some people think that, as long as the DNA Chip experiment is successful, the result can be interpreted easily with a basic procedure and without any effort. In other words, they think that if the ingredients are the best, the food will be tasty. But even the best ingredients must pass through the hands of the best cook before the food tastes delicious and brings out their flavor. The same holds for DNA Chips: the data need to pass through the hands of a careful analyst. A good example is sample classification analysis, generally called Classification. The figure below shows clearly how different the result can be depending on the data analysis. For 34 test samples: Analysis Process 1, 6 samples falsely classified, accuracy 82.4%; Analysis Process 2, 5 samples falsely classified, accuracy 85.3%. (Figure: ALL / AML, Acute Leukemia Data) Different A
124. st Data using Gene Selection result. Classification is possible only if there are more than two Training Data and more than one Test Data. 5.4.1 Select Distance: Classification uses a distance calculation between vectors, and the user can select the distance calculation method here. On the menu bar, select Classification > Select Distance. Default: Euclidean Distance (Ordinary). Classification Distance (Ref 8.1). Euclidean Distance (Ordinary): uses the geometrical distance between two vectors. SD weight: uses the distance between two vectors calculated with standard deviation weights. Manhattan Distance: calculated considering the weight that each variable occupies. Minkowski Distance. Pearson Correlation Coefficient: uses the correlation coefficient of two vectors. 5.4.2 Select Algorithm: on the menu bar, select Classification > Select Algorithm, or click the eleventh icon. Default: Weighted K Nearest Neighbor. Classification Algorithm. Weighted K Nearest Neighbor: finds the K individuals nearest to the given individual and decides its class from the classes to which these K individuals belong. Prototype Matching with indeterminacy parameters. Multi FLDA: assigns the individual to a class by forming a linear discriminant. 5.4.3 Set Parameter(s): It is possible to s
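The distance options and the Weighted K Nearest Neighbor classifier in item 124 can be sketched as below. The manual does not give the formulas GenPlex uses, so this sketch falls back on the standard definitions of Euclidean, Manhattan and Pearson distance and on inverse distance weighting for the vote; the weighting scheme, the function names and the toy ALL / AML vectors are all assumptions.

```python
import math
from collections import defaultdict


def euclidean(u, v):
    """Geometrical (straight line) distance between two vectors."""
    return math.dist(u, v)


def manhattan(u, v):
    """Sum of absolute differences over all variables."""
    return sum(abs(a - b) for a, b in zip(u, v))


def pearson_distance(u, v):
    """1 - Pearson correlation; assumes neither vector is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u)) * \
        math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - num / den


def weighted_knn(train, query, k=3, distance=euclidean):
    """Classify `query` from its k nearest training samples.

    `train` is a list of (vector, class_label) pairs. Each neighbour votes
    with weight 1 / distance (an assumed weighting scheme).
    """
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = defaultdict(float)
    for vector, label in nearest:
        votes[label] += 1.0 / (distance(vector, query) + 1e-12)
    return max(votes, key=votes.get)


train = [((1.0, 1.0), "ALL"), ((1.2, 0.9), "ALL"),
         ((5.0, 5.1), "AML"), ((4.8, 5.3), "AML")]
print(weighted_knn(train, (1.1, 1.0)))
print(weighted_knn(train, (5.0, 5.0), distance=manhattan))
```

Passing the distance as a parameter mirrors the separate Select Distance and Select Algorithm menus: the classifier stays the same while the notion of "nearest" changes.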
125. t File > Save Analysis As, or click the fourth icon, to save the working Analysis File under a different file name. 5.1.6 Close Analysis: on the menu bar, select File > Close Analysis to close the working Analysis File. 5.1.7 Load Training Data File(s): on the menu bar, select File > Load Training Data File(s), or click the fifth icon; the data selecting window then appears. (Screenshot: Load Class Data dialog, Import Gene Expression Matrix, listing the sample classification files PreprocessedTrainData_3571genes_ALL.txt and PreprocessedTrainData_3571genes_AML.txt, with Remove and Remove All buttons, GEM Format Basic GEM (ID, Gene descriptions, Intensities) and an Intensity Start Column field.) <Figure 5 2> Class Data Selecting Window. 1 Click the Add button to add input files; they are added to the list on the left side of the window. Use the Remove and Remove All buttons to delete items. 2 Condition Column Start Position: sets the position of the first intensity column in the input data. Click the Check button to input the data. Input Data Result. (Screenshot: TEST.mdd Gene Expression Matrix table.)
126. t up window appears. (Screenshot: Clustering dialog with Hierarchical, KMeans and Self Organizing Maps tabs, showing Select Gene Expression Matrix: TEST; Objects: Gene, Experiment; Geometry: 4 x 4; Neighborhood Function: Bubble; Alpha: 0.05; Radius: 3.0; Distance Measures: Euclidean; Initialization Method: Linear; Max Iterations: 1000; Topology: Hexagonal; Cluster Name: Default.) <Figure 4 17> Self Organizing Map Set Up Window. 1 Select Gene Expression Matrix: select the GEM to which Clustering will be applied. 2 Objects: select the basis on which Clustering is applied. Gene: clusters by gene. Experiment: clusters by sample. 3 Geometry: sets the number of Clusters as a two dimensional grid (default 4x4). 4 The following values can be set: Initial Alpha value (default 0.05), Radius value (default 3.0), Max Iteration value (default 1,000). 5 Select the mathematical functions composing the SOM. Neighborhood Function: Bubble, Gaussian. Distance Measure (Ref 8.1). Euclidean Distance: geometrical distance between two individuals. Manhattan Distance: distance between two individuals considering the weight that each variable occupies. Pearson Correlation (centered): measures the similarity of two individuals using the correlation coefficient after transforming each individual's average to 0 and dive
127. ...the fourth icon to save the working Analysis File under a different name. 4.1.6 Close Analysis: On the menu bar, select File > Close Analysis to close the working Analysis File. 4.1.7 Import Data: On the menu bar, select File > Import Data; the data selecting window appears. <Figure 4-2> Data Selecting Window. (1) Click the Add button to select an input file; it is added to the list on the left side of the window. Use the Remove and Remove All buttons to delete items. (2) GEM Format: Basic GEM (.txt, .xls; ID, Gene description, Intensities) with Intensity Start Column: for Basic GEM it is possible to fix the column where the intensities start. Description information can also be used when there is a description column between the ID column and the intensity columns. Basic GCOS output (Signal, Detection, Detection p-value). Basic GCOS output (Signal, Detection). (3) Click the Finish button to input the data. Input Data Result: To see the input data, double-click the name of the input data in the search window. <Figure 4-3> Input Data Result Window. Missing...
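To make the Basic GEM layout in the excerpt above concrete, here is a minimal sketch of a reader for a tab-delimited file with an ID column, optional description columns, and intensity columns. The function name `parse_basic_gem`, the 1-based column numbering, the presence of a header row, and the handling of empty cells as `None` are all assumptions made for illustration; the excerpt does not specify them.

```python
def parse_basic_gem(text, intensity_start_col=2):
    """Parse tab-delimited Basic GEM text.
    Assumption: columns are numbered from 1 and the first row is a header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    start = intensity_start_col - 1          # convert to a 0-based index
    samples = lines[0].split("\t")[start:]   # sample names from the header
    rows = {}
    for ln in lines[1:]:
        fields = ln.split("\t")
        gene_id = fields[0]
        # Anything between the ID and the first intensity is description text.
        description = " ".join(fields[1:start])
        # Assumption: an empty cell is a missing value, stored as None.
        values = [float(v) if v.strip() else None for v in fields[start:]]
        rows[gene_id] = (description, values)
    return samples, rows

example = "ID\tDesc\tS1\tS2\ng1\tkinase\t1.5\t2.0\ng2\tunknown\t\t3.0\n"
samples, rows = parse_basic_gem(example, intensity_start_col=3)
```

In this example the description column sits between the ID and the intensities, so the start column is 3; with no description column it would be 2.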
128. the left side of the window. Use the Remove and Remove All buttons to delete items. Detection Present Call Threshold: A file from BeadStudio contains Signal and Detection P-value columns but no Detection column. A Detection column is created by setting the threshold of the Detection P-value; it is then possible to filter by Detection Call in the DEG Filtering method when selecting DEGs. Data Input Result: [Figure: Data Input Result window showing the Gene Expression Matrix; table contents not recoverable]
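The thresholding step described above can be illustrated with a short sketch. The labels "P" (present) and "A" (absent), the strict `<` comparison, and the default threshold of 0.05 are assumptions for illustration only; the excerpt does not give the values or labels GenPlex uses.

```python
def detection_calls(p_values, present_threshold=0.05):
    """Derive a detection call per probe from its detection p-value.
    Assumption: a p-value below the threshold counts as Present ("P")."""
    return ["P" if p < present_threshold else "A" for p in p_values]

calls = detection_calls([0.001, 0.2, 0.05])
```

Probes called "A" in most samples are the natural candidates for removal in a later filtering step.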
129. ...tic variation. Nucleic Acids Res. 30, e15. [2.4] B. M. Bolstad et al. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193. 3. DEG Finding: This is the procedure for finding DEGs (Differentially Expressed Genes), that is, genes that are statistically differentially expressed between analysis groups (e.g., comparing a target group against the reference group). 3.1 File. 3.1.1 New Analysis: If the data was exported from the Preprocessing module, the Analysis File is created automatically, so this step does not apply (go to 3.2). On the menu bar, select File > New Analysis, or click the first icon; the Analysis Creating Window will appear. <Figure 3-1> Analysis Creating Window. (1) Analysis Name: Input the name of the Analysis File to be created. (2) Directory: Click the button to select where the Analysis File will be created. (3) Description: Input additional information related to the Analysis File (can be skipped). (4) Probe ID Type: Select the ID type to be input. Commercial Product Probe ID: Affymetrix GeneChip Probe ID, Agilent Probe ID (One Dye), Agilent Probe ID (Two Dye), Ap...
130. ...tip Lowess, Quantile Normalization, etc. Convenient Gene Expression Matrix (GEM) format generation. 1.2.3 DEG (Differentially Expressed Gene) Finding: Various Fold Change conditions can be applied, and various statistical analysis methods are offered for comparing two classes or multiple classes. The resulting DEGs can be compared using a Venn Diagram, and the difference between the Fold Change result and the statistical analysis result can be verified visually using a Volcano Plot. Fold Change: One dye, Two dye. Parametric Test for 2-class comparison: Student's T-test, Welch's T-test, Z-test. Nonparametric Test for 2-class comparison: Mann-Whitney test. Paired Test: Paired T-test, Wilcoxon signed-rank test. Parametric Multi-class Comparison: One-way ANOVA. Nonparametric Multi-class Comparison: Kruskal-Wallis H test. Multiple Test Correction: Bonferroni correction, Holm's procedure, Benjamini-Hochberg FDR. Volcano Plot: Fold Change vs. Statistical Test. Venn Diagram: combining results from various methods. Statistics: Box Plot, Correlation Scatter Plot, Correlation Matrix Plot. 1.2.4 Clustering: Various clustering methods and visualizations are provided, and the clustering result can be verified statistically. For K-means, it helps users judge the most suitable number of clusters. Hierarchical Clustering with useful Linkage methods, K-means Clustering, SOM (Self Organizing Map)
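The three multiple test corrections listed above differ only in how they scale the raw p-values. The sketch below implements the standard textbook definitions of the adjusted p-values; the function names are hypothetical and nothing here is taken from GenPlex's code, so its output may differ in details such as tie handling.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm's step-down procedure: the k-th smallest p-value is scaled by
    (m - k + 1), and adjusted values are forced to be non-decreasing."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[idx]))
        adjusted[idx] = running_max
    return adjusted

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR: the k-th smallest p-value is scaled by m / k,
    then made monotone by a running minimum from the largest p-value down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        running_min = min(running_min, pvals[idx] * m / (rank + 1))
        adjusted[idx] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
adj_bonferroni = bonferroni(raw)
adj_holm = holm(raw)
adj_bh = benjamini_hochberg(raw)
```

Bonferroni and Holm control the family-wise error rate and are the more conservative choices; Benjamini-Hochberg controls the false discovery rate and typically retains more genes at the same cutoff.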
131. ...ton to see the result. AND, OR, and Complement operations can be used to compute the Intersection, Union, and Complement of the lists. Common Gene Count shows the genes that are common to the user-specified number (or more) of the results selected from the list. 3.3.6 Import Gene List: A text file with one gene ID per row can be imported as a DEG Finding result. On the menu bar, select DEG Finding > Import Gene List; the Data Selecting Window appears. 3.3.7 Export to Clustering Module: On the menu bar, select DEG Finding > Export to Clustering Module to export DEG Finding results to the Clustering module. 3.3.8 Export to Pathway Analysis Module: On the menu bar, select DEG Finding > Export to Pathway Analysis Module to export DEG Finding results to the Pathway Analysis module. 3.3.9 Save Result(s) As Text: On the menu bar, select DEG Finding > Save Result(s) As Text to select a DEG Finding result and save it to a text file. 3.4 Statistics Plot: The input data can be verified statistically through various plots. 3.4.1 Basic Statistics: You can verify basic statistics of the input data. On the menu bar, select Statistics Plot > Basic Statistics to see the result. (1) The result for each Class is shown separately in a tab...
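The list operations described above map directly onto set operations. The sketch below is illustrative only: the gene IDs are made up, `common_genes` is a hypothetical helper, Complement is assumed to mean "in one list but not the other", and the Common Gene Count cutoff is assumed to be inclusive.

```python
from collections import Counter

def common_genes(gene_lists, min_count):
    """Genes that appear in at least `min_count` of the given result lists.
    Assumption: the cutoff is inclusive (count >= min_count)."""
    counts = Counter(g for genes in gene_lists for g in set(genes))
    return {g for g, c in counts.items() if c >= min_count}

# Hypothetical DEG results from three different methods.
t_test = {"g1", "g2", "g3"}
fold_change = {"g2", "g3", "g4"}
anova = {"g3", "g5"}

intersection = t_test & fold_change   # AND
union = t_test | fold_change          # OR
complement = t_test - fold_change     # in t_test but not in fold_change
in_two_or_more = common_genes([t_test, fold_change, anova], min_count=2)
```

Requiring a gene to appear in several results is a simple way to trade sensitivity for robustness when the methods disagree.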
132. ...u bar, select Statistics Plot > QQ Plot, or just click the thirteenth icon. The QQ Plot can be confirmed in each tab of the selected data. <Figure 2-36> QQ Plot Result Window. It is possible to show the plots before and after Normalization in the same window or in separate windows, according to the setup. 2.3.6 Correlation Scatter Plot: On the menu bar, select Statistics Plot > Correlation Scatter Plot, or just click the fourteenth icon. [Figure: Correlation Scatter Plot window; contents not recoverable]
133. ...wer Tools (APT) for RMA, PLIER, and other calculations (http://www.affymetrix.com/support/developer/powertools/index.affx). Gene ST array Preprocessing Result: [Figure: GenPlex v3 Preprocessing Module window listing Probe_ID, Detection, and Detection P-value before and after normalization. Gene Level Analysis; array: MoGene-1_0-st-v1; Method: RMA; the data was log transformed with base 2; Normalization: None; PM Intensity Adjustment: PM-only]
134. ...xpectation of effective classification result. The Uni-Variate Approach is generally used in gene selection, but the Multi-Variate Approach is also adopted to account for the correlation between genes, which the former method does not consider. In the Multi-Variate Approach, dimension reduction methods such as PCA or SVD are generally used. 9.2 Classifier: Once gene selection is done, the selected genes become the basis for classifying samples. The method that classifies samples in this way is the Classifier. There are various methods, from Fisher's Linear Discriminant Analysis (FLDA), which has traditionally been in general use, to the more recent Support Vector Machine (SVM) and artificial neural networks. Let's try to understand the Classifier through the figure below. Red circles are the samples of cluster 1, and black squares are the samples of cluster 2. Now let's draw a boundary line between the two groups. Future samples will be classified according to this boundary. Then how do we know that we have drawn the boundary properly to divide the regions of the two groups? Between the dotted line and the solid line, which boundary is better suited to classifying clusters 1 and 2? The Classifier is the answer to these kinds of questions. 9.3 Generalization Error Estimation: This is not the part that actually operates the Classifier, but it can be the most important part of Classification analysis, because Error Estimation is the judgment standard. The core of Classification ana...
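The idea of drawing a boundary between two groups can be made concrete with Fisher's Linear Discriminant Analysis, which the excerpt above names. The sketch below is a minimal two-feature version of the textbook method, with the boundary placed midway between the projected class means; the sample coordinates and function names are invented for illustration and do not come from the manual.

```python
def mean_vec(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """2 x 2 within-class scatter matrix around the class mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_flda(class1, class2):
    """Return the projection direction w = Sw^-1 (m1 - m2) and a threshold."""
    m1, m2 = mean_vec(class1), mean_vec(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Assumption: the boundary passes through the midpoint of the class means.
    mid = [(m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2]
    return w, w[0] * mid[0] + w[1] * mid[1]

def predict(w, threshold, point):
    """Class 1 projects above the threshold, class 2 below it."""
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 2

# Hypothetical two-gene expression values for each cluster.
cluster1 = [(1, 2), (2, 3), (3, 3), (2, 1)]
cluster2 = [(6, 5), (7, 8), (8, 6), (7, 7)]
w, threshold = fit_flda(cluster1, cluster2)
label_a = predict(w, threshold, (2, 2))
label_b = predict(w, threshold, (7, 6))
```

Checking predictions only on the training samples says little about future samples, which is why the error estimate should come from data held out of the fit.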
