
Carrak - User Manual

1. [Figure 6.5: Classification errors, class error rates]
[Figure 6.6: Summary view, Genes Selected]
1. The user can navigate through the genes selected for
2. [Figure 6.17: Deletion of an experiment]
[Figure 6.18: Form to classify new samples]
[Figure 6.19: Result of the classification of new samples]
Chapter 7 Conclusions
This document gave a detailed description of the Carrak software. If you have any comments or need more information, please do not hesitate to contact us at Rmagpie@gmail.com.
Bibliography
[Ambroise and McLachlan, 2002] Ambroise, C. and McLachlan, G. (2002). Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences of the United States of America, 99(10):6562-6566.
[Burman, 1989] Burman, P. (1989). A comparative stu
3. displays the number and the names of the genes selected for each fold in each repeat. Two tools, pointed out in figure 6.12, allow the user to interact with the genes table:
1. Arrow buttons to navigate through the repeats, similar to the ones on the frequency table of the first layer.
2. By default, only 10 genes are displayed in each row; "all genes" allows the user to look at all the genes selected for the current fold in the given repeat. Figure 6.13 gives an example where the first row has been expanded.
[Figure 6.9: Visualization of the frequency of the genes]
[Figure 6.10: Classification Errors, Second layer of CV]
First layer of cross-validation
The First layer of cross-validation is very similar to the summary view. The differences are pointed out in figure 6.14:
1. The graph now plots the error rate versus the number of genes for each repeat of the first layer of cross-validation.
2. A radio button allows the user to switch views and display either the aggregate summary of the first layer of CV or any of its repeats.
When the user selects the first repeat instead of the aggregate view, the error rate table changes to re
4. [Figure 6.7: Top genes, showing the frequency of 20 genes only]
[Figure 6.8: Top genes, showing the frequency on 3 rows only]
Detailed view
As stated before, the detailed view can be accessed by selecting the Detailed view link at the top right corner of the Experimental results section.
Classification errors
The Classification errors part is a bit more complicated than the one in the summary view. It is divided into two subsections entitled Second layer of Cross Validation and First layer of Cross Validation.
Second layer of cross-validation
By default (cf. figure 6.10), the second layer of cross-validation is not detailed further, but the user can click on More details to get the information concerning the performance on each fold of the outer layer. Figure 6.11 presents the expanded view of the second layer of CV and outlines four points of interest:
1. Less details allows the user to get back to the compact second layer view.
2. A graph plots the error rate and corresponding best number of genes (or threshold) for each fold and each repeat.
3. A table summarizes the error rate for each repeat and the corresponding average best number of genes (or threshold).
4. A second table at the bottom
5. which one to choose, the best size or best threshold is probably a good trial.
• When you are ready, click on the Classify button.
of the new samples are provided. If they are not, then they will
Figure 6.19 is an example of the view of the results after classification of new samples.
6. [Figure 6.15: Detailed view, Genes selected]
[Figure 6.16: Detailed view, Genes selected for the first repeat]
7. [Screenshot text for the experiment list and deletion dialog: "Are you sure you want to delete experiment 21?"]
8. Summary view
This view is divided in two parts: at the top, under the title Classification errors, is the information concerning the error rate; at the bottom, under the title Genes selected, is the information regarding the genes selected by feature selection during the classification process.
Classification errors
We will first have a look at the Error rate part. Figure 6.4 presents the five points of interest displayed in this view:
1. The bias-corrected error rate obtained by two layers of cross-validation.
2. The best number of genes and corresponding error rate obtained by one layer of cross-validation.
3. A graph showing the one-layer cross-validated error rate versus the number of genes in the subset (usually displayed as the logarithm of the number of genes).
4. A table summarizing the cross-validated error rate versus the number of genes in the subset and, optionally, the error rate per class.
5. Check boxes that allow the user to display or hide the class error rates. Figure 6.5 gives an example of the possible displays.
[Screenshot text for Figure 6.4: smallest error rate after correction for bias 0.176 ± 0.018, obtained by 2 layers of CV; smallest error rate 0.152 ± 0.020 with 64 genes, obtained by 1 external layer of CV]
9. [Figure 6.12: Second layer of CV, Genes selected]
• Specify whether or not the names
be created automatically.
• Specify if you want to normalize the levels of gene expression of the new samples to a mean of zero and a standard deviation of one. You probably want to apply the same treatment as the one you applied to the original data.
• Specify whether each row corresponds to a gene or a sample.
• Finally, choose the size of subset or the threshold that you want to consider. If you don't know
10. [Figure 6.4: Summary view, Classification errors. Plot of the external CV error against log2(number of genes), with a table of error rates and class error rates for Class 1 (Tumor) and Class 2 (Normal)]
Genes Selected
The second part of the view displays the genes selected during the classification and their relevance to the given outcome. Figure 6.6 points out the two visualization tools that display the frequency of selection of the genes among the subsets and repeats:
1. A table where the genes selected are ordered by frequency for each size of subset or threshold.
2. A microarray-like image.
Various tools enable the user to adapt the display to their needs.
11. Carrak User Manual
Camille Maumet
July 18, 2008
Contents
1 Introduction
2 Installation and dependencies
2.1 Dependencies
2.2 Installation
3 Home page
4 Data sets
4.1 General Presentation
4.2 Upload a new data set
4.3 Delete an existing data set
5 Classification
6 Experiments
6.1 General presentation
6.2 Consult an experiment
6.2.1 Options
6.2.2 Experimental Results
6.3 Delete an experiment
6.4 Classify new samples
7 Conclusions
Bibliography
Chapter 1 Introduction
This software provides an interface to train classifiers and to estimate the predictive error rate of classifiers using external one-layer and two-layer cross-validation. These two cross-validation techniques have been presented respectively in (Ambroise and McLachlan, 2002) and (Stone, 1974; Wood et al., 2007; Zhu et al., 2008). One-layer external cross-validation can be used to determine a nearly unbiased estimate of the error rate in the context of feature selection. The number of features to be selected is specified and f
12. alidation with various types of classifiers and feature selection to estimate the predictive error rate on a dataset
• Access your previous experiments and display them
• Create a classifier which can be used to classify new data
[Home page screenshot text: "Classification of microArray data and error Rate estimate. An effective way to determine the error rate of state-of-the-art classifiers on your microarray data. 1. Upload your dataset: normalize and organize your data and get a ready-to-use data set. 2. Classify: find an accurate error rate; choose your classifier and perform cross-validation to get the error rate on your dataset. 3. Analyse: collect the results and analyse; a friendly interface to access previous experiments, consult your results and draw your own conclusions. About Carrak: Carrak is free, user-friendly software designed for both biologists and statisticians. It offers the ability to train a classifier on a labelled microarray dataset and to then use that classifier to predict the class of new observations. A range of modern classifiers are available, including support vector machines (SVMs) and nearest shrunken centroids (NSCs). Advanced methods are provided to estimate the predictive error rate and to report the subset of genes which appear essential in discriminating between classes."]
Home My datasets Clas
13. [Figure 6.11: Second layer of CV, Expanded view]
6.3 Delete an experiment
To delete an experiment, click on the red cross corresponding to the one you want to delete from the list of experiments. As displayed in figure 6.17, a message waits for your confirmation before deleting the selected experiment. Click on OK to confirm the deletion, or on Cancel if you want to go back to the list of experiments without deleting the selected experiment.
6.4 Classify new samples
In order to classify new samples, you must first select an experiment which has already been performed. You can access the page for classifying new samples by clicking on the link Classify new samples corresponding to the chosen experiment. Figure 6.18 displays the form that you have to fill in to classify new samples. The view is divided in two parts:
1. The first frame describes the options of the selected experiment.
2. The second part contains the form.
You must fill in the form to classify one or more new samples by giving the following pieces of information:
• Give the path and the name of the file containing the levels of gene expression of the new samples that you want to classify.
14. different sizes of subsets or thresholds by clicking on these arrows.
2. Show X genes allows the user to select the number of genes to be displayed in the frequency table. By default, a maximum of 100 genes is displayed. Figure 6.7 gives an example where the user wants to see at most 20 genes.
3. Show X rows allows the user to select the number of rows to be displayed in the frequency table. Figure 6.8 gives an example where the user wants to see 3 rows only.
The last part of the Genes selected section presents a visualization of the frequency of selection of each gene by the feature selection method. It is divided in two parts, highlighted in figure 6.9:
1. A microarray-like image helps the user understand the importance of each gene in the classification process. The brighter the dot, the more influential the corresponding gene.
2. A small description is available by mousing over a gene.
To generate the microarray-like figure, the genes are ranked according to their frequency among the folds and the repeats. Then each gene is represented by a dot; the brightest dots correspond to the most frequent genes.
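The ranking step behind the frequency table and the microarray-like image can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way Carrak or RMagpie actually computes it: the function names `selection_frequency` and `dot_brightness`, the input layout (one list of gene names per fold and repeat) and the linear mapping from frequency to an 8-bit grey level are all hypothetical.

```python
from collections import Counter


def selection_frequency(selected_per_fold):
    """Return (gene, frequency) pairs, most frequently selected first.

    selected_per_fold holds one list of gene names per (repeat, fold) pair;
    this layout is hypothetical and chosen only for the example.
    """
    n_runs = len(selected_per_fold)
    # Count each gene at most once per fold, even if it is listed twice.
    counts = Counter(g for genes in selected_per_fold for g in set(genes))
    freq = {gene: count / n_runs for gene, count in counts.items()}
    # Sort by decreasing frequency, then by name for a stable order.
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))


def dot_brightness(frequency):
    """Map a frequency in [0, 1] to a grey level in [0, 255] (assumed scale)."""
    return int(255 * frequency)


# Toy input: 2 repeats x 2 folds, with made-up selections.
runs = [
    ["gene1772", "gene1353"],
    ["gene1772"],
    ["gene1772", "gene0792"],
    ["gene1353", "gene1772"],
]
ranking = selection_frequency(runs)
for gene, freq in ranking:
    print(gene, freq, dot_brightness(freq))
```

With this toy input, a gene selected in every fold gets frequency 1.0 and the brightest dot, which matches the reading of the image given in the manual.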
15. dy of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika, 76(3):503-514.
[Guyon et al., 2002] Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389-422.
[Stone, 1974] Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B, 36:111-147.
[Tibshirani et al., 2002] Tibshirani, R., Hastie, T., Narasimhan, B., and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99(10):6567-6572.
[Wood et al., 2007] Wood, I., Visscher, P., and Mengersen, K. (2007). Classification based upon gene expression data: bias and precision of error rates. Bioinformatics, 23(11):1363-1370.
[Zhu et al., 2008] Zhu, J., McLachlan, G., Ben-Tovim, L., and Wood, I. (2008). On selection biases with prediction rules formed from gene expression data. Journal of Statistical Planning and Inference, 138:374-386.
16. eature selection is performed on a set of training folds, with the error rate estimated across the test folds. This procedure is repeated for each number of features to be selected. As an output of this one-layer cross-validation, the user gets a cross-validated error rate per size of subset. However, if the user wants to know the smallest estimated error rate over all the subsets considered, then a second layer of cross-validation is required to estimate the effect of this choice. This document describes how to install this software and make the best use of its functionality.
Chapter 2 Installation and dependencies
2.1 Dependencies
This software depends upon the following other software, which has to be installed already. Please check that all of these are installed before starting Carrak:
• R, version 2.6.2 or later (currently not available for version 2.7.1)
• RMagpie R package, available at http www maths ug edu au bioinformatics rmagpie html
2.2 Installation
Please follow the instructions corresponding to your platform. For Windows, download the zip sources from and unzip the folder (right-click and choose, for example, "extract here"). Then simply double-click on carrak.exe.
Chapter 3 Home page
The home page, as depicted in figure 3.1, gives access to the main functionalities of the software. From here you can:
• Access your datasets and upload new datasets
• Run one-layer and two-layer cross v
17. ection for bias 0.176 ± 0.016, obtained by 2 layers of CV. Smallest error rate 0.152 ± 0.020 with 64 genes, obtained by 1 external layer of CV.
[Figure 6.2: Results of an experiment]
6.2.1 Options
The options are displayed in a frame at the top of the window. The frame includes all the information on the experiment currently displayed:
• Dataset name and short description (number of samples and genes)
• Classifier name
• Number of folds in the outer (2nd) and inner (1st) layers
• Number of repeats of cross-validation
• Different sizes of subsets (or thresholds) tried during the gene selection
The Choose another experiment link brings the user back to the window displaying the list of experiments, where another experiment can be selected.
6.2.2 Experimental Results
This part is divided into a Summary view and a Detailed view, accessible from the links Summary view and Detailed view located at the top of the results part (cf. Figure 6.3). By default, the Summary view is displayed. The Summary view is described first, followed by the Detailed view.
[Figure 6.3: Experimental results, choice between summary or detailed view]
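The relation between the one-layer error rates and the bias-corrected error rate obtained by 2 layers of CV can be sketched as follows. This is a minimal illustration of nested cross-validation in general, not Carrak's or RMagpie's code. All names (`make_folds`, `split`, `one_layer_cv`, `two_layer_cv`, `toy_error`) are hypothetical, the folds are drawn by simple shuffling rather than balanced, and `error_fn(train, test, size)` stands for "select `size` genes and train on `train`, then return the error rate on `test`"; here it is replaced by a toy function so that the example runs on its own.

```python
import random
from statistics import mean


def make_folds(indices, k, rng):
    """Shuffle the sample indices and deal them into k folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def split(folds, i):
    """Return (training indices, test indices) when fold i is held out."""
    train = [s for j, fold in enumerate(folds) if j != i for s in fold]
    return train, folds[i]


def one_layer_cv(indices, sizes, k, error_fn, rng):
    """Mean held-out error for each subset size (one layer of CV)."""
    folds = make_folds(indices, k, rng)
    return {
        size: mean(error_fn(*split(folds, i), size) for i in range(k))
        for size in sizes
    }


def two_layer_cv(indices, sizes, outer_k, inner_k, error_fn, rng):
    """Error estimate that accounts for choosing the best subset size."""
    outer = make_folds(indices, outer_k, rng)
    errors = []
    for i in range(outer_k):
        train, test = split(outer, i)
        # Inner layer: pick the size using the outer training part only.
        inner = one_layer_cv(train, sizes, inner_k, error_fn, rng)
        best = min(sizes, key=lambda s: (inner[s], s))
        # Outer layer: score that choice on samples it never saw.
        errors.append(error_fn(train, test, best))
    return mean(errors)


def toy_error(train, test, size):
    """Stand-in for a real classifier: error is lowest at 64 genes."""
    return abs(size - 64) / 2000


rng = random.Random(0)
samples = range(62)  # 62 samples, as in the example experiment
sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2000]
per_size = one_layer_cv(samples, sizes, 10, toy_error, rng)
corrected = two_layer_cv(samples, sizes, 10, 9, toy_error, rng)
print(min(per_size, key=per_size.get), corrected)
```

The toy error function ignores the data, so both estimates coincide here. With a real classifier the two-layer estimate is usually somewhat larger than the smallest one-layer error rate, because the choice of the subset size is itself validated on held-out samples.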
18. et. Click on OK to confirm the deletion, or on Cancel if you want to go back to the list of data sets without deleting the selected data set.
[Figure 4.3: Deletion of an existing data set]
Chapter 5 Classification
By clicking on the Classify Gene Expression entry of the menu, you get a form as displayed in figure 5.1. This form allows you to perform a one-layer and a two-layer cross-validation with a simple click.
[Figure 5.1: List of your data sets. Screenshot of the "New experiment, Step 1: Specification of Options" form]
Let's have a closer look at the pieces of information you need to fill in this form.
First, you have to choose the data set on which the cross-validations will be applied. If no data set appears in the selection box, you need to upload a data set first. To do so, please have a look at section 4.2.
Second, you can choose between two classifiers, namely the Support Vector Machine with Recursive Feature Elimination proposed in (Guyon et al., 2002) or the Nearest Shrunken Centroid presented in (Tibshirani et al., 2002). If you have chosen the Sup
19. flect the error rate of the first repeat only.
Genes Selected
Figure 6.15 shows the display of the Genes Selected part in the detailed view. It is identical to the summary view except for a radio button at the top that allows the user to select which repeat (or the aggregate view) to display. What happens when the user decides to display the genes selected in a repeat is shown in figure 6.16:
1. First, the frequency table is updated to show the genes selected during the first repeat only.
2. Second, the microarray-like image is removed, since it represents an overall summary of the implication of the genes in the outcome.
[Screenshot text for the expanded second layer of CV: smallest error rate after correction for bias 0.176 ± 0.018, obtained by 2 layers of CV; table of error rate per repeat; plot of the error rate per fold in the second layer of CV against log2(number of genes) for repeats 1 to 5; top genes per fold]
20. l presentation
By clicking on the My Experiments entry of the menu, you get a list of the experiments currently available, as displayed in figure 6.1.
[Figure 6.1: Home page]
From this view you can
21. [Figure 6.13: Second layer of CV, Genes selected (expanded)]
[Figure 6.14: Classification errors, First layer of Cross Validation. Screenshot text: smallest error rate 0.152 ± 0.020 with 64 genes, obtained by 1 external layer of CV; plot of the external CV error rate against log2(number of genes) for the summary and for repeats 1 to 5; error-rate table as follows]
No. of genes   CV error rate    Class 1 (Tumor)   Class 2 (Normal)
1              0.255 ± 0.025    0.409             0.170
2              0.255 ± 0.025    0.409             0.170
4              0.223 ± 0.027    0.309             0.175
8              0.210 ± 0.017    0.282             0.170
16             0.219 ± 0.019    0.318             0.165
32             0.165 ± 0.023    0.227             0.130
64             0.152 ± 0.020    0.209             0.120
128            0.165 ± 0.019    0.218             0.135
256            0.152 ± 0.021    0.182             0.135
512            0.152 ± 0.019    0.200             0.125
1024           0.155 ± 0.021    0.200             0.130
2000           0.155 ± 0.023    0.191             0.135
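As a small worked example, the "smallest error rate 0.152 with 64 genes" reported in figure 6.14 can be recovered from the error-rate table. The sketch below is hypothetical: the name `best_subset_size` and the tie-breaking rule are assumptions. Three subset sizes (64, 256 and 512) share the error rate 0.152, and preferring the smallest subset is one rule that reproduces the value shown by the interface; the manual does not say which rule Carrak applies. The second number of each pair is the value printed next to the error rate in the screenshot, presumably a standard error.

```python
# (CV error rate, value printed next to it) per number of genes,
# transcribed from the table in figure 6.14.
error_table = {
    1: (0.255, 0.025),
    2: (0.255, 0.025),
    4: (0.223, 0.027),
    8: (0.210, 0.017),
    16: (0.219, 0.019),
    32: (0.165, 0.023),
    64: (0.152, 0.020),
    128: (0.165, 0.019),
    256: (0.152, 0.021),
    512: (0.152, 0.019),
    1024: (0.155, 0.021),
    2000: (0.155, 0.023),
}


def best_subset_size(table):
    """Pick the size with the smallest error rate.

    Ties are broken in favour of the smaller subset (an assumption).
    """
    return min(table, key=lambda n_genes: (table[n_genes][0], n_genes))


best = best_subset_size(error_table)
print(best, error_table[best])
```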
22.
1. Access the list of experiments made on a given data set by selecting a data set.
2. Look at the list of all the experiments available for the selected data set.
3. Consult an experiment by clicking on its identifier or on the magnifying glass.
4. Delete an experiment by clicking on the red cross.
5. Classify new samples by clicking on the corresponding link.
The last three actions are described in the next sections.
6.2 Consult an experiment
The window displaying the results of an experiment is divided into two parts:
• The Options of the Experiment section reminds the user of the options used to compute the classification (classifier name, dataset).
• The Experimental results section describes the results obtained by two-layer and one-layer cross-validation.
Both parts are described in further detail in sections 6.2.1 and 6.2.2. Figure 6.2 presents an overview of the interface.
Screenshot text for Figure 6.2: Options of the Experiment: Alon log, 62 samples, 2000 genes; Support Vector Machine (linear); 9 folds in the 1st layer and 10 folds in the 2nd layer; 5 repeats; sizes of gene subsets from 1 to 2000: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2000. Experimental results: Smallest error rate after corr
23. on of each gene for each sample.
• Specify whether or not you want your data to be normalized (standardized) to a mean of zero and a standard deviation of one.
• Usually, each row in your gene expression file corresponds to a gene and each column to a sample. If that is not the case, you must answer No to the question "Does a column correspond to a sample and a row to the expression levels for a given gene?"
• Give the path and name of the file containing the class label of each sample.
• Specify whether the class labels are ordered on a row or a column in your class label file.
• When you are ready, click on Create.
[Figure 4.2: Upload a new data set. Form fields: name of the dataset; description; "Is there a header providing the names of the samples in both data files?"; "Does the first column (or row) provide the names of the genes?"; gene expressions file; "Normalize the gene expressions?"; "Does a column correspond to a sample and a row to the expression levels for a given gene?"; class labels file; "Are the sample output classes presented on a row?"]
4.3 Delete an existing data set
To delete an existing data set, click on the red cross corresponding to the one you want to delete from the list of data sets. As displayed in figure 4.3, a message waits for your confirmation before deleting the selected data s
24. port Vector Machine, you need to choose the kernel. As a first approach, the Linear kernel is probably the most appropriate. It is also the fastest kernel to compute.
• You also need to choose the number of repeats. Each cross-validation (one-layer or two-layer) will be performed several times, according to the number of repeats given here. The results are then averaged over the repeats. It is believed (Burman, 1989) that this method can improve the accuracy of the estimator of the error rate.
• You then have to choose the sizes of subsets (for RFE) or the thresholds (for NSC) that will be tried. By default, subsets from size one to the total number of features, by powers of two, are tried for RFE, or 30 thresholds selected by the pamr package for NSC. You can also select your own sizes or thresholds.
• Then choose the number of folds that you want in the outer layer of two-layer cross-validation and for one-layer cross-validation. The default value is 10.
• Choose the number of folds that you want in the inner layer of two-layer cross-validation. The default value is 9.
• The last step is to specify the kind of division into folds that you want. Two options are possible: a simple division or balanced folds, as presented in CITATION MISSING.
When you are ready, click on classify and wait until your experiment is finished. The results are then displayed as presented in section 6.2.
Chapter 6 Experiments
6.1 Genera
25. sify Gene Expression My Experiments Contact Us
[Figure 3.1: Home page]
Chapter 4 Data sets
4.1 General Presentation
By clicking on the My Datasets entry of the menu, you get a list of the data sets currently available, as displayed in figure 4.1.
[Figure 4.1: List of your data sets]
The list displays the identifier of each data set, its name and description. From here you can:
1. Upload a new data set by clicking on CREATE A NEW DATASET.
2. Delete a data set by clicking on the red cross.
These actions are described in the next sections.
4.2 Upload a new data set
To upload a new data set, you must fill in the form displayed in figure 4.2 and give the following pieces of information:
• Choose a name for your data set.
• Write a short description.
• Specify whether or not the names of the samples are available in the files. Be careful: if the names of the samples are given, they must appear in both files. If they are not, they will be automatically generated as V1, V2, and so on.
• Specify whether or not the names of the genes are provided in your gene expression file. If they are not, they will be automatically generated as gene1, gene2, and so on.
• Give the path and name of the file containing the levels of gene expressi
