
User Manual

define_amplicons.m. If the file in which the results are to be saved already exists, CGH Plotter writes the results after the existing text.

O. Plot button
CGH Plotter plots only the data listed in the data listbox, using the properties that have been specified. CGH Plotter then shows a message box that gives the genomic indices of the amplicon boundaries, together with the names of the samples. The message box is modal and disappears once the OK button is pressed.

P. Main Page button
The Main Page button takes one back to the main page.

A capture of a typical plotting figure is provided in Figure 14, which illustrates the ratios of chromosome 20 across five samples. It is also possible to explore a single sample by plotting it separately, as shown in Figure 15. The amplicon and deletion boundaries of the samples are listed in Figure 18, while Figure 19 shows the created ASCII file that reports the properties of each amplicon and deletion.

[Plot: Chr 20, All; legend SKBR3, MCF7, BT474, BT20, MDA361]
Figure 14: Ratios of five samples for chromosome 20 illustrated in one figure. Amplicon boundaries are shown in the same color as the corresponding sample, and combined amplicon boundaries are colored black. Cumulative base pairs are on the x axis.

[Plot: BT474, chr 20]
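The default limits for saving the boundary file can be sketched in Python. Amplicons are segments with height above 1.2 and deletions are segments with height below 0.95, and CGH Plotter sets both limits at the beginning of define_amplicons.m. The optional merging of adjoining aberrations is included as well. CGH Plotter itself is written in Matlab, so this is only an illustration. The `Segment` type, the function names and the clone-weighted height of a merged segment are assumptions, not the toolbox's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Default limits quoted in the manual; CGH Plotter sets them at the
# beginning of define_amplicons.m.
AMPLICON_LIMIT = 1.2
DELETION_LIMIT = 0.95


@dataclass
class Segment:  # hypothetical container for one constant-level segment
    start: int     # first genomic index of the segment
    end: int       # last genomic index of the segment (inclusive)
    height: float  # constant level (ratio) estimated for the segment


def classify(height: float) -> Optional[str]:
    """Return 'Amplicon', 'Deletion', or None for a baseline segment."""
    if height > AMPLICON_LIMIT:
        return "Amplicon"
    if height < DELETION_LIMIT:
        return "Deletion"
    return None


def boundary_rows(segments: List[Segment], combine: bool = False) -> List[Dict]:
    """Keep amplicons/deletions only; optionally merge adjoining ones of the same type."""
    rows: List[Dict] = []
    for seg in segments:
        kind = classify(seg.height)
        if kind is None:
            continue  # baseline segments are not written to the boundary file
        clones = seg.end - seg.start + 1
        prev = rows[-1] if rows else None
        if combine and prev and prev["type"] == kind and prev["end"] + 1 == seg.start:
            total = prev["clones"] + clones
            # Assumption: the merged height is the clone-weighted mean.
            prev["height"] = (prev["height"] * prev["clones"] + seg.height * clones) / total
            prev["clones"] = total
            prev["end"] = seg.end
        else:
            rows.append({"type": kind, "start": seg.start, "end": seg.end,
                         "clones": clones, "height": seg.height})
    return rows
```

Changing the two module-level constants plays the same role as editing the limits in define_amplicons.m.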
Figure 22: Example of the resulting txt file.

D. CGH Data, Filtered Data, DP Data and other checkboxes
The user can choose any combination of the active checkboxes in this section. If a checkbox is selected, the corresponding data are printed into the text file. If none of the checkboxes is selected, only the basepair and chromosome information is printed. If one or several checkboxes are inactive, the loaded data file does not contain that type of data.

If the Write interpolation info checkbox is selected, 0 or 1 will be printed after every filter
[Screenshot: the boundary file opened in Microsoft Excel. For chromosome 20 and each sample (SKBR3, MCF7, BT474, BT20, MDA361), aberrations 1 to 3 are listed with the rows Type (Amplicon or Deletion), Number of clones, Start, End, Start Basepair, End Basepair, Height and MaxMin.]
Figure 19: The ti
analyzes data, stores results.
BP Convert: adds new basepairs, interpolates NaNs.
Plot Data: loads analyzed data, plots data, saves results of analysis in an ASCII file.
Write TXT: loads analyzed data, saves data to a text file according to the user's selections.
Figure 3: Main tasks of CGH Plotter blocks.

3.1.2 Create Data Struct
In the page Create Data Struct one can create a data struct that consists of the CGH data and essential indices. Throughout CGH Plotter it is assumed that the data contain the fields given in this section. All the data have to be either (1) in Matlab .mat format or (2) in tab-delimited text (.txt) format. Examples of files in both formats are in the folder data_structs of CGH Plotter.

[Screenshot: Create Data Struct window with buttons A Data, B Chromosome indices, C Cumulative base pairs, D Names for the samples, E Save as, F Create struct and G Main Page; cumulative base pairs and names for the samples are marked as optional fields of the struct]
Figure 4: Create Data Struct window.

A. Data button (obligatory)
This button enables loading of the data file. The data file is assumed to be an m x n matrix, where m is the number of genes and n is the number of samples. Furthermore, it is assumed that the genes are arranged according to their genomic order, from the p telomere of chromosome 1 to the q telomere of th
enables loading of the data struct made in the phase Create Data Struct.

[Screenshot: BP Filter window with Load, Save as, filter parameters (Moving median; Filter window length in bps: 200000), Load optional region info and Main Page controls]
Figure 7: BP Filter window.

B. Save As button
Before starting the analysis, the user has to specify the name of the file in which the analyzed data will be saved. The Save As button opens a Save As dialog, in which the name and location of the result file may be selected. It is recommended that result files are stored in the folder ampli_data.

C. Filter options
The filter type selection lets the user decide which filter is used during the filtering process. Currently the only filter types are BP Median filter and BP Mean filter. In the Filter window length in bps field the user can specify the window length of the chosen filter in basepair units. The default is 500000.

D. Optional region info
Optional region info can be loaded by pressing the Load button in this section. With this info the user can ignore gaps such as centromere and telomere regions: data values outside the given regions have no effect on the filtering result.

[Screenshot: region info file opened in Microsoft Excel, with the columns Startpoints and Endpoints]
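The two filtering modes can be compared side by side in a short Python sketch. The first is the standard moving filter of Section 4.1: windows of w consecutive clones, each chromosome filtered separately, and w-1 NaNs appended at the end of each chromosome. The second is the basepair-window filter of Section 4.4: the window extends half its length on each side of a clone and uses only clones inside the same region. The function names and parameters are hypothetical. Skipping NaNs inside a window and leaving clones outside every region as NaN are also assumptions; the manual does not specify either behaviour.

```python
import math
import statistics
from typing import List, Sequence, Tuple

NAN = float("nan")


def _summarize(values: Sequence[float], kind: str) -> float:
    # Assumption: missing values (NaN) are skipped inside a window.
    vals = [v for v in values if not math.isnan(v)]
    if not vals:
        return NAN
    return statistics.median(vals) if kind == "median" else statistics.fmean(vals)


def moving_filter(chrom: Sequence[float], w: int, kind: str = "median") -> List[float]:
    """Section 4.1: windows of w consecutive clones; w-1 NaNs padded at the end."""
    n = len(chrom)
    if w > n:  # assumption: a chromosome shorter than the window becomes all NaN
        return [NAN] * n
    out = [_summarize(chrom[i:i + w], kind) for i in range(n - w + 1)]
    return out + [NAN] * (w - 1)


def filter_by_chromosome(data: Sequence[float], chrom_starts: Sequence[int],
                         w: int, kind: str = "median") -> List[float]:
    """Filter each chromosome separately. chrom_starts holds 0-based start indices
    followed by len(data), loosely mirroring the extra index CGH Plotter appends."""
    out: List[float] = []
    for a, b in zip(chrom_starts[:-1], chrom_starts[1:]):
        out.extend(moving_filter(data[a:b], w, kind))
    return out


def bp_filter(data: Sequence[float], basepairs: Sequence[int],
              regions: Sequence[Tuple[int, int]], window_bp: float,
              kind: str = "median") -> List[float]:
    """Section 4.4 sketch: the window spans window_bp/2 on each side of a clone and
    uses only clones from the same region; clones outside all regions stay NaN."""
    out = [NAN] * len(data)
    half = window_bp / 2
    for start, end in regions:
        idx = [i for i, bp in enumerate(basepairs) if start <= bp <= end]
        for i in idx:
            window = [data[j] for j in idx if abs(basepairs[j] - basepairs[i]) <= half]
            out[i] = _summarize(window, kind)
    return out
```

In `bp_filter` the number of clones in each window varies with clone density. This is why CGH Plotter reports the true window lengths after BP filtering.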
filtering utilizing a moving window, the filtered data are w-1 values shorter than the original data. In order to keep the data sets the same size as the originals, CGH Plotter inserts w-1 NaNs at the end of each chromosome. The chromosomes are filtered individually because otherwise, for example, values at the end of chromosome one would affect the values of chromosome two [1]. The filtered data are saved, so it is possible to plot the filtered data in the phase Plot Data.

[Diagram: User inputs (CGH data, chromosome indices, basepairs, names) go to Create struct. Find Amplicons filters the data with a median or average filter (window size), clusters the filtered data into three clusters with k-means, and uses a change function to define the number of changes (constant for computing); its outputs are filtered data, clustered data and the number of changes. BP filtering applies a median or average filter with a constant window size in basepair units, filtering only outside known gaps. BP convert converts the data to new basepairs and interpolates missing values, only outside known gaps. The analyzed and interpolated data are passed to Plot Data and Write TXT.]
Figure 23: Overall view of CGH Plotter. The user inputs CGH data, chromosome indices, basepairs and names of the samples in the Create Struct phase. CGH Plotter c
given for each of these phases.

3.1 User Interface Pages

3.1.1 Main Page
CGH Plotter is started with the command CGH_Plotter, provided that the current directory in Matlab is CGH Plotter. The main page opens and CGH Plotter is ready for use, as illustrated in Figure 2.

[Screenshot: main page with the buttons Create Data Struct, Find Amplicons, BP Filter, BP Convert, Plot Data, Write TXT and Exit]
Figure 2: Main page of CGH Plotter.

The main page contains seven buttons: Create Data Struct, Find Amplicons, BP Filter, BP Convert, Plot Data, Write TXT and Exit. First, the data struct should be constructed in the page Create Data Struct, if this has not been done already. After the data struct is created and stored, the analysis is executed in the page Find Amplicons or in the page BP Filter. The page BP Convert allows the user to add new basepairs to a filtered result and to interpolate data values for them. In the page Plot Data the analyzed data may be plotted and the results of the analysis saved in an ASCII file. Finally, in the page Write TXT the user can write all data, if needed, into an ASCII file. The Exit button ends the session and returns the user to the Matlab workspace. The idea of the blocks in CGH Plotter is illustrated in Figure 3.

[Diagram: CGH Plotter main page and its blocks]
Create Struct: creates data struct, saves data struct.
Find Amplicons: loads data struct, analyzes data, stores results.
BP Filter: loads data struct,
storing the analyzed data, and it can be located arbitrarily. Folder ampli_data is initially empty. The folder structure of CGH Plotter is illustrated in Figure 1.

[Diagram: CGH Plotter containing gui and ampli_math; data_structs; ampli_data]
Figure 1: Folders of CGH Plotter. Folders gui and ampli_math are subfolders of CGH Plotter; data_structs and ampli_data can be located arbitrarily.

3 Instructions
CGH Plotter works as follows. First, CGH Plotter filters the data using a median or mean filter with the window size given by the user. Second, the filtered data are clustered using the k-means clustering algorithm. The purpose of the k-means clustering is to find the maximum number of amplicons/deletions on each chromosome. This number is required by the last phase, dynamic programming, which actually estimates the amplicons and deletions. CGH Plotter saves a result file that consists of the original data, the filtered data, the probable amplicons and deletions, the indices to the changes of amplicons and deletions in the CGH data, the names of the samples, the cumulative basepairs and the genomic indices.

To be more precise, CGH Plotter consists of five phases:
1. CGH Plotter creates a data struct from the separate data files that the user has specified.
2. CGH Plotter reads the data struct.
3. CGH Plotter analyzes the data struct.
4. CGH Plotter stores the analyzed data.
5. CGH Plotter plots the data.

In this section a more detailed explanation is
window size should also be quite large (> 5).

D. Constant for computing the number of changes
One may specify the constant that is used when the number of changes is computed. The default constant is six. The procedure for computing the number of changes, along with some guidelines, is given in Section 4.2.

E. Save As button
Before starting the analysis, one has to specify the name of the file in which the analyzed data struct will be saved. The Save As button opens a Save As dialog, in which the name and location of the result file may be selected. It is recommended that result files are stored in the folder ampli_data.

F. Start button
After all required information has been provided, the analysis may be started by selecting the Start button. The analysis takes a few minutes. For example, the analysis of CGH ratios of 11994 genes from 14 samples on an Intel Pentium III 2.4 GHz took approximately 5 minutes. When CGH Plotter is ready, a message box appears notifying that the data set has been successfully analyzed and that the results of the analysis are saved.

G. Main Page button
By pushing the Main Page button one can return to the main page.

3.1.4 BP Filter
In the BP Filter phase it is possible to filter data with filters that have a constant window length in basepairs. It is also possible to exclude certain areas of the data from filtering by specifying only the special regions to be filtered. These regions are given in a separate input file.

A. Load button
This button
Figure 15: Chromosome 20 of sample BT474. CGH Plotter has now plotted each of the selected data into a different figure using the genomic index. The CGH data are shown as a blue line and the amplicon boundaries as a red line. NaN values of the original data are marked with crosses. Underneath the data is a bar in which the amplicons and deletions of the data are marked with red and green bars.

[Plot: HCC1428, chr 20]
Figure 16: Chromosome 20 of sample HCC1428 plotted against cumulative basepairs. CGH data are shown in blue and filtered data as a green line.

[Plot: SKBR3, chr All; x axis labelled by chromosomes 1 to 22, X and Y]
Figure 17: All chromosomes of sample SKBR3. CGH data are shown in blue and amplicon boundaries as a red line. CGH Plotter draws dividing lines between the chromosomes. The bar below the data indicates the amplicons and deletions.

[Message box: Amplicon Boundaries, listing the genomic indices of the amplicon boundaries for SKBR3, MCF7, BT474, BT20 and MDA361]
Figure 18: Amplicon Boundaries message box.
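The red amplicon-boundary lines in Figures 15 and 17 come from the dynamic-programming step of Section 4.3. That step fits c+1 constant levels to a chromosome by minimizing the squared error J(A, n). Below is a Python sketch of that recursion under these assumptions: the signal contains no NaNs, the number of changes c is already known (CGH Plotter obtains it with k-means), and indices are 0-based rather than CGH Plotter's 1-based genomic indices. The function `dp_segment` is hypothetical; it is not CGH Plotter's dynamic_prog.m.

```python
from typing import List, Tuple


def dp_segment(x: List[float], c: int) -> Tuple[List[float], List[int]]:
    """Fit c+1 constant levels to x by minimizing the squared error J(A, n).

    Returns (levels, change_points); change points are 0-based indices where a
    new level starts, i.e. n_1..n_c of Section 4.3 shifted down by one."""
    N = len(x)
    K = c + 1  # number of constant levels
    if not 1 <= K <= N:
        raise ValueError("need 0 <= c < len(x)")
    # Prefix sums give the error of any segment x[a..b] in O(1).
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def delta(a: int, b: int) -> float:
        """Error of x[a..b] (inclusive) around its mean, the level estimate A."""
        m = b - a + 1
        total = s1[b + 1] - s1[a]
        return (s2[b + 1] - s2[a]) - total * total / m

    INF = float("inf")
    # I[k][L]: minimum error of k levels on x[0..L]; start[k][L]: where level k begins.
    I = [[INF] * N for _ in range(K + 1)]
    start = [[0] * N for _ in range(K + 1)]
    for L in range(N):
        I[1][L] = delta(0, L)
    for k in range(2, K + 1):
        for L in range(k - 1, N):
            for s in range(k - 1, L + 1):  # level k covers x[s..L]
                cost = I[k - 1][s - 1] + delta(s, L)
                if cost < I[k][L]:
                    I[k][L], start[k][L] = cost, s

    # Backtrack the change points from the last level to the first.
    changes: List[int] = []
    L = N - 1
    for k in range(K, 1, -1):
        s = start[k][L]
        changes.append(s)
        L = s - 1
    changes.reverse()
    bounds = [0] + changes + [N]
    levels = [(s1[b] - s1[a]) / (b - a) for a, b in zip(bounds[:-1], bounds[1:])]
    return levels, changes
```

For example, `dp_segment([1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 1.0, 1.0], 2)` returns the levels `[1.0, 5.0, 1.0]` with changes at indices `[3, 6]`. The inner loop is exactly the recursion I_k[L] = min over n_{k-1} of I_{k-1}[n_{k-1} - 1] + Δ, which costs O(cN²) per chromosome.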
Figure 8: Example of the region info file.

The region info file must be either in Matlab .mat or in tab-delimited text (.txt) format. If the .mat format is used, the file must contain a variable that is an N x 2 matrix holding the start point and end point, in cumulative basepair units, of each of the N separate regions. Element (i, 1) of the matrix is the start point of the i-th region and element (i, 2) is its end point. The name of the variable in the .mat file must equal the file name. If the .txt format is used, the file must contain a matrix like the one described above in tab-delimited text format. The structure of the region info file is illustrated in Figure 8.

If the user does not load any special region info, the chromosome limits included in the data struct are used as the start and end points of the regions. This means that each chromosome is filtered separately, but other known gaps in the data are not ignored.

[Plot: "Used window lengths in number of clones in each window", plotted against the clone index]
Figure 9: Exampl
CGH Plotter User Manual

Contents
1 Introduction
2 Installation
  2.1 Installation Instructions
3 Instructions
  3.1 User Interface Pages
    3.1.1 Main Page
    3.1.2 Create Data Struct
    3.1.3 Find Amplicons
    3.1.4 BP Filter
    3.1.5 BP Convert
    3.1.6 Plot Data
    3.1.7 Write TXT
4 Methods
  4.1 Filtering
  4.2 k-means Clustering
  4.3 Dynamic Programming
  4.4 Filtering according to basepair units
5 Summary

1 Introduction
Copy number changes, such as deletions and amplifications, are common aberrations in cancer and are known to involve genes that play a crucial role in the development and progression of the malignant disease [5]. Copy number changes usually span large regions of the genome and therefore influence multiple genes at the same time. Comparative genomic hybridization (CGH) on DNA microarrays allows simultaneous monitoring of the copy numbers of thousands of genes throughout the genome [6, 7].

CGH Plotter is a versatile software tool that allows the user to plot CGH copy number data as a function of the position of the genes along the human genome and to rapidly determine the exact locations of copy number changes such as amplicons and deletions. In this user manual we explain in detail (1) how to install CGH Plotter, (2) how t
[Screenshot: Plot Data window with a Remove << button and the options Plot (Data, Amplicon boundaries, Combined amplicon boundaries, Filtered data), Index to Gene (Cumulative base pair instead of genomic index), Baseline method (Median of the chromosome instead of value 1), Plot results (superimpose all data to one figure, or each plot to own figure), Combine amplicons and deletions, and Save boundaries]
Figure 13: Plot Data window.

C. Data type
The CGH data can be plotted either as log-transformed values or as ratios. If the data are plotted log-transformed, CGH Plotter adds 1 to the natural logarithm in order to move the baseline to around a ratio of one. In every case, amplicon/deletion boundaries and filtered data are shown as ratios.

D. Samples
One may choose which CGH data sample to plot. If the last option, All, is selected, CGH Plotter adds the selected chromosome of each sample to the data listbox.

E. Chromosome
In CGH Plotter one needs to select either the chromosome to be illustrated or the option All, in which case the ratios of the sample are plotted genome-wide.

F. Add button
After the above-mentioned attributes are selected, the Add button adds the selected data to the listbox on the right. Data must always be exported to the data listbox, because CGH Plotter handles only the data in the listbox.

G. Data listbox
In the data listbox one can see the part or parts of the da
[Screenshot: Write TXT window with Choose data and Save as buttons, sample and chromosome selections, the checkboxes Write CGH Data, Write Filtered Data, Write DP Data, Write interpolation info and Write BP filter window size in number of clones, and the Write and Main Page buttons]
Figure 21: Write TXT window.

A. Choose data button
By pressing the Choose data button, the user can select analyzed data or a data struct to be converted into a text file. If the selected file is not in the correct format, the other options in the window are not enabled. If the file is in the correct format, its name appears to the right of the Choose data button.

B. Save as button
By pressing the Save as button, the user can select the file name of the output text file. If the file name is correctly selected, it appears to the right of the Save as button.

C. Sample and Chromosome options
The user can choose one or all samples, and one or all chromosomes, to be printed in the text file.

[Screenshot: the resulting file test.txt opened in Microsoft Excel, with the columns Chr, genomic index, basepair and window size, followed by CGH and Filt columns for each sample]
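A minimal Python sketch of the tab-delimited output that Write TXT produces from the selected checkboxes. With no checkbox selected, only chromosome and basepair columns are written. Interpolation info adds a 0/1 flag after every filtered value, and the window-size checkbox adds one column per clone. The column headers, the six-decimal scientific notation and the `format_txt` name are assumptions modelled loosely on Figure 22, not CGH Plotter's write_txt.m.

```python
import math
from typing import Optional, Sequence


def fmt(value: float) -> str:
    # Scientific notation with six decimals; missing values written as "NaN".
    return "NaN" if math.isnan(value) else f"{value:.6E}"


def format_txt(chrom: Sequence[int], basepair: Sequence[int], samples: Sequence[str],
               cgh: Optional[Sequence[Sequence[float]]] = None,
               filt: Optional[Sequence[Sequence[float]]] = None,
               interp: Optional[Sequence[Sequence[bool]]] = None,
               window: Optional[Sequence[int]] = None) -> str:
    """Build tab-delimited text; matrices are m x n (clones x samples) like the data struct."""
    header = ["Chr", "Basepair"]
    if window is not None:
        header.append("Windowsize")
    for name in samples:
        if cgh is not None:
            header.append(f"{name} CGH")
        if filt is not None:
            header.append(f"{name} Filt")
            if interp is not None:
                header.append(f"{name} Interp")
    lines = ["\t".join(header)]
    for i in range(len(basepair)):
        row = [str(chrom[i]), str(basepair[i])]
        if window is not None:
            row.append(str(window[i]))
        for s in range(len(samples)):
            if cgh is not None:
                row.append(fmt(cgh[i][s]))
            if filt is not None:
                row.append(fmt(filt[i][s]))
                if interp is not None:  # 1 = interpolated, 0 = original value
                    row.append("1" if interp[i][s] else "0")
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"
```

Writing the returned string with `open(path, "w").write(...)` gives a file that opens directly in a spreadsheet, as in Figure 22.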
Figure 11: BP Convert window.

A. Load button
By pressing the Load button, the user can load a data file that contains filtered data. The data must have been filtered in either the Find Amplicons or the BP Filter phase.

B. Save as button
By pressing the Save as button, the user can select the file name of the output file. If the file name is correctly selected, it appears to the right of the Save as button.

C. New basepairs button
By pressing the New basepairs button, the user can select new basepair info for the loaded data file (see A). The file containing the new basepairs must be in either Matlab .mat or tab-delimited text (.txt) format. If the .mat format is used, the file must contain a single variable: a column vector that lists the new basepairs in cumulative basepair units and in ascending order. The name of the variable must equal the name of the file. For example, basepairtest.mat must contain a variable called basepairtest.

[Screenshot: Microsoft Excel sheet with a single column of cumulative basepairs]
Figure 12: Example of the file containing basepair info.

If t
box with the text Ready pops up.

G. Main page button
The Main page button returns one to the main page.

The data struct can also be created manually. However, the struct must have the following fields:
• data_struct.data: CGH data, size m x n
• data_struct.chromo: indices to chromosomes, size 25 x 1
• data_struct.basepair: cumulative base pairs, size m x 1
• data_struct.samples: names of the samples, size n x 1

3.1.3 Find Amplicons
The phase Find Amplicons involves several components. Its aim is first to find amplicons or deletions and then to create a result file for plotting.

A. Load data button
This button enables loading of the data struct made in the phase Create Data Struct.

B. Selected data text box
When the data have been selected, the name of the data file can be seen in the text box next to the Load data button.

C. Filter parameters
• It is possible to specify the type of the filter; the options are Move median and Move average. By default CGH Plotter uses the Move median filter.

[Screenshot: Find Amplicons window with the filter type set to Moving median]
Figure 6: Find Amplicons window.

• The window size for filtering the data may also be defined. The default window size is five. The appropriate window size depends on the amount of noise in the data. When the amount of noise is small, a small window size (e.g. 1 to 3) is enough. However, if the data are very noisy
e Y chromosome. This order of genes is referred to as the genomic index. Missing values have to be replaced with NaNs (Not a Number). Finally, the data should not be transformed (e.g. with a log transform) before they are given to CGH Plotter. After a data matrix is selected, its name appears in the text box next to the Data button.

B. Chromosome indices button (obligatory)
As it is essential to know where each chromosome begins, the starting points of the chromosomes, given as indices to the data matrix, need to be specified. Chromosome indices is a 24 x 1 matrix: the first 22 indices are the starting points of chromosomes 1-22, the 23rd is the starting point of chromosome X and the 24th that of chromosome Y. The chromosome indices can also be in .txt or .mat format. An example of a chromosome indices matrix in .mat format is shown below.

[Example matrix: 24 ascending start indices, the last being 11980 for chromosome Y]
Chromosome indices

CGH Plotter adds the last index of chromosome Y to the chromosome indices matrix. The chromosome indices therefore form a 25 x 1 matrix during the analysis.

C. Base pairs button (optional)
It is illustrative to plot the CGH ratios as a function of their actual location along the genome in base pairs. Therefore we have included the possibility to define cumulative base pairs for the data. The Base pairs file can also be in .mat or .txt for
e of the plot after the filtering process.

E. Start button
The user can begin filtering by pressing the Start button. When the filtering process begins and when it ends, the user is notified by a message box. After filtering is complete, the true window lengths used for each gene in the data are shown in a single plot (see Figure 9). Information on the true window lengths can be useful when adjusting the window length in basepairs. Filtered data can be plotted at the Plot Data page; such a plot is illustrated in Figure 10.

[Plot: Chr All, BP filtered result across chromosomes 1 to Y]
Figure 10: Example of the plotted BP filtered result.

F. Main Page button
The Main Page button takes the user back to the main page.

3.1.5 BP Convert
In the BP Convert phase it is possible to manipulate the basepairs in a filtered data file. The basepair info in the file can be replaced with a new basepair structure given as a separate input. If the new basepairs contain pairs that are not present in the original data file, the CGH and filtered data values for those pairs will be NaNs. These and all other NaNs can be given an interpolated data value if an interpolation window is specified. The interpolation process can be adjusted by giving, as a separate input, the regions (in cumulative basepair units) where interpolation is needed. A maximum length for the interpolation window can also be specified.

[Screenshot: BP Convert window with Load, Save as, New basepairs and Load regions buttons, a Chromo limits option, an "Interpolate gaps inside regions shorter than ... BPs" field, and Start and Main Page buttons]
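The BP Convert behaviour described above can be sketched in Python. Values are carried over to the new basepairs, and positions absent from the original data become NaN. NaN runs are then filled only when the gap is short enough and, if regions are given, lies inside a single region. The manual does not state how values are interpolated or what happens to original basepairs missing from the new list, so linear interpolation in basepair position and dropping those positions are assumptions. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

NAN = float("nan")


def convert_basepairs(old_bp: Sequence[int], values: Sequence[float],
                      new_bp: Sequence[int]) -> List[float]:
    """Values at the new basepairs; positions missing from old_bp become NaN.
    Assumption: old positions not listed in new_bp are dropped."""
    lookup = dict(zip(old_bp, values))
    return [lookup.get(bp, NAN) for bp in new_bp]


def _region(pos: int, regions: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Index of the region containing pos, or None."""
    for r, (a, b) in enumerate(regions):
        if a <= pos <= b:
            return r
    return None


def interpolate_gaps(bp: Sequence[int], values: Sequence[float], max_gap_bp: float,
                     regions: Optional[Sequence[Tuple[int, int]]] = None) -> List[float]:
    """Linearly interpolate each run of NaNs (assumed method) when the two flanking
    valid clones are closer than max_gap_bp and, if regions are given, lie in the
    same region. Other NaNs are left unchanged."""
    out = list(values)
    i, n = 0, len(out)
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        j = i
        while j < n and math.isnan(out[j]):
            j += 1
        left, right = i - 1, j  # flanking valid clones (may be out of range)
        ok = left >= 0 and right < n and bp[right] - bp[left] < max_gap_bp
        if ok and regions is not None:
            r = _region(bp[left], regions)
            ok = r is not None and r == _region(bp[right], regions)
        if ok:
            for k in range(i, j):
                t = (bp[k] - bp[left]) / (bp[right] - bp[left])
                out[k] = out[left] + t * (out[right] - out[left])
        i = j
    return out
```

Requiring both flanking clones to lie in the same region keeps interpolation from bridging a known gap such as a centromere.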
ed data value in the output file: 1 indicates that the data value is interpolated, whereas 0 indicates that it is not. If the Write BP filter window size checkbox is selected, the true window sizes that were used in filtering are printed for each clone index.

E. Write button
Writing to the text file begins when the Write button is pressed. When the writing process begins and when it ends, the user is notified by a message box. The resulting text file is illustrated in Figure 22.

F. Main Page button
The Main Page button takes the user back to the main page.

4 Methods
In this section we describe the methods used in CGH Plotter in greater detail. An overall view of CGH Plotter is given in Figure 23.

4.1 Filtering
Before the k-means clustering is applied, the CGH ratios of each chromosome are filtered with a moving median or average filter. The user may input the type of the filter (mean or median) and its window size. Suggested window sizes are between three and nine.

The filtering proceeds as follows. First, CGH Plotter computes the median (or average) of the first w values, where w is the window size. For example, if w is five, the first value of the filtered data is the median (or average) of the first five CGH data points. Then CGH Plotter takes w values again, beginning from the second data point, and computes the median or average, depending on the user's choice. The filtering stops when the last data point is reached. Therefore, in standard
er, one should note that if the data are very noisy, the user should try a smaller constant in order not to detect noise instead of amplicons and deletions. There are surely many other ways to determine the number of changes; a user who prefers another approach may modify the way the number of changes is determined in the file compute_kmean.m.

4.3 Dynamic Programming
In this section dynamic programming is briefly explained. A more detailed presentation of dynamic programming can be found, for instance, in [4].

In CGH Plotter it is assumed that copy number ratios can be approximated with a constant and an error term. As a consequence, CGH data can be understood as a signal having constant levels. In essence there are three kinds of constant levels: baseline, amplicon and deletion levels, and these are to be identified by the dynamic programming algorithm. It is assumed that the number of changes of the constant levels, c, is known; we use k-means for this purpose, as explained in the previous section. Assume that the CGH signal

$$
\begin{cases}
A_1, & n = n_0, \dots, n_1 - 1 \\
A_2, & n = n_1, \dots, n_2 - 1 \\
\quad\vdots & \\
A_{c+1}, & n = n_c, \dots, N
\end{cases}
$$

is corrupted by noise, and denote the observed ratios by $x[n]$, $n = 1, \dots, N$. Dynamic programming identifies the constant levels $A = [A_1, A_2, A_3, \dots, A_{c+1}]$ and the change points $n = [n_1, n_2, n_3, \dots, n_c]$, where $n_0 = 1$ and $n_{c+1} = N + 1$, by minimizing the function

$$
J(A, n) = \sum_{i=1}^{c+1} \sum_{n=n_{i-1}}^{n_i - 1} \left( x[n] - A_i \right)^2 .
$$

The idea of dynamic programming is to find the shortest path from the value $x[1]$ to the value $x[N]$. Dynamic programming util
izes the Markov property, which ensures that the distance between the points $x[n_{i-1}]$ and $x[n_i]$ does not depend on which path was used to arrive at the point $x[n_{i-1}]$. Therefore dynamic programming can find the minimum of $J(A, n)$ without checking every possible combination of $n_1, n_2, \dots, n_c$.

In practice, the procedure for identifying the constant levels proceeds as follows. First, the constant levels are estimated: $A_i$ is the mean of the interval $[n_{i-1}, n_i - 1]$,

$$
A_i = \frac{1}{n_i - n_{i-1}} \sum_{n=n_{i-1}}^{n_i - 1} x[n] .
$$

Second, the function $J(A, n)$ is minimized over $n$ using dynamic programming. Write $\Delta(A_i, n_{i-1}, n_i - 1) = \sum_{n=n_{i-1}}^{n_i - 1} (x[n] - A_i)^2$ for the error of the $i$-th segment, and $I_k[L]$ for the minimum error of fitting $k$ segments to the interval $[1, L]$, with $n_0 = 1$ and $n_k = L + 1$. Then

$$
\begin{aligned}
I_k[L] &= \min_{n_1, \dots, n_{k-1}} \sum_{i=1}^{k} \Delta(A_i, n_{i-1}, n_i - 1) \\
&= \min_{n_{k-1}} \left( \min_{n_1, \dots, n_{k-2}} \sum_{i=1}^{k-1} \Delta(A_i, n_{i-1}, n_i - 1) + \Delta(A_k, n_{k-1}, L) \right) \\
&= \min_{n_{k-1}} \left( I_{k-1}[n_{k-1} - 1] + \Delta(A_k, n_{k-1}, L) \right).
\end{aligned}
$$

This shows that the minimum error for the interval $[1, L]$ can be computed by adding the error of the last segment to the minimum error of the previous segments. CGH Plotter stores the constant levels $A$ and the indices of the change points of these levels.

4.4 Filtering according to basepair units
Filtering according to basepair units is almost the same as normal filtering according to clone indices. The main difference is that the BP filter window size is constant in basepair units, whereas the normal filtering window size is constant in the number of clones. The filter window is chosen so that half of it lies on the left side of the clone and the other half on the right side. In real data there are always locations where adjacent genes sho
kanen, M., Chen, Y., Bittner, M., Kallioniemi, A. (2001). Comprehensive copy number and gene expression profiling of the 17q23 amplicon in human breast cancer. Proceedings of the National Academy of Sciences USA, Vol. 98, pp. 5711-5716.
[7] Pollack, J., Perou, C., Alizadeh, A., Eisen, M., Pergamenschikov, A., Williams, C., Jeffrey, S., Botstein, D., Brown, P. (1999). Genome-wide analysis of DNA copy number changes using cDNA microarrays. Nature Genetics, Vol. 23, pp. 41-46.
mat. The Base pairs file is an m x 1 vector, where m is the number of genes. If base pairs are not specified, CGH Plotter uses only the order of the genes along the genome, i.e. the genomic indices.

D. Names of the samples button (optional)
The names of the samples can be specified. If the names are given in .mat format, they should be given as an n x 1 string vector, where n is the number of samples. Names cannot include space characters or special characters that Matlab treats as mathematical symbols. For example, if the number of samples is three, the cell array can be made and saved in Matlab as follows:

    >> Names = {'BT474'; 'MCF7'; 'ZR7530'};
    >> save Names Names

If names for the samples are not defined, CGH Plotter refers to the first sample as sample1, the second as sample2, and so on. Furthermore, if the names are defined in a .txt file, they must be given in one row, each in its own column, as shown in Figure 5.

[Screenshot: Microsoft Excel sheet with the sample names in row 1, one name per column]
Figure 5: Names of the samples in a txt file.

E. Save as button
One must give a name for the data struct and select the folder where it will be saved. The folder data_structs is meant for this purpose, but it is not obligatory to save data structs there.

F. Create button
CGH Plotter creates the data struct. When the struct is created, a message
n be stored in a tab-delimited text file in which the results can easily be examined. The freely available CGH Plotter is easy to operate. Furthermore, it is easy to modify CGH Plotter and to add functions to it. The CGH Plotter toolbox is under continuous development, and in the future it will include new analysis and illustration functions.

CGH Plotter has been shown to be capable of rapid, high-throughput analysis of CGH data. Moreover, the results obtained from CGH Plotter are consistent with chromosomal CGH, and the results given by CGH Plotter are thereby verified by biological knowledge.

References
[1] Astola, J., Kuosmanen, P. (1997). Fundamentals of Nonlinear Digital Filtering. CRC Press LLC, Florida.
[2] Duda, R. O., Hart, P. E., Stork, D. G. (2001). Pattern Classification, 2nd edition. John Wiley & Sons, Inc., New York.
[3] Hyman, E., Kauraniemi, P., Hautaniemi, S., Wolf, M., Mousses, S., Rozenblum, E., Ringnér, M., Sauter, G., Monni, O., Elkahloun, A., Kallioniemi, O.-P., Kallioniemi, A. (2002). Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Research, Vol. 62, pp. 6240-6245.
[4] Kay, S. M. (1998). Fundamentals of Statistical Signal Processing, Volume II: Detection Theory. Prentice Hall, New Jersey.
[5] Gray, J. W., Collins, C. (2000). Genome changes and gene expression in human solid tumors. Carcinogenesis, Vol. 21, pp. 443-452.
[6] Monni, O., Barlund, M., Mousses, S., Kononen, J., Sauter, G., Heis
o use CGH Plotter, (3) how to store and analyze the results, and (4) what assumptions lie behind the analysis. We also provide several examples of the use of CGH Plotter.

2 Installation
CGH Plotter requires Matlab 6.1 or higher. Accordingly, all data must be either in Matlab .mat format or in tab-delimited text (.txt) format.

2.1 Installation Instructions
The archive CGH Plotter.zip consists of five folders: CGH Plotter, gui, ampli_math, data_structs and ampli_data.
• The main folder CGH Plotter contains the following folders and files: gui, ampli_math, CGH_Plotter.m and CGH_Plotter.fig.
• Folder gui (Graphical User Interface) includes the functions and corresponding figures: create_struct.m, create_struct.fig, amplikoni.m, amplikoni.fig, bp_filter.m, bp_filter.fig, bp_convert.m, bp_convert.fig, plot_data.m, plot_data.fig, write_txt.m, write_txt.fig, end_all.m and end_all.fig.
• Folder ampli_math includes all mathematical functions used in CGH Plotter: bp_median.m, combined.m, compute_kmean.m, cumulative.m, define_amplicons.m, dynamic_prog.m, filter_data.m, find_regions, handle_NaNs.m, kmean.m, transform_data.m and writeresults.m.
• Folder data_structs can be located arbitrarily. It is meant for storing data structs of CGH data and is initially empty.
• Folder ampli_data is intended for
…Samples are illustrated with points, and filtered data and amplicon boundaries with lines. Combined amplicon boundaries are shown as a thick black line (Figure 14). If each plot is placed in its own figure, CGH Plotter illustrates every sample individually (Figures 15 and 17). In that case, CGH data are plotted with a blue line and amplicon boundaries with a red line. If Filtered data is selected, CGH Plotter plots the filtered data of the sample with a green line. If Combined amplicon boundaries is selected, it plots the combined boundaries with a black line.

K Index to a gene
The user may choose to show cumulative base pairs on the x-axis instead of genomic indices.

L Baseline
The user may choose to have CGH Plotter use the median of each chromosome as the baseline of that chromosome. By default, the baseline is 1.

M
The user may choose to merge adjoining amplicons or deletions into one amplicon/deletion in the resulting boundary file.

N Save Boundaries button
This button lets the user name the boundary file and select the folder where it is saved. CGH Plotter creates a tab-separated ASCII file, as illustrated in Figure 19. If no name is specified, the results are not saved. By default, CGH Plotter saves amplicons with height above 1.2 and deletions with height below 0.95. These limits can easily be changed at the beginning of the function.
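The default Save Boundaries limits (amplicons above 1.2, deletions below 0.95) and option M (merging adjoining amplicons or deletions) can be sketched as follows. This is a hedged Python illustration, not the toolbox's Matlab code. The `Segment` type and the function names are hypothetical. The rule for the height of a merged segment (a length-weighted mean) is an assumption, because the manual does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

AMP_LIMIT = 1.2   # default amplicon limit quoted in the manual
DEL_LIMIT = 0.95  # default deletion limit quoted in the manual

Row = Tuple[str, int, int, float]  # (type, start index, end index, height)


@dataclass
class Segment:
    """Hypothetical piecewise-constant segment, e.g. one step of the dp fit."""
    start: int     # first genomic index (inclusive)
    end: int       # last genomic index (inclusive)
    height: float  # segment level


def classify(seg: Segment, amp_limit: float = AMP_LIMIT,
             del_limit: float = DEL_LIMIT) -> Optional[str]:
    """Label a segment as an amplicon, a deletion, or neither (None)."""
    if seg.height > amp_limit:
        return "amplicon"
    if seg.height < del_limit:
        return "deletion"
    return None


def boundary_rows(segments: List[Segment], merge_adjoining: bool = False) -> List[Row]:
    """Keep segments beyond the limits; optionally merge adjoining ones of the same type."""
    rows: List[Row] = []
    for seg in segments:
        kind = classify(seg)
        if kind is None:
            continue
        if merge_adjoining and rows and rows[-1][0] == kind and rows[-1][2] + 1 == seg.start:
            _, start, end, height = rows[-1]
            n_prev, n_new = end - start + 1, seg.end - seg.start + 1
            # Assumption: the merged height is the length-weighted mean of the parts.
            merged = (height * n_prev + seg.height * n_new) / (n_prev + n_new)
            rows[-1] = (kind, start, seg.end, merged)
        else:
            rows.append((kind, seg.start, seg.end, seg.height))
    return rows


segments = [Segment(0, 9, 1.0), Segment(10, 19, 1.5), Segment(20, 29, 1.3),
            Segment(30, 39, 0.9), Segment(40, 49, 1.0)]
print(boundary_rows(segments))
print(boundary_rows(segments, merge_adjoining=True))
```

Segments exactly at a limit (1.2 or 0.95) are not reported, matching the manual's wording "over 1.2" and "smaller than 0.95".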
… One may choose which properties of the created data set are illustrated. The CGH data can be plotted as ratios or as log-transformed ratios. Amplicon boundaries can be plotted for an individual sample, or as combined amplicon boundaries for a group of samples. One may plot the CGH data, filtered data or amplicon boundaries either for one chromosome or across all chromosomes. It is also possible to plot results from several samples at the same time, and to choose whether the results are illustrated in one figure or in multiple figures. By default, CGH Plotter uses genomic indices to plot the data, but cumulative base pairs can be selected instead.

A Choose data button
By pushing the Choose data button, one can select the data to be illustrated. The result file must have been constructed in the Find Amplicons phase and must consist of seven fields:
- data: CGH data
- datafilt: filtered data
- dp: amplicon boundaries computed with dynamic programming
- tu: indices to the changes of amplicon boundaries
- chromo: indices to chromosome starts
- basepair: cumulative base pairs
- samples: names of the samples

Only one data set can be illustrated at a time, but several properties of the data can be observed simultaneously.

B Selected data text box
The name of the selected data file is shown in the Selected data text box.

[Screenshot: Plot Data window with labeled controls]
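Plotting a single chromosome relies on the chromo field (indices to chromosome starts), and the log-transform option rescales the ratios. Below is a minimal Python sketch. It assumes 0-based start indices, with the last chromosome running to the end of the data. It also assumes a base-2 logarithm, since the manual does not state the base. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def chromosome_indices(chromo: Sequence[int], k: int, n_genes: int) -> range:
    """Genomic indices of chromosome k (1-based), from 0-based start indices."""
    start = chromo[k - 1]
    # Assumption: the last chromosome extends to the end of the data.
    end = chromo[k] if k < len(chromo) else n_genes
    return range(start, end)


def transform(ratios: Sequence[float], log: bool = False) -> List[float]:
    """Return the ratios unchanged, or log2-transformed (NaN for non-positive ratios)."""
    if not log:
        return list(ratios)
    return [math.log2(r) if r > 0 else math.nan for r in ratios]


data = [1.0, 2.0, 0.5, 1.0, 4.0]
chromo = [0, 2, 3]  # three toy "chromosomes" starting at indices 0, 2 and 3
idx = chromosome_indices(chromo, 3, len(data))
print(list(idx), transform([data[i] for i in idx], log=True))
```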
…creates a struct that is used in the Find Amplicons phase. The user also defines the filter type and the window size used in the filtering phase. CGH Plotter clusters the filtered data into three clusters with the k-means clustering algorithm. The clustered data are passed to the function that computes the maximum number of change points. This number of changes is needed when the dynamic programming algorithm computes the amplicons and deletions. In the Plot Data and Write TXT phases, the user may plot the results of the analysis and save them in an ASCII file.

4.2 k-means Clustering

The k-means clustering algorithm is used to find the number of amplicons/deletions for each chromosome. The idea behind k-means clustering is to partition the data into k clusters, where k is assumed known. Here there are three clusters, denoting amplified genes, deleted genes and baseline genes. The means μ1, μ2, μ3 are first initialized to the 5th biggest value, the median and the 5th smallest value, respectively. The actual k-means clustering then proceeds as follows. First, a ratio from the sample is drawn, and the nearest mean, μ_winner, is found using the Euclidean distance. Second, μ_winner is updated by moving it closer to the ratio. This procedure is repeated until all m ratios have been used. Pseudocode for the training phase [2]:

begin initialize μ1, μ2, μ3
  do classify m ratios to the nearest μi
    update μ_winner
  until the last m
end return μ1, μ2, μ3

After the training phase, every ratio is classified to the nearest cluster. The clusters are labeled −1, 0 and 1, denoting deleted, baseline and amplified genes. The number of changes is determined as follows. CGH Plotter computes x_max, the mean of the 2% highest values in the amplified cluster:

$$x_{\max} = \operatorname{mean}\bigl(\max\nolimits_{2\%}(\text{cluster } 1)\bigr)$$

Similarly, x_min denotes the mean of the 2% smallest values in the deleted cluster:

$$x_{\min} = \operatorname{mean}\bigl(\min\nolimits_{2\%}(\text{cluster } {-1})\bigr)$$

We chose the 2% highest/smallest values because the data we used were not very noisy. This parameter can be changed in the function Compute_kmean.m. The distance between x_max and x_min is computed and multiplied by a user-defined constant. The number of changes c is the result of this multiplication, rounded down:

$$c = \bigl\lfloor \text{constant} \cdot (x_{\max} - x_{\min}) \bigr\rfloor$$

The default constant is six. This number was determined empirically by adjusting it so that known amplicons are found on chromosome 17. The result was then validated against other chromosomes containing known amplicons, and against chromosomal CGH (illustrated in Figure 20). In other data sets, this number may need to be changed. If there are known amplicons, we suggest assessing the number of changes in a similar way. However…
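The training pass, the classification step and the change-count formula can be sketched together in Python. This is an illustration under stated assumptions, not the code in Compute_kmean.m. The following choices are assumptions, not statements from the manual:
- the fixed step size `eta` used to move μ_winner toward a ratio;
- rounding "2%" up to at least one value;
- returning zero changes when the amplified or deleted cluster is empty.

```python
import math
import statistics
from typing import List, Sequence

LABELS = (1, 0, -1)  # mu1 -> amplified, mu2 -> baseline, mu3 -> deleted


def init_means(ratios: Sequence[float]) -> List[float]:
    """mu1, mu2, mu3 = 5th biggest, median and 5th smallest ratio (needs >= 5 ratios)."""
    s = sorted(ratios)
    return [s[-5], statistics.median(s), s[4]]


def nearest(x: float, mu: Sequence[float]) -> int:
    """Index of the mean closest to x (1-D Euclidean distance)."""
    return min(range(len(mu)), key=lambda i: abs(x - mu[i]))


def train(ratios: Sequence[float], eta: float = 0.1) -> List[float]:
    """One sequential pass: move the winning mean a step eta toward each ratio."""
    mu = init_means(ratios)
    for x in ratios:
        w = nearest(x, mu)
        mu[w] += eta * (x - mu[w])  # assumption: fixed step size eta
    return mu


def number_of_changes(ratios: Sequence[float], constant: float = 6.0,
                      fraction: float = 0.02) -> int:
    """c = floor(constant * (x_max - x_min)) from the amplified/deleted clusters."""
    mu = train(ratios)
    labels = [LABELS[nearest(x, mu)] for x in ratios]
    amplified = sorted(x for x, c in zip(ratios, labels) if c == 1)
    deleted = sorted(x for x, c in zip(ratios, labels) if c == -1)
    if not amplified or not deleted:
        return 0  # assumption: no amplified or deleted cluster -> no changes
    k_amp = max(1, math.ceil(fraction * len(amplified)))
    k_del = max(1, math.ceil(fraction * len(deleted)))
    x_max = statistics.mean(amplified[-k_amp:])  # mean of the highest 2 %
    x_min = statistics.mean(deleted[:k_del])     # mean of the lowest 2 %
    return math.floor(constant * (x_max - x_min))


ratios = [1.0] * 90 + [2.0] * 5 + [0.5] * 5
print(train(ratios), number_of_changes(ratios))
```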
…data that CGH Plotter is about to plot. Each entry is written in the form: data name, data type, sample, chromosome. Several entries can be selected, but every entry must contain the same number of genes.

H Remove button
The Remove button removes the selected data from the data list box. Select the data to be removed first.

I Plot
One can select the properties to be plotted:
- If Data is selected, CGH Plotter plots the original CGH data.
- If Amplicon boundaries is selected, CGH Plotter plots the amplicon/deletion boundaries computed by the dynamic programming algorithm.
- If Combined amplicon boundaries is selected, CGH Plotter plots the combined amplicon boundaries of the selected samples. The method for computing the combined boundaries can be selected: average, median, maximum or minimum. By default, CGH Plotter uses the average.
- If Filtered Data is selected, CGH Plotter plots the filtered data computed by the filtering algorithm. The window size and the filter type were determined in the Find Amplicons phase.

J Show results
One can select how CGH Plotter presents the data. If Superimpose all data to one figure is selected, CGH Plotter plots all selected data in the same figure. Each sample, the filtered data of the sample and the amplicon boundaries of the sample have the same color.
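Combining the amplicon boundaries of several samples can be read as a per-index reduction across samples. The reduction is the average (default), median, maximum or minimum. Below is a hedged Python sketch. It assumes each sample's boundaries are stored as one boundary level per genomic index. `COMBINERS` and `combine_boundaries` are hypothetical names, not the interface of combined.m.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# The four combination methods offered in the Plot panel.
COMBINERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "average": statistics.mean,
    "median": statistics.median,
    "maximum": max,
    "minimum": min,
}


def combine_boundaries(per_sample: Sequence[Sequence[float]],
                       method: str = "average") -> List[float]:
    """Reduce the boundary levels of all samples at each genomic index."""
    if len({len(s) for s in per_sample}) != 1:
        raise ValueError("all samples must have the same number of genes")
    combine = COMBINERS[method]
    return [combine(column) for column in zip(*per_sample)]


samples = [[1.0, 1.5, 1.0],
           [1.0, 2.5, 0.5],
           [1.0, 1.0, 1.0]]
for method in COMBINERS:
    print(method, combine_boundaries(samples, method))
```

The length check mirrors the manual's requirement that every selected entry contain the same number of genes.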
…title of the first column tells which chromosome is in question, and the names of the samples are the titles of the other columns. The file gives the type, start, end and height of each amplicon or deletion. It also gives the maximum ratio value of each amplicon and the minimum ratio value of each deletion.

[Figure 20 panels: "Copy Number Alterations in the BT474 Breast Cancer Cell Line Genome"; CGH Plotter original data; CGH Plotter amplicon/deletion boundaries; chromosomal CGH. x-axis: cumulative genomic base pair location.]

Figure 20: Chromosomal CGH and output of CGH Plotter for breast cancer cell line BT474. CGH Plotter original data are shown at the top, amplicon/deletion boundaries in the middle, and chromosomal CGH data at the bottom.

CGH Plotter clearly identifies the amplicons and deletions detected by chromosomal CGH. As expected, given the higher resolution of array CGH, it also reveals additional aberrations. To assess the performance of CGH Plotter, we have illustrated both the chromosomal CGH and the output of CGH Plotter in Figure 20.

3.1.7 Write TXT

In the Write TXT phase, the user can write analyzed data, or simply the contents of a data struct, to a text file for further analysis. Data are written in tab-delimited text format.

[Screenshot: "Write data in text format" window; labeled controls include A Choose Data, B S…]
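The boundary file records each segment's type, start, end and height, plus the maximum ratio of an amplicon or the minimum ratio of a deletion. That information and the tab-delimited output of Write TXT can be sketched with Python's csv module. The one-row-per-segment layout below is a simplification and an assumption. The actual boundary file puts sample names in the column titles, as described above. The function names are hypothetical.

```python
import csv
import io
from typing import Iterable, Optional, Sequence, Tuple

Summary = Tuple[str, int, int, float, float]  # type, start, end, height, extreme ratio


def segment_summary(ratios: Sequence[float], start: int, end: int, height: float,
                    amp_limit: float = 1.2, del_limit: float = 0.95) -> Optional[Summary]:
    """Summarize one segment; the extreme is the max ratio (amplicon) or min ratio (deletion)."""
    window = ratios[start:end + 1]  # inclusive end index
    if height > amp_limit:
        return ("amplicon", start, end, height, max(window))
    if height < del_limit:
        return ("deletion", start, end, height, min(window))
    return None


def write_boundaries(chromosome: int, rows: Iterable[Tuple[str, Summary]]) -> str:
    """Write (sample, summary) pairs as tab-delimited text, one segment per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([f"Chr {chromosome}", "Type", "Start", "End", "Height", "Extreme ratio"])
    for sample, summary in rows:
        writer.writerow([sample, *summary])
    return buf.getvalue()


ratios = [1.0, 1.3, 1.6, 1.4, 1.0, 0.8, 0.7, 1.0]
amp = segment_summary(ratios, 1, 3, 1.43)
dele = segment_summary(ratios, 5, 6, 0.75)
print(write_boundaries(20, [("BT474", amp), ("BT474", dele)]), end="")
```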
…should not interact with each other during the filtering process. Such locations include, for example, chromosome borders and other known gaps. There may also be other reasons to limit interaction between genes during filtering. Therefore, it is possible to give information on regions that should be treated separately during the filtering process. For every given region, the filter window slides over the whole region. At the beginning and end of a region, the effective window is only half the given window size, since one half of the window lies outside the region. If the given filter window is very large (for example 10^… basepairs), the result of the filtering process is constant within each region. If no region information is given to BP filter, CGH Plotter treats each chromosome as a separate region.

5 Summary

CGH Plotter is a Matlab toolbox for CGH data analysis. Its main purpose is to identify and visualize the amplicon and deletion regions of CGH data. Its graphical user interface makes CGH Plotter straightforward to use. The user has many options for illustrating the CGH data. For example, the data can be shown as ratios or log-transformed ratios, and plotted against base pairs if available. CGH Plotter lets the user visualize each sample individually or all samples in parallel. It is also possible to plot the data of one chromosome, or the genome-wide data of a sample. The results can…
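The region-wise filtering described at the start of this part can be sketched in Python. The window is in basepair units and never crosses a region border, so it shrinks to half its size at region ends. The sketch makes these assumptions:
- a median filter is used as the example filter type;
- regions are given as index ranges with exclusive ends;
- positions are sorted within each region;
- the function names are hypothetical, not those of bp_filter.m.

```python
import math
import statistics
from bisect import bisect_left, bisect_right
from typing import Callable, List, Sequence, Tuple

Region = Tuple[int, int]  # index range [lo, hi) of genes treated together


def chromosome_regions(chrom: Sequence[int]) -> List[Region]:
    """Default regions: one per run of equal chromosome labels."""
    regions, start = [], 0
    for i in range(1, len(chrom) + 1):
        if i == len(chrom) or chrom[i] != chrom[start]:
            regions.append((start, i))
            start = i
    return regions


def bp_filter(positions: Sequence[int], values: Sequence[float],
              regions: Sequence[Region], window_bp: float,
              stat: Callable[[Sequence[float]], float] = statistics.median) -> List[float]:
    """Filter each gene over neighbors within +-window_bp/2, inside its own region only."""
    out = [math.nan] * len(values)  # genes outside every region stay NaN
    half = window_bp / 2
    for lo, hi in regions:
        for i in range(lo, hi):
            # bisect restricted to [lo, hi) so the window never leaves the region
            a = bisect_left(positions, positions[i] - half, lo, hi)
            b = bisect_right(positions, positions[i] + half, lo, hi)
            out[i] = stat(values[a:b])
    return out


chrom = [1, 1, 1, 1, 1, 2, 2]
pos = [0, 100, 200, 300, 400, 0, 100]
vals = [1.0, 1.0, 5.0, 1.0, 1.0, 3.0, 3.0]
print(bp_filter(pos, vals, chromosome_regions(chrom), window_bp=250))
```

Using a window far larger than any region gives one constant value per region, which matches the behavior described above.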
…text format is used, the file must contain a column vector similar to the one described above, in tab-delimited text format. The structure of the region info file is illustrated in Figure 12.

D Regions button
By pressing the Regions button, the user can select region info that controls the interpolation process. Interpolation is executed only in the given regions. The format of this region info file is described in the previous section.

E Chromo limits button
By pressing the Chromo limits button, the user can select general chromosome limits, in basepair units, to be used during basepair conversion. The file must be in Matlab .mat format. It must contain a variable called chromo, or a variable whose name equals the file name. The variable must be a 25x1 vector containing the start point of every chromosome, with the end point of the last chromosome as the 25th element.

F Interpolate gaps shorter
In this field, the user can specify the longest gap, in basepair units, that will be interpolated. If there are fewer than two clones in that area, no interpolation is done. The default value is 150000 bp.

G Start button
By pressing the Start button, the user starts the conversion process. The status of the process is shown in a message box.

H Main page button
The Main Page button takes the user back to the main page.

3.1.6 Plot Data

In the Plot Data phase, it is possible to compare the data and the results of dynamic programming.
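The basepair-conversion settings above can be illustrated with two small helpers. The first maps a basepair position to a chromosome using a 25-element chromo limits vector. The second fills a target position only when the gap between the surrounding measured clones is no longer than the limit (150000 bp by default). This is a hedged Python sketch, not bp_convert.m. Linear interpolation, the treatment of positions lacking a clone on both sides, and the helper names are assumptions.

```python
import math
from bisect import bisect_left, bisect_right
from typing import List, Optional, Sequence


def chromosome_of(bp: float, chromo: Sequence[float]) -> Optional[int]:
    """Chromosome number (1..24) for a basepair position, or None if out of range.

    chromo holds 24 chromosome start points plus the end point of the last one.
    """
    if len(chromo) != 25:
        raise ValueError("chromo limits must have 25 elements")
    if not chromo[0] <= bp < chromo[-1]:
        return None
    return bisect_right(chromo, bp)


def interpolate_gaps(clone_bp: Sequence[float], clone_val: Sequence[float],
                     target_bp: Sequence[float], max_gap: float = 150_000) -> List[float]:
    """Linearly interpolate at target positions lying in gaps no longer than max_gap."""
    out: List[float] = []
    for t in target_bp:
        j = bisect_left(clone_bp, t)
        if j < len(clone_bp) and clone_bp[j] == t:
            out.append(clone_val[j])  # measured clone: keep its value
        elif 0 < j < len(clone_bp) and clone_bp[j] - clone_bp[j - 1] <= max_gap:
            x0, x1 = clone_bp[j - 1], clone_bp[j]
            y0, y1 = clone_val[j - 1], clone_val[j]
            out.append(y0 + (y1 - y0) * (t - x0) / (x1 - x0))
        else:
            # gap too long, or no clone on one side (fewer than two clones)
            out.append(math.nan)
    return out


chromo = list(range(0, 2500, 100))  # toy limits: 24 chromosomes of 100 bp each
print(chromosome_of(150, chromo), chromosome_of(2400, chromo))
print(interpolate_gaps([0, 100_000, 400_000], [1.0, 2.0, 4.0],
                       [0, 50_000, 100_000, 250_000, 500_000]))
```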
