xSDR - User's Guide - DB

Contents

1. Execution stage. [Picture 4.1.1.2: Successful connection to an RDBMS]

xSDR User's Guide, Athens University of Economics and Business, MSc Computer Science, DB-Net Research Team

[Picture 4.1.1.3: Data selection from a relational database]

4.1.2 Data input from text file

This paragraph describes how to select a text file as a data source. When the user selects a text file as input, either as a new database or by opening a stored source without its stored settings, the text file connection screen appears (picture 4.1.2.1).

[Picture 4.1.2.1: Connection screen to text file]
2. [Figure 4.4.2.1: Graph Table (multiple view)] [Figure 4.4.2.2: Data Point Graph, X: Attribute16, Y: Attribute24] [Figure 4.4.2.3: Comparative Data Point Graph]

4.4.3 Comparison projection

The last available projection is comparison. With this projection you can compare two column graphs side by side. On the left graph you can select any dimension from the initial database; on the right graph you can choose any dimension from the reduced database. One such visualization is presented in figure 4.4.3.1.

[Figure 4.4.3.1: Comparison window, showing dimension Attribute10 of the initial dataset next to dimension col4 of the final dataset]

4.5 Fifth level: Data mining with Weka tool

The last level of the applicati
3. or Next. If you press the Next button, the application proceeds to the next level without transforming or changing the source data, so you enter the dimensionality reduction stage with all source data unmodified. This is possible only if all data except the data class is numerical. If you have chosen non-numerical data, a warning message appears (picture 4.2.2) informing you that you have to modify the data before the dimensionality reduction.

[Picture 4.2.2: Warning for non-numerical data transformation]

If you press the Data processing button, the corresponding screen appears. Any non-numerical data that needs transformation is previewed on the screen (picture 4.2.3); otherwise the list is empty. By pressing the View all control at the lower left, you can see all database dimensions (picture 4.2.4). Select those you want to transform and then press the OK button. Alternatively you can press the Cancel button
4. • Data type: the type of the database source (SQL Server, MySQL, Oracle, Text File)
• Class: the name of the dimension that holds the data class

Finally, on the lower part of the screen you can see the application progress bar, on which the five stages of the application appear. The current stage appears in orange, the stages that have not been completed in grey, and the stages that have been successfully completed in green.

4.1.1 Data input from Relational Database System

If the user selects a relational database as data input, the database connection screen appears (Picture 4.1.1.1). This screen appears when the user creates a new database source from a relational database, or opens a stored database source in its initial form, without the stored settings. Connecting to the database is simple. The connection ports are stored (they can be changed from the program's general settings menu, which is covered in unit 4.6), so the user only has to enter the database provider and the name of the database to load, together with the required credentials (user name and password). If the information is correct, a confirmation message appears (picture 4.1.1.2) and the connection with the database is completed. If not, an error message appears and the user is asked to enter the data again.
5. component technology file. These are to be included in the xSDR application to achieve the desired functionality. In addition, you must include the MCR Installer in the package if this is a new DRToolbox assembly. As already mentioned, the Matlab Compiler Runtime must be created on the same system on which the dll file is created in order to function properly. MCR installers produced by the same system are identical, so if you have already created the MCR earlier, you do not need to include it in the package again. After setting up and configuring the package contents, you can proceed to compile and create the assembly file: select Debug > Build from the menu of the Deployment Tool and wait for the process to complete. The application screen should look like the one shown in figure 4.3.1.3.

[Figure 4.3.1.3: Package Details, listing DRToolBoxWrapper.ctf, DRToolBoxWrapper.dll, DRToolBoxWrapper_overview.html, DRToolBoxWrapperNative.dll and readme.txt]

mkdir C:\Users\Tassos\Documents\MATLAB\DRToolBox\DRToolBoxWrapper\distrib
Warning: Directory already exists.
mkdir C:\Users\Tasso
6. DRToolbox is a collection of the most well-known dimensionality reduction algorithms. The user can choose any available algorithm, modify its setup and execute it, as described in detail in paragraph 4.3.1. The link to the toolbox's website can be found in the guide's annex. This version of the tool uses the latest DRToolbox version available on the day the guide was completed; however, since the algorithms are updated regularly, we encourage you to check the website frequently. For a full description of the integration, see paragraph 4.3.1.1, New DRToolbox versions integration.

Custom algorithm execution is, as mentioned before, an additional option of the program. A number of algorithms are pre-installed during the program installation. Their configuration and execution are described thoroughly in paragraph 4.3.2. In addition, new algorithms can be imported. Algorithms must have a certain form and follow the rules set in paragraph 4.3.2.1, Custom algorithm writing.

As soon as the user selects and sets up the algorithm to apply, the Execution option is activated and the dimensionality reduction can take place. At the same time, it is possible to calculate the stress between the initial and the reduced data sum W
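The stress formula used by xSDR is cut off in this excerpt, so the following is only an illustrative sketch of one widely used definition (Kruskal's stress-1 over pairwise Euclidean distances). It is an assumption and may differ from the formula the guide actually uses; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stress(initial, reduced):
    """Kruskal's stress-1 between pairwise distances of two datasets.

    `initial` and `reduced` hold the same records, in the same order,
    before and after dimensionality reduction. A value of 0 means all
    pairwise distances were preserved exactly.
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(len(initial)):
        for j in range(i + 1, len(initial)):
            d_initial = euclidean(initial[i], initial[j])
            d_reduced = euclidean(reduced[i], reduced[j])
            numerator += (d_initial - d_reduced) ** 2
            denominator += d_initial ** 2
    return math.sqrt(numerator / denominator)
```

A lower value indicates that the reduced data keeps the distance structure of the initial data more faithfully.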
7. PCA(k, classVariable): load inputData.txt; A = inputData; (implementation of the algorithm); end

[Figure 4.3.2.1.1: Custom algorithm through the Matlab editor. The file shown opens with the XML header <algorithm> <name>MetricMap</name> <description>Implementation of MetricMap algorithm</description> <parameters> <parameter> <name>loops</name> <type>Number</type> <value>1</value> </parameter> </parameters> </algorithm>, followed by the function MetricMap(D, k, classVariable, loops), whose body calls extractClassVariable, randomSelection, retrieveDataVectors, defineMatrixM and svd.]
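The XML header shown in Figure 4.3.2.1.1 describes the algorithm name, its description and its parameters. As a rough illustration of how such a header can be read, the sketch below parses the same elements with Python's standard library. How xSDR itself reads the header is not described in this excerpt, and the function name `read_algorithm_header` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Header text as it appears in Figure 4.3.2.1.1.
HEADER = """
<algorithm>
  <name>MetricMap</name>
  <description>Implementation of MetricMap algorithm</description>
  <parameters>
    <parameter>
      <name>loops</name>
      <type>Number</type>
      <value>1</value>
    </parameter>
  </parameters>
</algorithm>
"""


def read_algorithm_header(xml_text):
    """Return the algorithm name, description and parameter list as a dict."""
    root = ET.fromstring(xml_text.strip())
    parameters = []
    for node in root.findall("./parameters/parameter"):
        parameters.append({
            "name": node.findtext("name"),
            "type": node.findtext("type"),
            "value": node.findtext("value"),
        })
    return {
        "name": root.findtext("name"),
        "description": root.findtext("description"),
        "parameters": parameters,
    }


header = read_algorithm_header(HEADER)
```

Each declared parameter (here only `loops`) corresponds to an extra argument of the Matlab function after `D`, `k` and `classVariable`.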
8. ◦ If the user has installed a version of Matlab other than R2009b, there are two options. The first is to build a new version of the DRToolbox assembly file and embed it in the tool. This process is described in paragraph 4.3.1.1. Note: the term "new version" for the DRToolbox assembly file may not be the most suitable, because the version need not be newer than the one provided. It just has to be built with the Matlab deployment tool of the same version as the Matlab installed on the system. This Matlab version can be older than R2009b, and even the DRToolbox itself can be older than the one provided.
◦ If the user does not want to build a new assembly file, he can install the provided MCR. The xSDR tool will then use the right Matlab Compiler Runtime, and there is no need to build a new assembly or to change the Matlab version installed on the system.
◦ Finally, if the user's system does not have any version of Matlab, the user can just install the provided Matlab Compiler Runtime and is ready to run the tool.
• The last thing you have to do is to add the Matlab Compiler Runtime path to the Windows system Path variable. The process depends on the installed operating system and is described in detail below.

Windows Vista / 7
• Click Start
• Click Control Panel
• Click System and Security
• Click System
• Click Advanced System Settings (may need administrativ
9. • Store new data sources: with this option activated, the application stores the new database sources that we create.

[Figure 4.6.1: Application general settings, showing MS SQL Server port 1433, Oracle DBMS port 1521, MySQL Server port 3306, data files directory Resources\Datasources.xml and temporary directory Resources\tmp, with Apply memory control and Store new data sources checked]

If you proceed to the next tab, Algorithms & Weka, you will see the available settings for the algorithms and the Weka data mining application.

[Figure 4.6.2: Special algorithm and Weka settings, showing Matlab file directory Resources\Matlab Files, directory for central algorithms Resources\Matlab Files\Central, directory for distributed algorithms Resources\Matlab Files\Distributed, DRToolBox configuration file Resources\DRToolBoxAlgorithms.xml, results storage directory Resources\Reduced Data and Weka configuration file Resources\WekaModules.xml]

• Matlab file directory: the directory in which Matlab files are stored.
• Directory for central algorithms: the directory in which the central algorithms we can execute are stored.
• Directory for distributed algorithms: the directory
10. 4.3.3 Distributed Algorithms Execution

The execution of distributed algorithms does not differ significantly from the execution of the central algorithms. When you reach the dimensionality reduction level, as shown in Figure 4.3.1, select Distributed Algorithm and press Configuration. This takes you to the familiar algorithm configuration screen. From there, configure the algorithm as described in the previous sections and execute it as described in Section 4.3.2.

4.4 Fourth Level: Results Visualization

For results visualization we have used the Microsoft MS Charts library. This library was already chosen in the initial creation of DRC for several reasons: its completeness in diagrams and visualization means; its excellent cooperation with the other parts of the program (the library is designed exclusively for .NET Framework 3.5, which was used for the creation of the application); and, last but not least, its rapid creation of graphs with minimum memory consumption. To view the visual representation of both the initial database and the database after the dimensionality reduction, proceed to the next stage. The screen that appears on entering this stage is the one in figure 4.4.2. Note that, for the database visualization to be executed, users must have set the data class of the dataset in a previous execution stage. The dataset chosen must contain a class variab
11. The necessary links for the application execution can be found in the annex of this guide.

3 Application Installation

There are two different ways to work with our tool. The first is to install the final version (1.0.0) on your machine. This can be done by downloading the appropriate compressed file from the tool's web site and following the instructions provided in the next chapter. The second is to download the tool's source, load the solution in your preferred IDE, build the solution and then run the tool from the Visual Studio runtime environment. Microsoft Visual Studio is the suggested IDE, since the whole tool was developed with it, which means that the solution and the project files are ready for it.

Each method has its own advantages. With the first method, anyone can start using the tool by following some very simple steps, without any programming knowledge. With the second method, which requires at least a basic knowledge of C# and of Visual Studio, the user is able to follow the execution of a dimensionality reduction algorithm in depth. He can also make changes to the code and adapt the tool to his own needs. The following sections describe each method in detail.

3.1 Prerequisites

The first thing that the user has to do is to ensure
12. able to store the new data (Figure 4.3.2.4). Storage can be done using ARFF files or simple text files. ARFF files are ASCII text files created for use with the data mining application Weka. More on this type of file can be found at http://www.cs.waikato.ac.nz/ml/weka/arff.html. By selecting Save in text format, the user is given the option to choose the separator between columns: comma, space, tab or any other user-defined separator. After saving, if the option Store file and open it is selected, the file is opened for reading with the default application (Weka and Notepad respectively).

[Figure 4.3.2.2: Configured algorithm ready to be executed, showing algorithm MDS of type Central Algorithm with 351 initial records]
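The two storage formats mentioned above can be illustrated with a minimal sketch. It shows the general layout of an ARFF file (`@relation`, `@attribute`, `@data`) and of a delimited text file with a user-chosen separator. The function names and the sample column names are hypothetical and do not reflect how xSDR writes its files.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text.

    `attributes` is a list of (name, arff_type) pairs, for example
    ("col1", "numeric") or ("class", "{g,b}") for a nominal class.
    """
    lines = ["@relation " + relation, ""]
    for name, arff_type in attributes:
        lines.append("@attribute {} {}".format(name, arff_type))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


def to_text(rows, separator=","):
    """Build delimited text with the chosen column separator."""
    return "\n".join(
        separator.join(str(value) for value in row) for row in rows
    ) + "\n"


reduced = [(0.5, -1.25, "g"), (0.75, 2.0, "b")]
arff_text = to_arff(
    "reduced",
    [("col1", "numeric"), ("col2", "numeric"), ("class", "{g,b}")],
    reduced,
)
tab_text = to_text(reduced, separator="\t")
```

The ARFF output declares every column before the data rows, which is what lets Weka open the file without further configuration; the plain text output carries only the values.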
13. backup of the application before the conversion, you will be given this opportunity through the Visual Studio dialog. After loading is completed, you can compile and then build the project. You can then execute the application while watching its internal processes in the Visual Studio debugger, whose use is outside the scope of this guide.

4 User's Guide

The xSDR application is divided into five levels. The first is the data input level; the second is the data transformation; the third, which is essentially the core of our system, is the dimensionality reduction; the fourth is the results visualization through graphs and schemas; and the fifth and last level contains the interaction with the Weka data mining tool. Throughout the design and implementation of the application, user friendliness and simplicity were given priority, without limiting the application's capabilities. We made sure that the user has all the configuration options that were set initially, in a simple work environment. Our goal from the beginning was to create a powerful tool that allows students, researchers and everyone else interested in the wider data mining sector to experiment and to create their own dimensionality reduction algorithms in a friendly working environment.

4.1 First Layer: Data Input

The application allows the us
14. that his machine meets the hardware requirements described in chapter 1. After that, he has to confirm that the following software is installed on the system:
• .NET Framework 3.5. If the system does not have the specified framework installed, follow the link specified in chapter 5.1 and download the framework's installation package from Microsoft's web site. Then install it and restart the machine to complete the installation process.
• Microsoft MS Charts Library. Repeat for the MS Charts library the steps followed for the .NET Framework. If the library is not installed on the system, follow the links provided in chapter 5.1 and download the installation package. Then install it on the system and restart the machine to complete the installation.
• Matlab / Matlab Compiler Runtime. The xSDR tool requires either the Matlab suite or the Matlab Compiler Runtime to be installed on the system. Let's see each case separately.
◦ If Matlab is installed on the system, the first thing the user has to do is to check the installed version. This is necessary because the DRToolbox assembly file (drtoolbox.dll) has to be built with the same version as the one installed on the system. The provided library is built with Matlab version R2009b, so if this is the version already installed on the system, nothing else needs to be installed.
15. The second area consists of tabs that correspond to the program execution levels. The user can move through all levels and change every previous setting or option. In this case, however, he will have to continue the program execution from the point where the change took place.

In the third area the user can create a new database source for xSDR. In the frame you can see all available data input methods. The user creates a new database source either by double-clicking on the corresponding icon in the frame or by choosing the input method and clicking on the Create button. If he selects a database, he will pass to the source connection settings screen, which is described in unit 4.1.1. If he selects a text file as input, he arrives at the text file settings screen, which is described in unit 4.1.2.

In the fourth area you can select a previously stored database source to work with. When the program starts, this list is empty. The stored database sources appear as soon as you press the Fetch button. Note that if you afterwards select a data input from the new database source creation area, this list disappears. To restore it, press the Fetch button again. For each of the stored databases the following information is displayed. First, for easier visual identification, you can see the icon that corresponds to the input source. In the next field you can see the verbal descr
16. a mining tool. Its development is constant and the exported results are considerably useful, mainly to research applications. Therefore, since our tool to a large extent addresses students and data mining researchers, we included the option of executing Weka modules within the application's environment.
• Memory management. Although the constantly growing amounts of available memory partially ensure the capability of processing large volumes of data, we considered it necessary to add a control mechanism to the program. As a result, our application checks the amount of available memory and warns the user if the memory is not sufficient for loading a certain database.

1.1 Improvements compared to DRC

As mentioned before, xSDR is the evolution of DRC. The graphic environment has remained unchanged so that it does not create problems for older users. At the same time, the following improvements and additions took place:
• Compatibility problems with newer versions of Matlab have been resolved (DRC could only be executed by using the Compiler of the Matlab 2007b edition).
• Application development for correct execution in the Windows 7 environment.
• Connection with COM Automation Servers for the execution of custom algorithms.
• Modification of the input and export data for users' algorithms. Nevertheless th
17. [Picture 4.3.2.1: Custom algorithms configuration]

On the left part of the form you can see a list of all available algorithms. When you choose one of them, the right part of the screen shows the name of the selected algorithm, the dimensions of the sample after the reduction and, finally, the available parameters. On the lower part of the screen you can find the New and Delete buttons, which let you import a new algorithm or delete an existing one.

After the configuration, the algorithm is ready to be executed. Click OK to go back to the original dimensionality reduction screen, where you can now see more details of the algorithm to be executed, as defined in the configuration screen (Figure 4.3.2.2). Pressing Execution starts the execution of the algorithm. The running time depends on the system and on the size of the dataset, and can be a few seconds. After the process is completed successfully, the new reduced dataset appears and the application displays a message confirming the successful execution of the algorithm (Figure 4.3.2.3). After a successful execution of a dimensionality reduction algorithm the user is
18. and go back to the central data transformation screen (picture 4.2.1) without having made any data change.

[Picture 4.2.3: Data selected by default for transformation]

[Picture 4.2.4: Appearance of all data and selection of the ones to be transformed]

After you have selected the columns you want to transform, the transformation window appears (picture 4.2.5), showing two major areas. The first one is the data list and the se
19. ant to work with the original data set, just double-click on the list. A new dataset configuration screen then appears and you can make a new configuration. Finally, the user is able to remove a stored data set from memory if it is no longer needed. This can be done by pressing the delete button on the left of the screen, just below the list of stored data sources. In this case you will see a dialog box that asks you to confirm the deletion, as shown in Figure 4.1.3.3.

[Picture 4.1.3.1: Stored Datasets, showing the data source "ionosphere data" of type Text File, with memory requirements of 0.094 MB and class variable "class"]

[Figure 4.1.3.2: Successful loading of a stored dataset with the configuration made in a previous execution of the program]
20. as replacement of alphanumeric with numeric data, value substitution in numerical data, normalization of values within a specific range, or simple mathematical calculations on the data. In addition, the user has the option to exclude dimensions at will from the dimensionality reduction process.
• Dimensionality reduction with expandable algorithms. This is the part of the application that received the most emphasis. The user must have the option to execute any algorithm he wishes, so the option of importing algorithms is necessary. Furthermore, there should not be any limitation in the choice of algorithm, so the system has to be able to execute centralized as well as distributed algorithms. Therefore xSDR covers both central and distributed algorithm execution. The user can select one of the algorithms incorporated in the program (all of which derive from the DRToolbox, as in the initial version) or import his own.
• Results storing. The whole dimensionality reduction procedure would be meaningless if the user were not able to store the data for future use. Different alternatives are offered here as well: the produced data can be saved as ARFF files, text files or relational databases.
• Graphical visualization of results. The application lets the user draw useful conclusions through graphic visualization of the dimensionality reduction results.
• Weka modules execution option. Weka is a very powerful dat
21. [Figure 4.1.3.3: Delete stored dataset]

4.2 Second Level: Data Transformation

By pressing the Next button you proceed to the data transformation stage. The resulting screen appears in figure 4.2.1.

[Figure 4.2.1: Data management screen, showing the data source "drinput test_data" of type MySql, with memory requirements of 0.016 MB and class variable "att6"]

The central frame of the screen shows a data sample extracted from the selected database. Right under the main frame you can see the database details. The data that appears here is exactly the same as in the equivalent frame of the data input stage (see Paragraph 4.1, fifth screen area). Under this frame you can see two buttons with the options available at this stage of the application: Data processing
22. cation general settings

Through the menu of the application main screen you can change the basic program settings. From the main screen, open the settings by pressing the Options button and then Configuration. The screen you see is depicted in figure 4.6.1. In this tab (General) you define the main settings of the program. In detail, the available settings are the following:
• Maximum available memory space: the maximum memory that is going to be used to load the selected database.
• Maximum available storage space: the maximum number of databases that can be stored in memory. In general there is no limit to this number; however, for performance and system stability reasons we suggest that it does not exceed 20.
• MS SQL Server Port: the communication port for Microsoft SQL Server. The communication ports are set to the default port of each database. If you have not manually set another port, leave these settings unchanged.
• Data files Directory: the location of the XML file that contains the stored database sources.
• Temporary Directory: the directory in which the temporary files created by the application during its execution are stored.
• Apply memory control: with this option we allow the program to check that memory is sufficient for the algorithm execution
23. cond one the available transformations area. This list shows the name of the selected column, the data type and the transformation rule you have set. Initially the transformations are blank for all columns. For the transformations to be accepted, you have to set a rule for all selected variables (in the picture, all transformations have already been set).

[Picture 4.2.5: Data transformation screen, with example rules such as value substitution (sunny -> 1, cloudy -> 2), data range normalization of Attribute5 with lower bound 2 and upper bound 5, and mathematical operations applied to all values of a dimension]

There are three kinds of transformations that can be applied to data: normalization (picture 4.2.6), mathematical operation (picture 4.2.7) and value substituti
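The three kinds of transformation named above can be sketched in a few lines. The guide does not give the formulas xSDR applies, so the sketch assumes that data range normalization is a linear (min-max) rescaling into the chosen bounds; the function names are hypothetical.

```python
def normalize_range(values, lower, upper):
    """Linearly rescale values so the minimum maps to `lower` and the
    maximum to `upper` (assumed meaning of data range normalization)."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # A constant column has no spread; map every value to the lower bound.
        return [float(lower)] * len(values)
    span = largest - smallest
    return [lower + (v - smallest) * (upper - lower) / span for v in values]


def apply_operation(values, operation):
    """Apply the same mathematical operation to every value of a dimension."""
    return [operation(v) for v in values]


def substitute(values, mapping):
    """Replace each distinct value with the new value assigned to it."""
    return [mapping[v] for v in values]


scaled = normalize_range([0, 5, 10], lower=2, upper=5)
multiplied = apply_operation([1.0, 2.0], lambda v: v * 1.5)
numeric = substitute(["sunny", "cloudy", "sunny"], {"sunny": 1, "cloudy": 2})
```

Value substitution is the transformation that turns a non-numerical dimension into a numerical one, which is what the dimensionality reduction stage requires.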
24. e XML header of the files has remained unchanged.
• Integration of renewed algorithms from DRToolbox
• Addition of an option to delete stored database sources
• Addition of an option to keep the settings of stored database sources or to introduce new settings
• Removal of the Microsoft Access input option
• Selection of any dimension as data class
• Correction of Weka modules that caused problems during execution

2 System Requirements

The xSDR application is designed so that it does not have high hardware requirements. The only major hardware requirement concerns memory size: available memory should be large enough to load every chosen dataset. The program manages the available memory, and if the size of the chosen dataset exceeds the system memory, a warning message appears to avoid memory overload. As far as software is concerned, there are a number of requirements as well as limitations. Our application is written in C# and its execution requires the installation of Microsoft .NET Framework 3.5. Because of this, xSDR cannot be executed in any OS environment other than Windows. xSDR has been tested in Windows XP, Windows Vista and Windows 7, and it is not compatible with older Windows versions for which the .NET Framework is not available. Consequently the limitations we are setting software wise are th
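The memory check described above can be pictured with a rough estimate. The sketch assumes that every value is held as an 8-byte double and ignores any overhead; how xSDR actually measures a dataset is not stated in this excerpt, and both function names are hypothetical.

```python
def estimated_size_mb(rows, columns, bytes_per_value=8):
    """Rough size of a dataset in MB, assuming fixed-size numeric values."""
    return rows * columns * bytes_per_value / (1024.0 * 1024.0)


def fits_in_memory(rows, columns, limit_mb, bytes_per_value=8):
    """True when the estimated dataset size does not exceed the limit."""
    return estimated_size_mb(rows, columns, bytes_per_value) <= limit_mb


small = estimated_size_mb(351, 35)           # a few hundred records
too_big = fits_in_memory(10 ** 7, 100, 512)  # ten million records, 512 MB limit
```

A check of this kind would run before loading, so that the warning can be shown instead of attempting to load a dataset that does not fit.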
25. e privileges
• Select the Advanced tab
• Click Environment Variables
• Under the System Variables area, scroll down until you see Path
• Select Path (single click on it) and press Edit
• If you do NOT have Matlab installed on your system, add C:\Program Files\MATLAB\MATLAB Compiler Runtime\v711\runtime\win32. If you have Matlab installed, verify that your version's path is already added. You should see something like C:\Program Files\MATLAB\R2009b\runtime\win32;C:\Program Files\MATLAB\R2009b\bin, depending on the version installed.

Windows XP
• Right-click My Computer
• Click Properties
• In the System Properties window, select the Advanced tab
• Click Environment Variables
• Under the System Variables area, scroll down until you see Path
• Select Path (single click on it) and press Edit
• If you do NOT have Matlab installed on your system, add C:\Program Files\MATLAB\MATLAB Compiler Runtime\v711\runtime\win32. If you have Matlab installed, verify that your version's path is already added. You should see something like C:\Program Files\MATLAB\R2009b\runtime\win32;C:\Program Files\MATLAB\R2009b\bin, depending on the version installed.

3.1 Working with the executable file

Execution of xSDR with the use of the exe file is simple. The only thing you have to do is to download the provided file
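The Path verification described in the steps above can also be done programmatically. The following is a minimal Python sketch, not part of xSDR: the function name `path_contains` is hypothetical, and it assumes the Windows convention of `;`-separated, case-insensitive Path entries.

```python
def path_contains(path_value, directory):
    """Return True if `directory` is one of the entries of a Windows Path string.

    Assumption: entries are separated by ';' and compared case-insensitively,
    ignoring surrounding quotes and trailing slashes.
    """
    def norm(entry):
        return entry.strip().strip('"').rstrip("\\/").lower()

    entries = {norm(e) for e in path_value.split(";") if e.strip()}
    return norm(directory) in entries


# Example Path value using the directories named in the guide.
example_path = (
    r"C:\Windows\system32;"
    r"C:\Program Files\MATLAB\R2009b\runtime\win32;"
    r"C:\Program Files\MATLAB\R2009b\bin"
)
mcr_dir = r"C:\Program Files\MATLAB\MATLAB Compiler Runtime\v711\runtime\win32"
matlab_dir = r"C:\Program Files\MATLAB\R2009b\runtime\win32"
```

On a real system the first argument would be the value of the Path environment variable, for example `os.environ["PATH"]`.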
26.
4.1 First Layer: Data Input
4.1.1 Data input from Relational Database System
4.1.2 Data input from text file
4.1.3 Data input from a stored dataset
4.2 Second Level: Data Transformation
4.3 Third level: Dimensionality Reduction Algorithms Execution
4.3.1 DR Toolbox Algorithm Execution
4.3.1.1 Incorporation of new versions of DRToolbox
4.3.2 Custom algorithms execution
4.3.2.1 Custom Algorithm writing
4.3.3 Distributed Algorithms Execution
4.4 Fourth Level: Results Visualization
4.4.1 Select All option
4.4.2 Data Points Graph
27. er to input data into the system by using either one of the best-known database systems (Microsoft SQL Server, Oracle, MySQL) or a text file (.txt, .dat). The user can also store one or more of his databases so that he can work on them later. In addition, the user can reuse the same data configurations or create new ones for a stored database source. Figure 4.1.1 shows the application welcome screen. There are six areas, which we analyze below. Using xSDR on a computer with a minimum resolution of 800 pixels in height is highly recommended. [Picture 4.1.1: The application's main screen, showing the available data sources (Sql Server, Oracle, MySql, Text file) and the stored data sources grid] The first area contains the application option menu. Through this menu the user can select the basic program execution parameters. The settings that can be adjusted by the user are presented in detail in unit 4.6.
28. Naturally there is the option to build the assembly without using the deployment tool, just by executing the right commands through the Matlab command line. Below you can see the commands used for our assembly:

mkdir C:\Users\Tassos\Documents\MATLAB\DRToolBox\DRToolBoxWrapper\distrib
mkdir C:\Users\Tassos\Documents\MATLAB\DRToolBox\DRToolBoxWrapper\src
mcc -F C:\Users\Tassos\Documents\MATLAB\DRToolBoxWrapper.prj
mcc -W dotnet:DRToolBox.DRToolBoxWrapper,DRToolBoxWrapperclass,0.0,private -d C:\Users\Tassos\Documents\MATLAB\DRToolBox\DRToolBoxWrapper\src -T link:lib class{DRToolBoxWrapperclass:D:\DR\drtoolbox\compute_mapping.m}

4.3.2 Custom algorithms execution

Algorithms stored in Matlab files are executed as follows: select a central algorithm on the dimensionality reduction stage screen, then select the Matlab COM option and press the Configuration button; the corresponding screen appears (figure 4.3.2.1). [Figure 4.3.2.1]
29. free latest version of the compiler through the Mathworks site using the link provided below: http://www.mathworks.com/products/compiler
The last link is not an application, but it is very important for the project. It is the Toolbox for Dimensionality Reduction using Matlab: http://ict.ewi.tudelft.nl/~lvandermaaten/Matlab_Toolbox_for_Dimensionality_Reduction.html
Through this link you can find any new versions of the toolbox and build new assembly files to add them to the tool, as we discussed in an earlier unit.
30. from the xSDR site and paste it somewhere on your file system. Then you just execute xSDR.exe and you can start using the tool. We advise you to copy the application folder inside Program Files and create a shortcut on the desktop to access the executable file. This way you reduce the possibility of accidentally deleting any useful files. By default the tool is configured to look for its core files in the folders specified on the settings tab. If the user does not want to use the default paths, he/she has the option to change them and set the desired paths. Note: if you use the tool under Windows Vista or Windows 7, the system may ask for administrative privileges. In this case you just have to right-click the shortcut or the executable file and select Run as Administrator.

3.2 Working with the source code

In order to work with the source code you will need Microsoft Visual Studio. A link to the Microsoft Visual Studio Express version, which is available for free on the Microsoft website, is given in the annex of this guide. The application was developed using the 2008 version, so it is essential to have this or any later version. Once you have installed Visual Studio, double-click the solution file xSDR.sln. This loads the whole project implementation. If the version of Visual Studio is newer than 2008, it will automatically convert the project. In this case we recommend that you create
31. [Figure 4.3.2.3: Successful execution of a dimensionality reduction algorithm. The screenshot reports algorithm PCA, type Central Algorithm, 314 initial and 314 final records, original dimensionality 35, final dimensionality 5, stress 0.125019 and execution time 00:00:24.] [Figure 4.3.2.4: Save results dialog]

4.3.2.1 Custom Algorithm writing

As mentioned before, xSDR lets users import their own dimensionality reduction algorithms. Below we present the pattern an algorithm must follow to be executed properly by the xSDR application. The algorithm must be written in the Matlab language. At the top of the file there must be an XML header containing information that the program uses for the execution. The algorithm should accept as input a two-dimensional array holding the data on which to perform the dimensionality reduction, the number of dimensions of the final space and the location of the class data in the array. The algorithm gives a two dimensional table a
32. in which the available distributed algorithms are stored.
• DRToolbox configuration file: the path of the DRToolbox settings file.
• Results storage directory: the folder in which the dimensionality reduction results are stored.
• Weka configuration file: the file that contains the Weka modules used by the application.

5 Annex

5.1 Links

In this unit you can find the links to the tools necessary for running the application. xSDR is designed to run on any modern machine without the purchase of any additional software.

Microsoft Chart Controls for Microsoft .NET Framework 3.5: http://www.microsoft.com/downloads/details.aspx?FamilyId=130F7986-BF49-4FE5-9CA8-910AE6EA442C&displaylang=en
Microsoft .NET Framework 3.5 with SP1: http://www.microsoft.com/downloads/details.aspx?FamilyId=333325FD-AE52-4E35-B531-508D977D32A6&displaylang=en
Matlab Compiler: according to the Matlab Compiler Licence Agreement, we do not have the right to give direct links for the compiler. However, we are authorized to distribute along with our tool the version of the Matlab Compiler Runtime installer we built during the development phase, so the required version of MCRInstaller is available through the xSDR site. We would also like to note that every user can ask a
33. iption of the input type, followed by the name of the source, the columns included in the initial data, the total number of entries, the database size in KB and finally the date and time the source was created. Note that the date and time displayed correspond to when you created the stored database source in xSDR, not to the modification or creation date of the underlying database. If the user selects one of the stored database sources, three options are available: delete the stored source, load it with the configurations and transformations made at storage time, or load it in its initial form and define the settings and transformations from scratch. To delete a stored database source, the user just presses the Delete button located in the lower part of the stored database source option area. To load a database source with its stored settings, the user selects the database source and presses the Load button. When the user presses the Load button, the option to proceed to the next stage of the application (Next button) is activated. In this case, when the user proceeds to the data cleaning level (the cleaning screen is described in unit 4.2), he will see that the settings and transformations made when the database source was first stored are already selected. Of course their processing is po
34.
4.4.3 Comparison projection
4.5 Fifth level: Data mining with Weka tool
4.6 Application general settings
5 Annex
5.1 Links

1 System Review

xSDR is a modern and powerful data mining mechanism. The domain of data mining is one of the most popular research domains in information science. The constantly growing memory volumes, combined with the need for data mining in most modern data applications, have led to the idea of creating a single modern tool that facilitates fast and effective management of large volumes of data. xSDR is the evolution of the DRC (Dimensionality Reduction Console) application; therefore the system requirements have remained the same. The application provides the user with the following options:
• Data origin option: the user can select the origin of his data among the most popular databases (Oracle, SQL Server, MySQL) or text files (.txt, .dat)
• Data processing: the user has the option to filter and process the data according to his/her wishes; there is a wide range of possibilities such
35. ith the procedure completion, a message box informs the user whether the algorithm was applied successfully. After a successful application the results storage option is activated. The results can be stored in .arff or plain .txt format.

4.3.1 DR Toolbox Algorithm Execution

A DR Toolbox algorithm is executed as follows: on the main screen of the dimensionality reduction step (figure 4.3.1) choose Central Algorithm, select DRToolBox and then press the Configuration button, which shows the available DRToolbox algorithms as presented in Figure 4.3.1.1. In the left column of the screen you choose the algorithm you want to execute, and the available parameters are displayed on the right part of the screen. Pay particular attention to the parameter Eigenanalysis Implementation, which controls how the library performs the eigenanalysis; it exists only for DRToolbox algorithms and is available for all of them. Set this parameter to Matlab if the selected dataset contains no more than 10,000 entries; if the dataset contains more than 10,000 entries, select JDQR. Pressing OK parameterizes the algorithm and makes it ready for execution (Figure 4.3.2.2). The algorithm execution process and the messages shown are the same for all types of algorithms and are described in paragraph 4.3.2, Custom Algorithms Execution.
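The rule for the Eigenanalysis Implementation parameter can be written down as a small helper. This is only an illustrative Python sketch of the rule stated in the guide; the function name `choose_eigenanalysis` and the `threshold` argument are hypothetical and are not part of xSDR or DRToolbox.

```python
def choose_eigenanalysis(n_records, threshold=10_000):
    """Pick the Eigenanalysis Implementation option for a dataset size.

    Follows the guide's rule: 'Matlab' for datasets with no more than
    10,000 entries, 'JDQR' for larger ones.
    """
    if n_records < 0:
        raise ValueError("number of records cannot be negative")
    return "JDQR" if n_records > threshold else "Matlab"
```

For the 314-record ionosphere dataset used in the guide's screenshots, this rule selects the Matlab option.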
36. le for this part of operation. [Figure 4.4.1: The dataset does not contain a data class and its visualization is not possible] [Figure 4.4.2: Main screen of results visualization, showing for the selected dimension its maximum, mean, minimum and median, and for the data source ionosphere314.data its 314 records and 35 dimensions] On this screen you can see three major areas. The first, central area of the screen holds the graph of the initial database. The second area contains a drop-down menu listing the selected dimensions; from this selector you choose the dimension you wish to project. The details regarding the dimension appear in the third area of the screen, where you can see all the information related to the projected database. The information shown is the following:
• Maximum value of the selected dimension
• Minimum value of the selected dimension
• The number of distinct values of the selected dimension
• The average value of the selected dimension
• The standard deviation of the selected dimensio
37. n (StdDev)
• The number of entries of the selected dimension
• The number of dimensions of the selected database
• The name of the selected database
Finally, under the dimension selector there are three further viewing options. We analyze these options in the following paragraphs.

4.4.1 Select all option

With the select all option you get the column graphs for every dimension on the screen. This projection can be seen in figure 4.4.1.1. Through these graphs you can compare the variation of the database after the dimensionality reduction. The horizontal axis of each graph holds the values that belong to the selected dimension, and the vertical axis the frequency with which each value appears. The different colors correspond to the class that the points belong to. The legend in the upper right part of the screen explains which class is depicted by which color. Finally, each column is labeled with the number of times the value appears. [Figure 4.4.1.1: Select All option]

4.4.2 Data Points Graph

Pressing Data Points Graph displays the data point graphs for all dimensions. Through this option you can explore the correlation between the two dimension
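The per-dimension statistics listed for the visualization screen can be reproduced outside the tool for checking purposes. The following Python sketch is an assumption-laden illustration: the function name `summarize_dimension` is hypothetical, and it uses the sample standard deviation, since the guide does not say whether xSDR reports the sample or the population value.

```python
import statistics


def summarize_dimension(values):
    """Compute the summary figures shown for a selected dimension."""
    if not values:
        raise ValueError("the dimension has no entries")
    return {
        "max": max(values),
        "min": min(values),
        "distinct": len(set(values)),  # number of distinct values
        "mean": statistics.fmean(values),
        # Assumption: sample standard deviation; undefined for one entry, so 0.0.
        "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "records": len(values),
    }
```

A call such as `summarize_dimension([1.0, 2.0, 2.0, 3.0])` returns a dictionary with one key per figure in the list above.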
38. n on all selected values. It functions as follows: first you select one of the common mathematical operations (addition, subtraction, multiplication, division) and then you select a numerical value. The rule applies the operation you set to all values of the dimension. [Picture 4.2.7: Action application to all dimension values] The last transformation is substitution, the most common method for filtering alphanumeric data. It lets users create a one-to-one correspondence between alphanumerical and numerical values. This action is very common in the data mining sector. When the user selects a variable from the list and presses on the distinct value substitution, a list with all distinct values of the dimen
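The mathematical operation transformation amounts to applying one arithmetic operation with a fixed operand to every value of a dimension. A minimal Python sketch follows; the names `OPERATIONS` and `apply_operation` and the operator symbols are hypothetical and chosen only for illustration, and the sketch assumes non-numerical data is rejected, as the guide states for the first two transformations.

```python
import operator

# The four operations named in the guide, keyed by an illustrative symbol.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def apply_operation(values, op, operand):
    """Apply `value <op> operand` to every value of a numerical dimension."""
    if op not in OPERATIONS:
        raise ValueError(f"unsupported operation: {op!r}")
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
        raise TypeError("this transformation applies to numerical data only")
    return [OPERATIONS[op](v, operand) for v in values]
```

For example, `apply_operation([1.0, 2.0], "*", 1.5)` multiplies each value of the dimension by 1.5.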
39. format (selected dimensions and transformations). [Picture 4.2.9: Dataset storage]

4.3 Third level: Dimensionality Reduction Algorithms Execution

This stage is the core of the program, since it covers the procedure for which the xSDR tool was created. As soon as the user has selected the data on which dimensionality reduction will be applied and proceeds to the data management stage, he will see the following application screen. [Picture 4.3.1: Central dimensionality reduction screen] Initially the user has to choose the type of algorithm that is going to be executed. The options are central and distributed. The configuration and execution of distributed algorithms are described in paragraph 4.3.3 of the guide. If the user selects a central algorithm, he has the following additional options: he can execute an algorithm from DRToolbox or a custom algorithm written in the Matlab language.
40. on (picture 4.2.8). The first two can only be applied to numerical data, whereas the third can be applied to characters as well. Value substitution is the transformation that has to be applied to all columns that contain alphanumerical data. Normalization is the mapping of the selected values onto a specified range. If normalization is chosen for non-numerical data, the rule is rejected and an error message appears. [Picture 4.2.6: Values normalization, with lower bound 2 and upper bound 5] The second transformation, which can also be applied only to numerical data, is the application of a mathematical operatio
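Normalization onto a range with a lower and an upper bound can be sketched as follows. This Python example assumes a linear (min-max) rescaling, which the guide does not state explicitly; the function name `normalize_range` is hypothetical and not part of xSDR.

```python
def normalize_range(values, lower, upper):
    """Linearly rescale values so the minimum maps to `lower` and the maximum to `upper`.

    Assumption: xSDR's normalization is a min-max rescaling; the guide only
    says values are mapped onto a range with a lower and an upper bound.
    """
    if lower >= upper:
        raise ValueError("lower bound must be smaller than upper bound")
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
        raise TypeError("normalization applies to numerical data only")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to the lower bound.
        return [float(lower) for _ in values]
    return [lower + (v - lo) * (upper - lower) / (hi - lo) for v in values]
```

With the bounds shown in picture 4.2.6 (lower bound 2, upper bound 5), the smallest value of the dimension becomes 2 and the largest becomes 5.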
41. on is data mining through the Weka tool. Weka is open-source software that allows data mining through the execution of certain algorithms and processes. Clustering, classification and association algorithms are used in the xSDR application. Proceeding from the data visualization level, you reach the management screen of the installed Weka modules, depicted in figure 4.5.1. The management window has three areas. On the left you find a list of all the available algorithms, divided into categories according to the algorithm type. In the upper right part of the screen you see the Weka command that is going to be executed; if you are familiar with the command, you can type it directly. Otherwise you can press the button with the three dots next to the Weka command bar, which opens a window with the possible configurations of the selected algorithm. Figure 4.5.2 shows the equivalent window for the configuration of the SimpleKMeans algorithm. At first the default values appear, which the user can change at will. Finally, the lower right part of the screen shows the results of the algorithm execution. By pressing the detail button you access the results in a new window in text format.
42. [Figure 4.5.1: Data mining level with the use of the Weka tool. The screenshot shows the Weka modules tree, the command weka.classifiers.lazy.KStar with a 66% percentage split, and the run output for ionosphere314.data (314 instances, 5 attributes).] [Figure 4.5.2: Configuration of the SimpleKMeans algorithm] 4.6 Appli
43. ored dataset. As already mentioned, the xSDR suite lets the user store a dataset. Thus you can configure a dataset as you wish and later work with it directly without re-configuring it. After a dataset has been stored as described in section 4.2, you can select it when you start the tool. The main screen of the application is shown in Figure 4.1.1. In the middle of the screen is the Stored Data Sources grid, where the user can view all stored datasets. When the tool is launched this list is empty, even if there are stored datasets from previous executions. To access the stored datasets, press the Fetch button; all stored datasets then become visible. For ease of selection you can also see some extra details of each dataset, such as the source type (relational database, text file), the names of all the variables contained in the original set, the number of records, the size and the date saved. To work with the dataset you have selected from the list, pick one of two scenarios: either use the selection, configuration and data transformations carried out at storage time, or load the original dataset and make a new configuration. In the first case you just press the Load button (bottom right of the list of saved sets) and all configurations are loaded. If you w
44. ose of .NET Framework with SP1 and of the MS Chart Controls, items necessary for execution.

System requirements
• Supported operating systems: Windows Vista, Windows XP SP3, Windows 7
• .NET Framework: .NET Framework 3.5 SP1
• Processor: 400 MHz Pentium or equivalent (minimum requirement); 1 GHz Pentium or equivalent (recommended)
• RAM: 96 MB (minimum); 256 MB (recommended)
• Hard disk: 500 MB of free space
• CD/DVD drive: not required
• Display: 800 x 600, 256 colors (minimum); 1024 x 768, high color 32-bit (recommended)

The second limitation concerns the execution of dimensionality reduction algorithms. All dimensionality reduction algorithms, both those included in the program and those imported by the user, are written in the Matlab language. Because of this, executing these algorithms presupposes the installation of the Matlab compiler. Thus the user has to install either the whole Matlab suite or the MCR (Matlab Compiler Runtime), which is available free of charge from the website of Mathworks, the company that develops Matlab. Note that there are some limitations regarding the Matlab and MCR editions that are compatible with the application. These limitations are stated in chapter 3, where the installation procedure is described. Finally, for the visualization of the graphic results of dimensionality reduction, the Microsoft Chart Controls library is required, which is offered for free on the Microsoft website.
45. pper. Then we create a new class called DRToolboxWrapperclass and add to it the file compute_mapping.m from the DRToolBox folder you have already downloaded. This file contains all the information needed regardless of the algorithms that we are going to use, so you do not need to add any other file to the class. [Figure 4.3.1.1: New project on the deployment tool for Matlab, with name DRToolboxWrapper.prj, location C:\Users\Tassos\Documents\MATLAB and target .NET Assembly] [Figure 4.3.1.2: Project details, showing class DRToolBoxWrapperclass containing compute_mapping.m] Tab Package allowing you to see the contents of the package (Figure 4.3.1.3). The necessary files are the DLL and CTF file
46. s. This option produces a table of graphs whose rows and columns are the selected dimensions, so you can see how each dimension is correlated with every other. On the left part of the screen, under the graphs, there is a selector through which you choose the dimensions you wish to project. On the right part of the screen are the display settings: you can choose the size of the graph and the size of each point. For every change you have to press the Apply button for it to take effect (figure 4.4.2.1, graph table). By double-clicking any graph you see a magnified data points graph, as shown in figure 4.4.2.2. Through this window you can change the dimensions shown and choose any others you wish. In addition, you can compare the initial and the reduced data. The comparative view is displayed by pressing the Visualize button located on the right side of the screen; the screen then appears as in Figure 4.4.2.3. Through this window you can view two different point graphs. The left graph is essentially the same as in the previous projection. The difference is that on the adjacent axis system you can project the dimensions that resulted from the dimensionality reduction algorithm. This makes the projection very useful, since it gives a clearer view of the transformation procedure
47. s\Documents\MATLAB\DRToolBox\DRToolBoxWrapper\src
Warning: Directory already exists.
mcc -F C:\Users\Tassos\Documents\MATLAB\DRToolBoxWrapper.prj
mcc -W dotnet:DRToolBox.DRToolBoxWrapper,DRToolBoxWrapperclass,0.0,private -d C:\Users\Tassos\Documents\MATLAB\DR
Compiler version: 4.11 (R2009b)
Processing include files...
2 item(s) added.
Processing directories installed with MCR...
The file C:\Users\Tassos\Documents\MATLAB\DRToolBox\DRToolBoxWrapper\src\mccExcludedFiles.log contains a list of fun
2 item(s) added.
Generating MATLAB path for the compiled application...
Created 42 path items.
Begin validation of MEX files: Mon Jun 07 02:12:22 2010
Validating C:\Program Files\MATLAB\R2009b\toolbox\drtoolbox\techniques\dijkstra.dll
Found M-file C:\Program Files\MATLAB\R2009b\toolbox\drtoolbox\techniques\dijkstra.m
MEX file C:\Program Files\MATLAB\R2009b\toolbox\drtoolbox\techniques\dijkstra.dll is valid. It contains mexFunction.
Validating C:\Program Files\MATLAB\R2009b\toolbox\drtoolbox\techniques\mexCCACollectData.dll
No conflicting M-file found.
Validating C:\Program Files\MATLAB\R2009b\toolbox\drtoolbox\techniques\mexCCACollectData2.dll
No conflicting M-file found.
End validation of MEX files: Mon Jun 07 02:12:22 2010
Parsing file D:\DR\drtoolbox\compute_mapping.m
[Figure 4.3.1.4: Successful assembly build using the Deployment Tool]
48. [Figure 4.3.1.1: DRToolbox algorithm configuration. The screen lists the available algorithms (PCA, MDS, MVU, SimplePCA, ProbPCA, Isomap, LandmarkIsomap, Landmark MVU) and the Eigenanalysis Implementation options: Matlab, for datasets containing no more than 10,000 data instances, and JDQR, for datasets containing more than 10,000 data instances.]

4.3.1.1 Incorporation of new versions of DRToolbox

One of the measures proposed to extend the use of the tool is to link the program to new versions of DRToolbox. The procedure to be carried out by the user is the following: through the deployment tool of Matlab he/she must build a DLL library file and insert it into the program. The necessary steps are described in detail below. After rebooting, open the Matlab command line. There, type the command deploytool. This command opens the Deployment Tool GUI of Matlab. Then select New project. We recommend that you name your project and your class as suggested below, so that you only need to change the Reference File and not the source code as well. Figure 4.3.1.1 shows the initial screen of the Deployment Tool in Matlab. We choose to first create a new .NET Assembly and call the project DRToolboxWra
49. s output, which is the final data set after the dimensionality reduction. All additional parameters of the algorithm must be given within the header.

Details for the header: this header contains information to be used by the application. All information entered in the header should be commented out, so that it is ignored by the Matlab Compiler and not reported as an error during the execution. The format of the header along with the body of the algorithm is shown below. The name and description are required and must in no case be omitted. After them the user can set the parameters he wants. These parameters must follow the format the program expects: each parameter must contain three fields, name, type and value. Below we present a sample of the code to be implemented by each algorithm, and then in figure 4.3.2.1.1 some excerpts from the implementation of the Metric Map algorithm as shown in the Matlab Editor.

<algorithm>
  <name>AlgoName</name>
  <description>Implementation of <AlgoName> algorithm</description>
  <parameters>
    <parameter>
      <name>ParameterName</name>
      <type>ParameterType</type>
      <value>DefaultValue</value>
    </parameter>
  </parameters>
</algorithm>
function A J
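To illustrate how a tool could read such a header, here is a minimal Python sketch. It is not xSDR's actual parser. It assumes the header lines are commented with Matlab's `%` prefix (the guide only says the header is commented), and the names `parse_header`, `SAMPLE`, `Example`, `Iterations` and the type `int` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical algorithm file: a commented XML header followed by the function.
SAMPLE = """\
%<algorithm>
%  <name>Example</name>
%  <description>Implementation of the Example algorithm</description>
%  <parameters>
%    <parameter>
%      <name>Iterations</name>
%      <type>int</type>
%      <value>100</value>
%    </parameter>
%  </parameters>
%</algorithm>
function mapped = Example(data, dims, class_column)
"""


def parse_header(matlab_source):
    """Extract name, description and parameters from a commented XML header."""
    header_lines = []
    for line in matlab_source.splitlines():
        stripped = line.strip()
        if not stripped.startswith("%"):
            break  # the header ends at the first non-comment line
        header_lines.append(stripped.lstrip("%").strip())
    if not header_lines:
        raise ValueError("no commented header found at the top of the file")
    root = ET.fromstring("\n".join(header_lines))
    name = root.findtext("name")
    description = root.findtext("description")
    if not name or not description:
        raise ValueError("name and description are required")
    parameters = [
        {field: p.findtext(field) for field in ("name", "type", "value")}
        for p in root.iterfind("parameters/parameter")
    ]
    return {"name": name, "description": description, "parameters": parameters}
```

Calling `parse_header(SAMPLE)` yields the algorithm name, its description and one parameter with its three fields; a header missing the name or the description is rejected, mirroring the requirement stated above.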
50. ser selects a dimension as the data class, it is automatically included in the data set. The remaining non-numerical dimensions must be selected manually. These functions are shown in pictures 4.1.2.3 and 4.1.2.4. Once data selection is finished, press the OK button to close this dialogue window. You can also close the dialogue window without loading the data into memory by pressing the Cancel button. [Picture 4.1.2.2: Successful text file read. The Data Input screen shows the text file configuration (file path; separator character: Comma, Spaces, Tab or Other; "First line contains columns identifiers" option) and the retrieved columns with their data types (System.String, System.Double).]
51. sion appears. In this list users can set a new value for each of the current values. The new value can only be numerical; if an alphanumerical value is entered, the application shows an error message. [Picture 4.2.8: Values substitution. The dialog offers data range normalization and the option to substitute each distinct value of the selected dimension with a new value.] As soon as the process is completed, all transformations that are going to be applied to the dataset are shown in the upper list of the screen. By pressing the Delete transformations button users can delete all the transformations; by pressing OK they can apply the transformations to the data; by pressing Cancel they can cancel the procedure and return to the previous screen. After executing the transformations, users are given the option to store the total dataset in the current
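The value substitution rule described above (each distinct value is replaced by a new value, which must be numerical) can be illustrated with a short sketch. It is an assumption-laden illustration rather than xSDR code: the function name `substitute_values` and the choice to leave unmapped values untouched are hypothetical.

```python
def substitute_values(column, mapping):
    """Replace each value of `column` according to `mapping`.

    Every new value must be numerical; otherwise a ValueError is raised,
    mirroring the error message the guide describes.
    """
    numeric = {}
    for old, new in mapping.items():
        try:
            numeric[old] = float(new)
        except (TypeError, ValueError):
            raise ValueError(
                f"new value for {old!r} must be numerical, got {new!r}"
            )
    # Values without an entry in the mapping are kept as they are
    # (an assumption; the guide does not say what happens to them).
    return [numeric.get(value, value) for value in column]


classes = ["good", "bad", "good"]
encoded = substitute_values(classes, {"good": "1", "bad": "0"})
```

Validating the whole mapping before touching the column means a single bad entry leaves the data unchanged, which matches the guide's model of reviewing transformations before applying them.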
52. ssible. To revert a database source to its previous state and redo the transformations from the beginning, double-click the desired database source in the list of stored sources. Depending on the source input type you chose, you will be led either to the database connection settings screen, described in detail in unit 4.1.1, or to the data input from text file screen, described in detail in unit 4.1.2. As soon as the user has successfully selected the database source, the proceed option is enabled. The next level is initiated by pressing the Next button. The fifth area of the welcome screen contains information about the selected database source. In this frame the user can find the following information on the selected input: • Entries: the number of entries included in the selected database source • Dimensions: the number of columns included in the selected database source • Transformations: whether the selected database source contains transformations or not • Required memory: the memory size in MB that will be used by the selected database source • Database source: the name of the selected database source; if it is a text file, it is the name of the text file; if it is a database, it is the name of the database
53. On the upper part of the screen the user enters the path of the text file, either by typing it or by pressing the Browse button, which opens a browsing window for navigating to the desired text file. The user then selects a column separator using one of the radio buttons: space, tab, comma, or another character typed by the user. In addition, the user can indicate whether the first line contains the column names. Note that in the present version xSDR supports only files that contain column names on the first line. When all these fields are filled in, the user can press the file reading button. If the file is read successfully, its available columns (dimensions) appear in the following frame, as seen in figure 4.1.2.2. As in the case of the relational database, the column name, the data type and the dimension selection button are displayed. In addition, on the lower left part there is again the data class drop-down menu. You can select the data that is going to be used by the program in the following stages. Note that all dimensions that are not numerical are deselected by default. When the u
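The text file reading step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the xSDR reader: the function `read_columns` is hypothetical, and the rule that a column is numerical when every value parses as a floating-point number is an assumption. The type labels `System.Double` and `System.String` are the ones visible in the guide's screenshots.

```python
import csv
import io


def _is_number(value):
    """Return True if the string can be read as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def read_columns(text, separator=","):
    """Return one (name, data type, selected) tuple per column.

    The first line must hold the column names, as the guide requires.
    Non-numerical columns are deselected by default.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    names, data = rows[0], rows[1:]
    columns = []
    for index, name in enumerate(names):
        values = [row[index] for row in data]
        numeric = all(_is_number(value) for value in values)
        dtype = "System.Double" if numeric else "System.String"
        columns.append((name, dtype, numeric))
    return columns


# Hypothetical three-column file with a comma separator.
SAMPLE = "att1,att2,label\n1.5,2,a\n3.0,4,b\n"
columns = read_columns(SAMPLE)
```

Passing `separator="\t"` or `separator=" "` covers the tab and space options; a column chosen as the data class would then be switched back to selected, as the guide describes.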
54. [Picture 4.1.2.3: Data class selection.] [Picture 4.1.2.4: Manual selection of non-arithmetic data.] 4.1.3 Data input from a st
55. [Picture 4.1.1.1: RDBMS connection form.] As soon as the connection to the database is completed, the user can see the available database tables in the frame. Clicking on a table loads its available columns. For each column the user can see its name and its data type. There is also one more column with a select button, through which the user chooses whether the column will be used by the program in the next stages of the application. For convenience, the Select All and Unselect All options select or unselect every database column. Finally, on the lower left part of the screen there is a drop-down menu through which the user selects the column that forms the dataset class. [Screenshot: Connect to database dialog. Connection details: Provider MySql, Port 3306, Server localhost, Database drinput, Username root. Tables in database: temp, test_data. Status: Connection completed successfully. Table schema columns: Column Name, Data type, Selected.]
56. xSDR User's Guide, 2010. ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS. xSDR User's Guide: eXtensible Suite for Dimensionality Reduction. DB Net Research Team, 5/5/2010. The present document is a user's manual of the xSDR tool, a database dimensionality reduction and data mining suite. The guide was compiled in the framework of the master thesis of Anastasios Kapernekas under the supervision of Prof. Michalis Varzygiannis (Athens University of Economics and Business) and Dr. Panagis Magdalinos (Athens University of Economics and Business). Athens University of Economics and Business, MSc Computer Science, DB Net Research Team. Index: 1. System Review; 1.1 Improvements compared to DRC; 2. System Requirements; 3. Application Installation; 3.1 Prerequisites; 3.2 Working with the executable file; 3.3 Working with the source code; 4. User's Guide
