Infomat Manual

Contents

1. enter. If your new value is allowed, the value of the property changes; if it is not, it does not. Some property values are completely open: you can type in anything you want. If you type something inappropriate here (for instance 5 clusters in a clustering algorithm) and apply the function, Infomat throws Java exceptions.

2.10.3 Lists

Lists appear in several functions. They can display IObjects, IMatrixCells, and simple Java Strings. Lists have many functions; depending on the context, not all of them are available. It is, for instance, not possible to load and save IMatrixCells.

The list GUI consists of two parts: the list of objects and, above that, a few functions. The objects are presented textually, often accompanied by a value indicating their order. If the name of an object appears like a button, you can open the object in a simple viewer by clicking it. Each object also has a checkbox that you can tick.

The function part of the GUI has at most three rows. From the top:

File: The file row lets you load and save the list of objects. Loading is usually restricted by an IObjectGroup, meaning that you will only actually load those objects that are in that particular group. For the Group Edit window it is the corresponding IObjectGroup. In most other cases it is the IObjectSet of the IMatrix corresponding to the rows or columns, depending on the context.

Sel: The selection row allows you to handle selected objects. The Sel butt
2. <properties path>

The difference is that the properties path now may contain a hierarchy of directories that defines several clusterings. At the leaf directories, all properties along the path from the root directory must define a clustering. If they do not, the Experimentator will abort and report this. The results for each of the clusterings are written in a similar hierarchy with the result path as the root. For each of these directories a subdirectory, properties, is constructed. In it, all properties for the corresponding properties path directory are saved, so it is easy to match results with properties.

In the root directory for the properties there has to be an Experimentator_Properties.xml file, which sets up the basics for all the clusterings. It has the same Property entries as the Clusterer_Properties.xml file (see Section 3.1.3), plus one:

Number of Repetitions: Number of times the clustering is done for each set of properties.

So in each leaf directory of the result path hierarchy there will be Number of Repetitions results for clusterings as specified in the corresponding leaf directories of the properties path.

The Experimentator works most efficiently if the preprocessing is defined through Properties files in the topmost directories, as it only keeps one matrix in memory at a time. There are two toy examples:

Footnote 2: Actually two: the original from the Experimentator_Properties.xml file and the current pre
3. of the rows in order of their value for an evaluation measure. Choose the reference grouping, choose ascending or descending, and hit Sort by value.

Description: This is used to construct a description of a clustering of one dimension in the other dimension (for text clustering: groups of words that constitute descriptions for the text clusters). Choose the description method (currently only the centroid description) and choose whether to use reduction and relative reduction. The latter two are described below. Here they are used to decide how many objects should be in the description groups and, if chosen, to reduce the described clustering accordingly.

Reduction: This algorithm allows for reduction of all groups in a grouping. There are two types, reduction and relative reduction; choose by clicking the button at the top. The reduction removes objects from each group that are last in the list and/or not similar enough to the centroid for the entire group. The relative reduction considers the groups of a grouping of the other dimension as the centroids for the groups in the grouping about to be reduced. For it to work, the grouping along the other dimension has to have the same number of groups.

2.4.6 Help

Try them.

2.5 Pixel View

Through the Pixel View you get textual information about the matrix as ordered by the current groupings. This is a rather complex tool and is described in some detail here. When you open it the firs
4. Figure 2.1: The Infomat interface. The overview window (above left) shows a set of texts (rows) clustered into five clusters. The words (columns) are clustered into five relative clusters, one word cluster per text cluster. A rectangle indicates the part which is displayed in the main window (to the right). The matrix elements represented by the pixel the mouse pointer is pointing to are listed in the pixel window (below left). The non-zero elements are presented with their weight (w) and the row (r) and column (c) objects. In this example texts have numbers as names. The bottom part of the main window shows which pixel the mouse pointer is at and which is the first text and word that it represents in the current ordering (both the order number and the string).

If the matrix is bigger than the number of pixels in the main view, each SparsePixel represents several matrix elements. The opacity of the pixels is proportional to the weight of the matrix elements they represent.

What is said in this section is probably the most important thing to know before you start using the tool. Almost as important is what is described in Section 2.11. Before you read that, you might benefit from learning a bit more about the interface, especially about how Groupings are handled, as described in Section 2.8.

2.3 Main View and the Overview

W
5. 3 Infomat as a Processing Tool
3.1 Command Prompt Usage
3.1.1 ExampleClusterer
3.1.2 What Properties are Available
3.1.3 Clusterer
3.1.4 Experimentator
3.2 Program Structure
3.2.1 IProperties
3.2.2 Measure

Chapter 1 Introduction

Infomat is both a processing tool and a visualization tool. This manual will, when it is complete, deal with both. The visualization is covered in Chapter 2 and the non-visual processing in Chapter 3.

Please note that this is work in progress, both program and manual. This manual is definitely not complete and may in part be out of date, as I develop the program all the time. Still, I hope it will be of some use. Further information can be found in the readme.txt file, in the javadoc of the program (in the doc subdirectory), and on the Infomat website: http://www.csc.kth.se/tcs/projects/infomat/infomat

1.1 Infomat Basics

Infomat deals with objects that are called IObjects. Each IObject has a string and an id number that uniquely identifies it. It also has, when applicable, a reference to a location where the actual object is stored, like an actual text file. In this manual they will often be called objects for short. Several IObjects can be stored in an IObjectGroup and several IObjectGroups constitu
6. Infomat Manual for version 100305
Magnus Rosell
March 5, 2010

Contents
1 Introduction
1.1 Infomat Basics
1.2 Bugs
2 Infomat as a Visualization and Exploration Tool
2.1 Interface overview
2.2 Matrix Visualization
2.3 Main View and the Overview
2.4 Menu and Toolbar
2.4.1 File
2.4.2 Image menu and Toolbar
2.4.3 Views
2.4.4 Tools
2.4.5 Algorithms
2.4.6 Help
2.5 Pixel View
2.5.1 The View button and the Current Lists
2.5.2 Selection
2.5.3 Gathering
2.5.4 More
2.6 Search
2.7 Stoplist
2.8 Groupings and Groups
2.8.1 Grouping Panel
2.8.2 Grouping Edit Window
2.8.3 Group Edit Window
2.9 Clustering Algorithms
2.10 Standard Components
2.10.1 Buttons
2.10.2 Properties
2.10.3 Lists
2.11 The Matrix Grouping Concept
2.12 Example
7. When set to All in area, you can select several objects by a click-drag-release procedure. The objects stored in the selected lists are the most recently selected ones. They stay there when you move the mouse.

2.5.3 Gathering

The Gather button in the main panel opens and closes two panels: the Copy selected and Gathered panels. The Gathered panel allows you to store the addition of several sets of selected objects, as described in the previous Section 2.5.2. The All, Elements, Rows, and Columns buttons in the Copy selected panel add the corresponding list of the selected objects. The How button lets you choose between three things. When it is set to Accumulate, if an object is already in the gathered list, its value is increased by the value it has in the selected list. When it is set to Add, the latest value for the object is stored, and when it is set to Set, all previous objects are removed and the selected ones are added.

2.5.4 More

The More button in the main panel opens and closes three panels: the Remove Gathered, Sort Gathered to Selected, and Select Gathered panels.

The Remove Gathered panel allows you to remove the gathered objects from the matrix. The elements are removed from the matrix (this thus affects all groupings), while the rows and columns are only removed from the currently shown groupings. To remove these from the matrix and all groupings you need to use the purge matrix option in the Tools menu, see Section 2.4.4.

The Sort Gathe
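The three settings of the How button can be read as three ways of merging a selected list into the gathered list. The following is a minimal sketch of that reading, not Infomat's own code: the class GatherSketch, the enum HowMode, and the representation of a list as a map from object name to value are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; none of these names are part of Infomat.
class GatherSketch {
    enum HowMode { ACCUMULATE, ADD, SET }

    // Merges the selected objects into the gathered ones and returns the new gathered list.
    static Map<String, Double> gather(Map<String, Double> gathered,
                                      Map<String, Double> selected,
                                      HowMode how) {
        Map<String, Double> result = new LinkedHashMap<>();
        if (how != HowMode.SET) {
            // Accumulate and Add keep what was gathered before; Set starts from nothing.
            result.putAll(gathered);
        }
        for (Map.Entry<String, Double> entry : selected.entrySet()) {
            if (how == HowMode.ACCUMULATE) {
                // An object already gathered has its value increased by the selected value.
                result.merge(entry.getKey(), entry.getValue(), Double::sum);
            } else {
                // Add and Set store the latest value for the object.
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

With a gathered list {coach=1.0, team=2.0} and a selected list {coach=0.5, see=3.0}, this sketch gives coach the value 1.5 under Accumulate and 0.5 under Add, while Set leaves only coach and see.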
8. Whether rows or columns are regarded. Takes one of two values: true and false. Overrides the similarity measure's dimension.

Matrix Source: A group of Property entries that indicate how the IMatrix should be loaded. One of the three has to have a value. The other two have to have the value null. The three possibilities:

Matrix File: A string specifying an IMatrix file.
Token File: A string specifying a Token file.
Inpath: A string specifying a path from where files are read recursively.

Footnote: Like, for instance, replacing the K-Means algorithm with the Bisecting K-Means algorithm.

Comparison Grouping: A group of Property entries that indicate another grouping. If there is any, the results will be compared to it in the Evaluation. Either of the two could have a value, but only one. The two possibilities:

IObjectGrouping File: A string specifying a file with a grouping that works with the IMatrix.
LocationGrouper: Construct a grouping by looking at the location of the objects, true or false.

3.1.4 Experimentator

With the Experimentator class you can perform rather complicated experiments. The principles for running the experiments are the same as for the Clusterer, see Section 3.1.4.1. The results may, however, become very hard to overview. Section 3.1.4.2 explains how to generate convenient tables based on the results.

3.1.4.1 Running an Experiment

The Experimentator class is very similar to the Clusterer:

    Infomat> java infomat.Experimentator
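The rules for the Matrix Source group (exactly one of three values set) and the Comparison Grouping group (at most one set) can be checked before a run. The sketch below only illustrates those two rules; the class PropertyGroupCheck is hypothetical, and a null value in a Properties file is modelled here as a Java null reference, which is an assumption.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical helper; it is not part of Infomat.
class PropertyGroupCheck {
    // Counts the values that are set, i.e. not null.
    private static long countSet(String... values) {
        return Arrays.stream(values).filter(Objects::nonNull).count();
    }

    // Matrix Source: exactly one of Matrix File, Token File and Inpath must have a value.
    static boolean exactlyOneSet(String... values) {
        return countSet(values) == 1;
    }

    // Comparison Grouping: a value is optional, but only one may be given.
    static boolean atMostOneSet(String... values) {
        return countSet(values) <= 1;
    }
}
```

For example, exactlyOneSet("matrix.xml", null, null) holds, while giving both a matrix file and an inpath does not.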
9. arameter combinations are:

1. resultPath <res path> averagePath <avg path>
2. structure <table file.xml> latex <res file.tex>
3. structure <table file.xml> matlab <res file.m>
4. structure <table file.xml> csv <res file.csv>

We will deal with these in order here. The first generates, in the average path <avg path>, a hierarchical directory structure similar to the one in the result path <res path>. It produces an average result file called average.xml for every directory that has result files. It also copies any properties files. You can browse the average path hierarchy to compare average results. Try this for the results of example one:

    Infomat> java -cp classes mro.util.experimentation.ExperimentResultGenerator resultPath examples/experimentator/1/results averagePath examples/experimentator/1/avg

Do not choose the average path to be the same as the result path. If you do that twice, the averages from the first run will be included in the new averages.

The three other possible parameter combinations have a lot in common. They all read a structure file in xml format that holds information on which directories to process. There are examples of such structure files in both tables directories in the Experimentator examples. There are lots of possibilities in the structure files. By looking at the DTD in the beginning of the file you w
10. element in the column. The objects are ordered in the column group according to the order in which they appear. Ties are broken by their id number.

Random Clustering: Just what it sounds like.

Location Grouper constructs a grouping based on the location of the objects in the file system, if this information is available.

The clustering algorithms are applied to the whole matrix, not just the part that is displayed at the moment.

2.10 Standard Components

The GUI makes use of some standard components that appear in several places. This section describes some of their functions in more detail.

2.10.1 Buttons

Most buttons have a direct effect. There are, however, several alternating buttons that only set the context for actions. The typical example is the Choose rows or columns button in the Clustering Algorithms window. It alternates between the words Rows and Columns when you press it, the visible one being what you have chosen. Most alternating buttons have a leading text ending with a colon, and their meaning should be rather obvious from the context.

2.10.2 Properties

Many functions can be applied in several different versions. Instead of presenting all of them separately, they have properties that you can alter. These properties can be saved in an xml format and recalled. There is also a default setting. The properties GUI is easy to understand. Each property has a value that is displayed. You can alter it by typing in a new value and hitting
11. emember to weight the matrix again using the Weight Matrix function in the Algorithms menu.

2.12 Example

In the directory Infomat/example you find a few files to start with. Read more in the readme.txt file. There is also a larger example available on the website.

Chapter 3 Infomat as a Processing Tool

This chapter explains some of the possibilities with Infomat when it is not used with the graphical user interface as a visualization tool. It is divided into two sections that discuss the command prompt possibilities and the program structure, the latter to help programmers use the Infomat classes for other tasks.

3.1 Command Prompt Usage

The simplest class to use is the ExampleClusterer. It generates a clustering of the example set that is distributed with Infomat. How to run it is described in the readme.txt file and in Section 3.1.1.

There are so many parameters to set that I have decided not to let the user set all of them in the command prompt. Instead, for the other classes described here, you have to save Properties files for the different functions of the program. Properties files are xml files with values. A lot of functionality in Infomat uses them.

3.1.1 ExampleClusterer

How to run the ExampleClusterer is described in the readme.txt file. It runs the K-Means algorithm on the English example. You cannot alter anything from the command prompt. This class does the same thing every time, but the result ma
12. ering algorithms and when sorting IObjects according to similarity in different ways.

Overview: The overview window shows the entire matrix and indicates which part is currently visible in the main view.

Grouping Panel: The grouping panel with all its functions is described in its own section, Section 2.8.1.

Toolbar: The toolbar is described in Section 2.4.2 on the Image menu.

Mouse Pointer Info: The mouse pointer information panel at the bottom of the main window gives direct feedback on which row and column the mouse pointer is pointing to. The leftmost values give the screen coordinates for the pointer, while the rightmost present the objects that correspond to these coordinates in the compressed matrix. There is always only one row and one column object presented. If the matrix is large, it is the first object. To get more information (all row and column objects) you should use the pixel view, see Section 2.5.

2.4.4 Tools

There are several tools:

Evaluation: Choose a grouping to evaluate and, if you want to make an external evaluation, a reference grouping, and press Evaluate. The measures can be saved and loaded in an xml format.

Matrix Summary gives some basic matrix information.

Export to Text exports the currently selected grouping(s).

Search is a search tool similar to search engines. It is described in Section 2.6.

Stoplist is a rather complex tool that is described in Section 2.7.

Purge matrix removes all objects tha
13. g a series of names of MeasureGroups and a name for a leaf Measure. The abbreviations are used further down in the structure file in the <pathtable>.

The <table>s of the <columntitles> and the <rowtitles> allow you to set up how the headers for both columns and rows should look in the latex file. <caption> defines a caption that will be used in the latex file.

Finally, the <pathtable> is where everything happens: here the values for the measures specified in the <measureabbreviationtable> are extracted from particular directories. The <commonpath> allows you to specify a start for the path to the directories. The <pathtable> consists of rows, <ptr>, and columns, <ptc>. Each column has the following structure:

    <ptc> <m>ASS</m> <rd>3d</rd> <p>KM</p> </ptc>

where <m> should contain a measure abbreviation and <p> a path to a directory, which will be concatenated to the <commonpath>. <rd> is optional. If it is not present, the full value of the measure will be extracted. This is useful when you want to continue working with the values. If it is present, it should be a figure followed by a d, or be an i. The first leads to values rounded to the specified number of decimals, the second leads to values rounded to integers.

3.2 Program Structure

This section deals with the programming issues of Infomat and aims a
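The effect of the optional <rd> element can be sketched as a small formatting function. This is only an illustration of the two forms described above (a figure followed by d, or i); the class RoundSpecSketch is hypothetical, and rounding half up is an assumption, since the manual does not state which rounding mode is used.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration of the <rd> forms; not taken from the Infomat sources.
class RoundSpecSketch {
    // spec is null when <rd> is absent, "i" for integers, or digits followed by "d", e.g. "3d".
    static String format(double value, String spec) {
        if (spec == null) {
            return Double.toString(value);   // no <rd>: keep the full value
        }
        if (spec.equals("i")) {
            return round(value, 0);          // round to an integer
        }
        if (spec.matches("\\d+d")) {
            int decimals = Integer.parseInt(spec.substring(0, spec.length() - 1));
            return round(value, decimals);   // round to the given number of decimals
        }
        throw new IllegalArgumentException("Unsupported <rd> value: " + spec);
    }

    // Assumption: half-up rounding.
    private static String round(double value, int decimals) {
        return BigDecimal.valueOf(value).setScale(decimals, RoundingMode.HALF_UP).toPlainString();
    }
}
```

Under these assumptions, the value 0.123456 with the specification 3d is written as 0.123, and 2.5 with i is written as 3.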
14. h

The search tool works similarly to a search engine. You compose a query and then get the resulting list of objects that correspond to that query. It is possible to search for both rows and columns. The tool consists of three panels.

The left one lets you formulate a query: the list of rows and/or columns at the bottom. Enter space-separated strings in the text field and press one of the buttons below. You can also import any list using the Load button, as usual. The search strings can be formulated in Java's syntax for regular expressions if you start with a left bracket. An example: \p{Alnum}e\p{Alnum}* returns all objects with a string beginning with one letter or number, followed by an e, and ending with any number of letters or numbers. This would give, for instance, team, see, and be.

The right panel gets filled with the result for each query. It is sorted according to similarity (using the current similarity measure) to the query vector.

The middle panel is divided into three parts. In the top part, press the Search button to get the result for the query in the result list of the right panel. You can choose whether to search from the list or the text field. You can formulate a query in text and then generate a list query by pressing the button below the text field. The middle section of the middle panel contains some properties for the search. So far the only property is how many of the search terms mus
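The regular expression example can be tried outside Infomat with the standard java.util.regex classes. The sketch below assumes the expression is \p{Alnum}e\p{Alnum}* and that an object matches only when its whole string matches; the class RegexQuerySketch and the list of names are made up for illustration and say nothing about how Infomat itself applies the expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration using only the standard java.util.regex API.
class RegexQuerySketch {
    // Returns the names whose entire string matches the regular expression.
    static List<String> matching(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

Called with the expression above (written "\\p{Alnum}e\\p{Alnum}*" as a Java string literal) on the names team, see, be, coach, and montreal, the sketch keeps team, see, and be, because only those have an e as their second character.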
15. hat may help you as well. If a Property has one or several PropertyValueDescriptions, you have to choose one of those values.

Some Properties are Strings. In order for that to work they have to have a <str> tag within the value field, for example:

    <v><str>a string</str></v>

All files and paths are treated as strings. When you want to set a value to nothing, it is accomplished like this:

    <v>null</v>

This goes for Strings as well.

3.1.2.2 Some of the Properties

Here is a short list of some of the Properties files that are generated by the WriteProperties class:

Dot_Product_Similarity_Properties.xml: Properties for the dot product similarity.

Evaluation_Properties.xml: Properties for the evaluation.

IMatrixCell_Filter_Properties.xml: Properties for removal of rows, columns, and matrix elements.

IObjectGrouping_Text_Result_Properties.xml: Properties for exportation of results as browsable pages.

IObjectGroupingIO_Properties.xml: Properties for exportation of full clustering results. Not browsable. These are the files you want if you want to use the result somewhere else.

KMeans_Properties.xml: Properties for the K-Means algorithm.

Stoplist_Properties.xml: Properties for the use of a stoplist. These properties are in addition to a file of stop objects (words most of the time) when such a file is specified. If there is no file specified, these properties are applied alone. The stoplist file is not app
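The two value forms, a string wrapped in a str tag inside the v element and the literal null, can be read with the standard Java XML parser. This is a minimal sketch that handles a single v element given as a string; the class PropertyValueSketch is hypothetical and the sketch does not reflect how Infomat parses complete Properties files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration; it reads one <v> element, not a whole Properties file.
class PropertyValueSketch {
    // Returns the string inside <v><str>...</str></v>, or null for <v>null</v>.
    static String readStringValue(String vElementXml) {
        try {
            Element v = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(vElementXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList strElements = v.getElementsByTagName("str");
            if (strElements.getLength() > 0) {
                return strElements.item(0).getTextContent();
            }
            String text = v.getTextContent().trim();
            return text.equals("null") ? null : text;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable <v> element: " + vElementXml, e);
        }
    }
}
```

Wrapping strings in their own tag is what lets a reader like this tell the text null apart from an unset value: with the str tag the method returns the characters n-u-l-l, without it the method returns a Java null.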
16. he maximal number of objects that should be displayed. Type a new number and press enter. The GUI only presents a small part of the entire list for efficiency reasons.

2.11 The Matrix Grouping Concept

Infomat is quite a complex tool. The single most important thing to keep in mind when using it is that the view presents the matrix through a row and a column grouping. The matrix may contain several objects that are not visible. Some functions work on the visual groupings and some work directly on the matrix behind them. This section describes some implications of this fact.

Each grouping is a view of the matrix. Use the purge matrix option in the Tools menu to force the matrix to contain only the objects in the current row and column groupings. The objects are also removed from all other groupings simultaneously.

Some functions work on the groupings and some work on the matrix directly. When you remove matrix elements, you always remove them directly from the matrix. Row or column objects, on the other hand, are removed either from the current grouping or from the matrix directly, depending on the tool you use. Functions that remove row and column objects from the groupings and not from the matrix:

- Through the toolbar and Image menu.
- Through the Pixel View window.
- Through the Group View window, after you have pressed the Apply List Order button.

When you have removed objects and/or matrix elements and purged the matrix, r
17. hen you load a matrix (see Section 2.4.1), the whole matrix is displayed in the main view and the overview, but many operations result in a partial view. Which part is shown is decided through the Grouping Panel, see Section 2.8.1. The main view may further be zoomed in on any part of the partial view. The overview always displays all of it and indicates by a rectangle what part the main view shows. The main view and the overview display a part of the matrix.

2.4 Menu and Toolbar

This section contains a short account of the available menu options. As the toolbar contains convenient shortcuts to some of the options, it is described here as well. The following subsections describe the content of the menus.

2.4.1 File

In the File menu you can save and load matrix files. It is also possible to load a token file, which is a single file containing several texts. Look at the example (Section 2.12) for the format. It is also possible to save the picture in the main view as a png file. The Infomat Properties are some fundamental settings for the program. They are displayed and altered through the Properties GUI, which recurs for several settings throughout the program.

2.4.1.1 File Formats

The xml formats are quite straightforward. You should be able to figure them out by looking at the examples, see the readme.txt.

2.4.2 Image menu and Toolbar

The toolbar is divided into two sections with two and five buttons. The five first men
18. ill be able to figure them out. Section 3.1.4.3 describes the structure file in more detail.

The second possible parameter combination generates a latex table. It reads a structure file <table file.xml> in xml format and writes a latex file <res file.tex> corresponding to this structure. Try this:

    Infomat> java -cp classes mro.util.experimentation.ExperimentResultGenerator structure examples/experimentator/1/tables/structure.xml latex examples/experimentator/1/tables/table.tex

Compile the resulting latex file and you have a document you can look at. It is a table with average results and standard deviations. You may want to alter some of the typography in the table, but it is a good start when you want to include results in a text of some sort.

The last two parameter combinations also read the structure file, but they disregard all row and column titles and generate two matrices corresponding to the structure: one with the average values and one with the corresponding standard deviations. Parameter combination three generates two matlab matrices in a file that you can call from matlab to start working with the values. The last combination generates a similar csv file, a semicolon-separated file that can be read by, for instance, MS Excel. There is one more example to try:

    Infomat> java -cp classes mro.util.experimentation.ExperimentResultGenerator structure examples/experimentator/2/tables/structure.xml
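The averages and standard deviations in the generated tables summarize the repeated runs of one set of properties. As a rough sketch of that kind of summary, the following computes both for a list of measure values. The class ResultStatsSketch is hypothetical, and using the sample standard deviation (dividing by n - 1) is an assumption; the manual does not say which variant the ExperimentResultGenerator reports.

```java
// Hypothetical illustration; not taken from the mro.util.experimentation package.
class ResultStatsSketch {
    // Arithmetic mean of the values from the repetitions.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum / values.length;
    }

    // Sample standard deviation (assumption: n - 1 in the denominator).
    // Returns 0.0 when there are fewer than two values.
    static double standardDeviation(double[] values) {
        if (values.length < 2) {
            return 0.0;
        }
        double mean = mean(values);
        double squaredDifferences = 0.0;
        for (double value : values) {
            squaredDifferences += (value - mean) * (value - mean);
        }
        return Math.sqrt(squaredDifferences / (values.length - 1));
    }
}
```

For three repetitions with the values 1, 2, and 3, this gives a mean of 2 and a standard deviation of 1.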
19. ing panel. The main view is the fourth section, and the last section, the mouse pointer information panel, displays information on the position of the mouse pointer: the picture position and the corresponding first row and column object of the current matrix.

There are several other windows that appear in certain situations. From the beginning the Overview (top left) is shown. The pixel window (bottom left) is vital for obtaining textual information for parts of the matrix. Here the matrix elements for the pixel the mouse pointer is pointing to are displayed.

In the following sections the main window sections and several other windows will be described briefly. Section 2.10 describes a few GUI components that appear in several places. Finally, Section 2.12 describes a small example. First, however, a short account of the visualization.

2.2 Matrix Visualization

Infomat stores a matrix which is displayed in the main view and the overview. This picture is called a SparsePicture and consists of SparsePixels.

[Figure 2.1: screenshot of the Infomat interface]
20. lied if this properties file is not in the directory. The stop file is usually specified in the Clusterer_Properties.xml or Experimentator_Properties.xml file described below.

TFIDF_Weighter_Properties.xml: Properties for the tf-idf weighting scheme.

3.1.3 Clusterer

The Clusterer runs one clustering, evaluates it, and writes the clustering result and the evaluation result to files:

    Infomat> java infomat.Clusterer <properties path>

A toy example can be found in Infomat/examples/clusterer. The subdirectory properties contains properties that set up a clustering.

    Infomat> java infomat.Clusterer examples/clusterer/properties

generates results in Infomat/examples/clusterer/result. They consist of two files, like for the ExampleClusterer. You may alter the Properties in properties, remove some, or substitute them for others. The Clusterer will inform you if something is missing or if there are too many properties, making them ambiguous.

The properties path has to include several Properties files, the most important being the Clusterer_Properties.xml file, which sets up the basics for the clustering. Here is a description of the Property entries in it:

Result Path: A string specifying where the result should be written.

Stoplist: A string specifying a file with stop objects (words, for instance). Opposite dimension to Rows as Matrix Dimension. Combined with stoplist Properties from file.

Rows as Matrix Dimension:
21. matlab examples/experimentator/2/tables/mat.m

This structure file uses different settings for the extraction of values. Look at the <pathtable>. For two values it rounds them to two decimals, for one value to two significant figures, and for one value it keeps the whole calculated value. The last is especially interesting if you will go on working with the values in matlab or some other application.

Footnote: For this and the following example to work you have to have run the Experimentator examples in Section 3.1.4.1.

Footnote: You would not want to work with rounded values.

3.1.4.3 The Structure File

The previous section showed how to use the Experimentator. This section describes the structure xml file that is used by the Experimentator. The most complicated part of the structure file is the <measureabbreviationtable>:

    <measureabbreviationtable>
      <mae> <abb>ASS</abb> <n>Intrinsic Measures</n> <n>Evaluated Grouping</n> <n>Weighted Avereage Self Similarity</n> </mae>
      <mae> <abb>NMI</abb> <n>Extrinsic Measures</n> <n>Global Extrinsic Measures</n> <n>NMI</n> </mae>
    </measureabbreviationtable>

This table defines abbreviations (ASS and NMI in this example) for measures that are used in the actual table. Each <mae> (measure abbreviation) consists of an abbreviation and a number of names. The names specify a Measure in a Measures xml file by givin
22. ng of the texts (rows) could be a clustering or a categorization of the texts. Any information stored in a matrix may be investigated using Infomat.

1.2 Bugs

Though I have spent a considerable amount of time developing Infomat, there are probably several bugs. When using the Infomat GUI it is a good idea to keep an eye on the terminal window. Some trace text is printed there. Also, if any of the internal functions do not work properly for some reason, Infomat won't shut down. It will only be indicated by the Exceptions that are printed in the terminal. For many such Exceptions you will be able to continue working, but that particular function did not have the desired effect.

Chapter 2 Infomat as a Visualization and Exploration Tool

In this chapter Infomat as a visualization tool is described. It allows you to display a matrix and group, order, and alter it. You may do this along the rows or columns. This chapter describes the GUI in an order that follows the layout. It should be considered a reference. The last two sections are a bit different. Sections 2.2 and 2.11 describe the most important concepts of the GUI, and Section 2.12 describes the example matrix that is bundled with the program.

There is no undo function. Save your work!

2.1 Interface overview

Figure 2.1 shows the interface. The main window (rightmost window) is divided into five sections. At the top is the menu, below that the toolbar, and under that the group
23. on selects all objects, the Desel button deselects all objects, the Rm button removes all currently selected objects, and the Inf button inverts the selection.

Order: Here you can reorder the objects in the list. There are several possible orderings. You choose between them in the combo box and apply them to the objects by pressing the Apply button. This has no other effect than the ordering in the list. For an ordering to have an effect on anything else you have to do more. On a group, for instance, you have to hit the Apply List Order button, see Section 2.8.3. The first two use the similarity measure along the rows or columns, depending on the context. These are only available in some of the lists.

Sim to Sel: Sorts all objects in order of similarity to the selected objects.

Similarity: Sorts the objects in order of similarity to all the objects in the list.

Literal: Sorts the objects in literal order.

Random: Makes a random permutation of the objects.

Invert: Inverts the order of the objects.

Original: When the list is displayed for the first time it has a particular order. Through this you can revert to it. There is one exception: when you hit the Apply List Order button in the Group Edit Window, the new order is set as the original.

To the right in the function part of the GUI there is an indicator that shows how many of the objects in the list are displayed (50/172). There is also a small text field in which you may alter t
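The orderings that do not depend on a similarity measure (Literal, Random, and Invert) are easy to picture as operations on a copy of the list, which also mirrors the remark that an ordering has no effect outside the list until it is applied elsewhere. The sketch below is an illustration only: the class ListOrderSketch is hypothetical, and reading "literal order" as plain lexicographic string order is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; each method returns a reordered copy and leaves the input untouched.
class ListOrderSketch {
    // Literal: assumed here to mean lexicographic order of the object strings.
    static List<String> literal(List<String> objects) {
        List<String> copy = new ArrayList<>(objects);
        Collections.sort(copy);
        return copy;
    }

    // Invert: reverses the current order.
    static List<String> invert(List<String> objects) {
        List<String> copy = new ArrayList<>(objects);
        Collections.reverse(copy);
        return copy;
    }

    // Random: a random permutation; the seed parameter only makes the sketch repeatable.
    static List<String> random(List<String> objects, long seed) {
        List<String> copy = new ArrayList<>(objects);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

Because the input list is never modified, it can play the role of the Original ordering that the list can be reverted to.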
24. processed one.
3 These and the following examples have line breaks for typographical reasons. They should, of course, be written as a single line at the command prompt.

Infomat> java infomat.Experimentator examples experimentator 1 properties

which generates results in examples experimentator 1 results, and

Infomat> java infomat.Experimentator examples experimentator 2 properties

which generates results in examples experimentator 2 results. Looking at these should give you an idea of how to use the Experimentator. Notice that the second example does not have any IObjectGrouping Text Result Properties xml files, so no textual result files are generated. This is very convenient when you run large experiments with many repetitions, as the results otherwise tend to get huge.

3.1.4.2 Extracting Results from an Experiment

The package mro.util.experimentation contains classes that help you extract parts of the results generated by the Experimentator. This is useful functionality for any kind of experiment, but it will be described here in the context of the Experimentator. The main objective is to calculate average values and standard deviations for the results in the result directories. You run it like this:

Infomat> java mro.util.experimentation.ExperimentResultGenerator <params>

If you give no parameters it displays some information. The possible p
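To make the objective of the result extraction concrete, the following sketch computes an average and a standard deviation for one measure over the repetitions of a clustering. It is not the code of ExperimentResultGenerator: the class RepetitionStats and its methods are hypothetical, and the choice of the sample standard deviation (dividing by n - 1) is an assumption, since the manual does not say which variant is used.

```java
import java.util.List;

/** Hypothetical helper, not part of Infomat: summarises one measure over repeated runs. */
final class RepetitionStats {

    /** Arithmetic mean of the values. */
    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    /** Sample standard deviation (divides by n - 1); returns 0 for fewer than two values. */
    static double standardDeviation(List<Double> values) {
        int n = values.size();
        if (n < 2) {
            return 0.0;
        }
        double m = mean(values);
        double squaredDeviations = 0.0;
        for (double v : values) {
            squaredDeviations += (v - m) * (v - m);
        }
        return Math.sqrt(squaredDeviations / (n - 1));
    }

    public static void main(String[] args) {
        // Made-up values of one measure from three repetitions.
        List<Double> measure = List.of(0.5, 0.75, 1.0);
        System.out.println(mean(measure) + " " + standardDeviation(measure));
    }
}
```

With Number of Repetitions set to three, each leaf directory would contribute three values per measure to such a summary.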
25. red to Selected panel allows you to sort the gathered row/column objects in order of similarity to all the row/column objects of the selected lists.

RowRow: The similarity of each gathered row to all the selected rows. The row similarity measure (see Section 2.4.3) is used to extract the representation and calculate the similarity.
RowCol: The similarity of each gathered row to all the selected columns (considered as a representation), using the row similarity measure.
ColCol: The similarity of each gathered column to all the selected columns, using the column similarity measure.
ColRow: The similarity of each gathered column to all the selected rows (considered as a representation), using the column similarity measure.

The Select Gathered panel lets you move the gathered objects to the selected panel. There are four straightforward buttons: All, Elements, Rows and Columns set the selected objects, overwriting the previously selected objects. The El for RC button extracts the matrix elements that intersect with the rows and columns of the gathered objects and sets them as selected objects. The RC for El button does the opposite.

The last row of the Select Gathered panel lets you extract the representation for the objects. With the C for R button you set the columns of the selected list to the objects that represent the rows of the gathered list according to the row similarity. The R for C button uses the column similarity analogously.

2.6 Searc
26. t appear in all objects. If you use the text field, a term is a space-separated character string, meaning that a group of terms defined by a regular expression will be considered as one term. This is very convenient sometimes.

The bottom part of the middle panel lets you display the search result graphically as a grouping in the main view and the overview. Choose whether to consider rows or columns, whether to display the result or the query, whether the grouping should consist of one group or of one group with the rest in a rest group, and whether the result should be displayed directly or as a coloring. Press Apply.

2.7 Stoplist

The stoplist tool is an implementation of the common notion of a stoplist in information retrieval. It can do a little bit more, though.

The stoplist window has four panels. The leftmost shows several Properties that may be altered. The middle panel allows you to load and save a list of ordinary strings from/to a simple text file. The rightmost third of the stoplist window consists of two panels. The top panel is a list of IObjects that can be removed (stopped). These can be loaded from an xml file, and saved as well. The From Strings to IO button allows you to convert the strings into IObjects that can be removed from the matrix. Only IObjects that exist in the matrix are generated. Objects may be converted into a list of strings using the From IO to Strings button. In the rightmost bottom panel, Main, you choose which ma
27. t are not displayed in the overview. If you, for instance, have deleted certain uninteresting objects from a grouping, this function removes them from the matrix and from all other groupings. Purge matrix is applied the moment you choose it from the menu, without any options window.

Transpose matrix speaks for itself. This is applied directly.

2.4.5 Algorithms

The options in the algorithm menu are:

Clustering Algorithms: Here you can choose between several clustering algorithms. See Section 2.9.

Filter Matrix: The Filter Matrix algorithm is straightforward: alter the Properties and hit the apply button.

Weight Matrix: In the Weighting window you can choose between different weighting schemes and alter their properties. When you are satisfied, hit the apply button. The weightings consider the rows to be the objects and the columns the representation. Some things in the properties need explanation.

tf according to Croft (1983):

$$\mathrm{tf}_{ij} = c_1 + (1 - c_1)\,\frac{n_{ij}}{\max_i n_{ij}} \qquad (2.1)$$

idf according to Croft and Harper (1979):

$$\mathrm{idf}_i = c_2 + \log\frac{N - n_{\mathrm{word}_i}}{n_{\mathrm{word}_i}} \qquad (2.2)$$

where $n_{ij}$ is the number of times word $i$ appears in document $j$, and $\max_i n_{ij}$ is the number of times the most frequent word in text $j$ appears. In Croft and Harper's formulation, $N$ is the number of texts and $n_{\mathrm{word}_i}$ the number of texts that contain word $i$. In the properties, $c_1$ is called Local row global column weight importance factor and $c_2$ Global column weight belief factor.

Cluster Sorter: The cluster sorter is work in progress. It allows you to sort the clusters of the current clustering
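As an illustration of the two weightings named above, here is a minimal sketch. It assumes the usual published forms, Croft's normalised term frequency, tf = c1 + (1 - c1) * n / max n, and Croft and Harper's idf = c2 + log((N - n) / n). It is not Infomat's implementation: the class CroftWeights, its method names and the handling of absent words are assumptions made for the example.

```java
/** Hypothetical helper, not Infomat's implementation of its weighting schemes. */
final class CroftWeights {

    /**
     * Normalised term frequency: c1 + (1 - c1) * count / maxCount, where count is
     * the number of times the word appears in the text and maxCount the count of
     * the most frequent word in that text.
     * Assumption: a word that does not appear in the text gets weight 0.
     */
    static double tf(int count, int maxCount, double c1) {
        if (count == 0) {
            return 0.0;
        }
        return c1 + (1.0 - c1) * count / maxCount;
    }

    /**
     * Inverse document frequency: c2 + ln((N - n) / n), where N is the number of
     * texts and n the number of texts that contain the word (0 < n < N).
     */
    static double idf(int textsTotal, int textsWithWord, double c2) {
        return c2 + Math.log((double) (textsTotal - textsWithWord) / textsWithWord);
    }

    public static void main(String[] args) {
        System.out.println(tf(2, 4, 0.5));    // a word seen 2 times, most frequent word seen 4 times
        System.out.println(idf(10, 5, 1.0));  // a word found in 5 of 10 texts
    }
}
```

Note that the idf term is negative for words that appear in more than half of the texts, which is one reason for the additive constant c2.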
28. t helping the developer use the different functions. More detailed information about all classes can be found in the javadoc in the directory Infomat doc.

The simplest example is the ExampleClusterer. How to run it is described in the readme.txt file. By looking at the code, the programmer will also get a first idea of how to use Infomat when writing their own programs.

For now this section is very incomplete. As a first help I give a small UML diagram in Figure 3.1 for the most central data structure classes used by Infomat. These classes can all be found in the infomat.vectorspace package and its subpackages.

[Figure 3.1: UML. Part of the Infomat data structure.]

3.2.1 Properties

This section gives a short introduction to the Properties class. Objects of the Properties class contain a number of Property objects, which can be grouped into PropertyGroup objects to provide more order. Groups can contain groups in a hierarchy. A short discussion on how to handle xml files representing Properties objects can be found in Section 3.1.2.1.

3.2.2 Measures

This section gives a short introduction to the Measures class. Objects of the Measures class contain a number of Measure objects, which can be grouped into MeasureGroup objects to provide more order. Groups can contain groups in a hierarchy.
29. t panel to the actual group using the Apply List Order button.

List: When you open a group edit window, the list panel contains all the IObjects in the group. You can alter it in many ways using the list manipulations. The similarity that is used is the row or column similarity from the Similarity View (see Section 2.4.3). For the manipulations to affect the group you have to press the Apply List Order button.

You can open group edit windows from the main view: right-click and choose either the row or the column cluster for the current pixel.

2.9 Clustering Algorithms

There is a Clustering Algorithm Window. In it you can decide whether you want to cluster rows or columns. You choose the algorithm in a combo box. The algorithms all have some properties that can be altered, for instance the number of clusters. The algorithm window explains these properties rather well.

K-Means: K-Means clustering.

Bisecting K-Means: Bisecting K-Means clustering.

Relative Clusterer: An algorithm that clusters the columns (or rows) relative to the rows (columns). The column objects that have the highest weight in the first row cluster are assembled into a first column cluster, and so on.

Appearance Relative Clusterer: An algorithm that clusters the columns (rows) relative to the rows (columns). A column group is created for each row group. Each column object is put in the column group corresponding to the row group in which it first appears, the first non-zero matrix
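The idea behind the Appearance Relative Clusterer can be sketched as follows. This is only an illustration of the description above, not Infomat's code: the class AppearanceRelativeSketch is hypothetical, a dense double array stands in for the sparse matrix, and it is an assumption that "first appears" means the first non-zero element when the rows are scanned in their current order.

```java
import java.util.Arrays;

/** Hypothetical illustration, not Infomat's Appearance Relative Clusterer. */
final class AppearanceRelativeSketch {

    /**
     * Assigns each column to the group of the first row (in row order) that has
     * a non-zero element in that column.
     *
     * @param matrix   dense stand-in for the matrix, indexed [row][column]
     * @param rowGroup group index of each row
     * @return group index of each column, or -1 for columns with no non-zero element
     */
    static int[] assignColumns(double[][] matrix, int[] rowGroup) {
        int columns = matrix.length == 0 ? 0 : matrix[0].length;
        int[] columnGroup = new int[columns];
        Arrays.fill(columnGroup, -1);
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < columns; c++) {
                // Only the first appearance decides the group of a column.
                if (columnGroup[c] == -1 && matrix[r][c] != 0.0) {
                    columnGroup[c] = rowGroup[r];
                }
            }
        }
        return columnGroup;
    }

    public static void main(String[] args) {
        double[][] matrix = {
            {1, 0, 0},
            {0, 2, 0},
            {3, 0, 0},
        };
        int[] rowGroup = {0, 1, 1};
        System.out.println(Arrays.toString(assignColumns(matrix, rowGroup)));
    }
}
```

Because only the first appearance counts, the result depends on the order of the rows, so reordering a row grouping before clustering can change the column groups.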
30. t time it has just two panels. From the top: the main and current panels. When the mouse pointer points to a particular pixel, the current panel displays all the matrix elements that the pixel represents.

2.5.1 The View button and the Current Lists

The View button in the main panel lets you choose between elements, rows and columns. For elements, the Elements tab in the current panel displays all the matrix elements that are represented by the pixel the mouse pointer points to. They are presented as pairs (row object, column object) followed by a value, the weight of the matrix element. If you choose the Rows/Columns tab, the row/column objects for the matrix elements are presented with the weight of the corresponding elements.

When the View button is set to Rows/Columns, the Elements tab does not show anything. The Rows/Columns tab shows the row/column objects associated with the picture row/column.

The selection (next section) is affected in the same way by the View button.

2.5.2 Selection

The Select button in the main panel opens and closes two panels, the select and selected panels. These panels allow you to study some objects more thoroughly. To select anything, pixel selection has to be on. See Section 2.4.2.

The Select button in the select panel lets you choose between Single and All in area. If you click the mouse on a pixel when it is set to Single, the objects in the current lists are stored in the selected lists
31. tes an IObjectGrouping. Throughout this manual these are also called groups and groupings for short. Right now each IObject can belong to only one IObjectGroup in every IObjectGrouping.

The main data structure in Infomat is a matrix, called an IMatrix. It is an implementation of a sparse matrix. The objects along the axes of the matrix (rows and columns) correspond to IObjects. Each axis has a special IObjectGroup called an IObjectSet. An IObjectGrouping can only contain IObjects from one IObjectSet.

The IMatrix stores several IMatrixCells, which hold information on the relation between two IObjects, one from each IObjectSet. The basic information is a count, and a derived piece of information is called a weight.

I am slowly developing a dense matrix structure, which might be useful sometimes. However, for now all matrices are handled as sparse. As the intended use of Infomat is Information Retrieval, this is not a problem. For the GUI to be really useful, the objects along the axes of the matrix have to be interpretable. When they are, the matrix is usually sparse and/or small.

For a typical Information Retrieval scenario the row IObjects may constitute texts, with titles and locations in the file system, and the columns words that appear in the texts. For each word that appears in a particular text, an IMatrixCell with the number of appearances is stored as the count. The weight of the IMatrixCell can be calculated through a weighting scheme. An IObjectGroupi
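The sparse storage idea described above can be illustrated with a small sketch: cells exist only for (text, word) pairs that actually occur, and each cell carries a count plus a derived weight. This is not the IMatrix implementation. The class SparseCountMatrix, its nested Cell class and all method names are hypothetical, and plain strings stand in for IObjects.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a sparse text-by-word matrix, not Infomat's IMatrix. */
final class SparseCountMatrix {

    /** One stored cell: the basic count and a weight derived from it later. */
    static final class Cell {
        int count;
        double weight;
    }

    // Row object (text) -> column object (word) -> cell. Absent pairs are not stored.
    private final Map<String, Map<String, Cell>> rows = new HashMap<>();

    /** Records one appearance of the word in the text, creating the cell if needed. */
    void addOccurrence(String text, String word) {
        Cell cell = rows.computeIfAbsent(text, t -> new HashMap<>())
                        .computeIfAbsent(word, w -> new Cell());
        cell.count++;
    }

    /** Returns the count for the pair, or 0 if no cell is stored for it. */
    int count(String text, String word) {
        Map<String, Cell> row = rows.get(text);
        Cell cell = row == null ? null : row.get(word);
        return cell == null ? 0 : cell.count;
    }

    /** Number of cells actually stored, which is what makes the structure sparse. */
    int storedCells() {
        int total = 0;
        for (Map<String, Cell> row : rows.values()) {
            total += row.size();
        }
        return total;
    }

    public static void main(String[] args) {
        SparseCountMatrix matrix = new SparseCountMatrix();
        matrix.addOccurrence("text1", "matrix");
        matrix.addOccurrence("text1", "matrix");
        matrix.addOccurrence("text2", "cluster");
        System.out.println(matrix.count("text1", "matrix") + " " + matrix.storedCells());
    }
}
```

A weighting scheme would then fill in the weight field of each stored cell from the counts, leaving absent pairs implicitly at zero.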
32. trix dimension that is considered. The Apply button removes the IObjects currently in the IObject list from the matrix and all groupings.

Using the From IO to Strings button you can save any list of objects in a simple text format.

2.8 Groupings and Groups

Infomat stores a matrix. It is displayed in the main view and the overview in the order given by a row grouping and a column grouping. A grouping consists of one or several groups, which together contain all or some of the IObjects in the matrix. This section describes how the groupings and groups are managed.

2.8.1 Grouping Panel

All handling of the groupings is done through the grouping panel. It is divided into two sections, one for rows and one for columns. They work similarly.

The topmost drop-down menu displays the currently selected grouping. When you choose a grouping here, the order of the objects along the dimension (rows or columns) changes.

The bottom drop-down menu selects the coloring grouping. For the rows this leads to a coloring of the pixels, and for the columns to a coloring of the background columns. The pixel coloring is averaged over the matrix elements the pixels represent, while the column coloring is averaged over the entire columns.

When the E button beside each drop-down menu is pressed, a grouping edit window is displayed. It is described in the next section, 2.8.2.

2.8.2 Grouping Edit Window

The grouping edit window looks a little different depending on which of the four gro
33. u options on the Image menu correspond to the five icons in the second button section.

Pixel selection: When the mouse is clicked on a pixel in the main view, information on it is displayed in the Pixel View window (see Section 2.4.3).
Drag: For moving the selected zoom area.
Zoom selection: By clicking, dragging and releasing the mouse within the view, that area is zoomed in.
Delete rows: Click, drag and release to remove rows.
Delete columns: Click, drag and release to remove columns.

The following two menu options correspond to the leftmost two icons. They toggle the group separators on and off. The last two options in the Image menu toggle the guide lines (which help with positions) on and off, and reset the zoom entirely.

All Image functions work in both the main view and the overview.

2.4.3 Views

The different Views are the main ways to get information. The options in the Views menu are all toggle options, activating or deactivating the corresponding view.

Pixel View: The pixel view shows information on the pixel the mouse pointer is currently pointing at, and very much more. It is described in Section 2.5.

Selection View: This part is not yet fully implemented and does not change anything.

Similarity View: In the similarity window you can change which similarity measure is used for both rows and columns. You may also change the Properties of the chosen similarity. The chosen similarities are applied whenever appropriate, for many clust
34. upings it concerns. They all have the following sections:

Name panel: Here the name of the grouping is displayed. You can alter it.

Groups panel: Here all the groups are displayed. For each group you can alter the name and press the E button, which opens a group edit window. It is described in Section 2.8.3.

Reordering panel: By changing the order of the numbers in the text field and pressing the Apply button, you can change the order of the groups in the grouping. If you leave a group out, it is deleted, which is a very convenient way to remove one or more groups.

File panel: Here you can load and save groupings. For either to work, a matrix has to be loaded.

For coloring groupings you can change the color of each group in the groups panel. The change does not take effect until you press the Apply button in the coloring panel, which for coloring groupings is located between the reordering panel and the file panel. There you can also reset the coloring to the default colors.

The opacity of the pixels can be altered in the row show grouping edit window, and the opacity of the column coloring in the column coloring grouping settings window. By default, the column coloring opacity has a lower range than the pixel opacity.

2.8.3 Group Edit Window

The group edit window has the following sections:

Info panel: Here the name of the grouping is displayed. You can alter it.

Main panel: Here you can apply any changes you make in the lis
35. y differ due to the random initialization of K-Means.

The result is three xml files with names like

- clustering 1222852580758 xml
- clusteringEvaluation 1222852580758 xml
- clusteringResult 1222852580758 xml

where the number is the system clock time. The first file contains a full clustering result. The second file is the index file for a textual presentation of the clustering result. You may look at it in a browser and follow the links. The third file contains an evaluation (a lot of measures). It can also be viewed with a browser. It represents a Measures object corresponding to such a class. Section 3.2.2 describes it in more detail, but you will be able to use such files without reading that.

3.1.2 What Properties are Available

The class WriteProperties helps with setting up default values for many Properties.

Infomat> java infomat.WriteProperties <resultPath>

writes one Properties file each for several classes to the specified directory.

3.1.2.1 Altering Properties

You can open the xml files in an editor and change the values of the different properties. The properties are rather self-explanatory. Each of them has a name, a value, and a description that explains its purpose. A Property:

<p><n>Name</n><v>Value</v><d>Description</d></p>

Some of the Properties that are more difficult to understand have PropertyValueDescriptions (<pvd>) in the xml t
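If you would rather inspect such a file from a program than in an editor, the p/n/v/d structure shown above can be read with the standard Java XML parser. The following is a minimal sketch under stated assumptions: the class PropertyXmlSketch and its method are hypothetical and not part of Infomat, the root element name used in the example is invented, and the real files may contain more structure (groups, value descriptions) than is handled here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for <p><n>..</n><v>..</v><d>..</d></p> elements, not part of Infomat. */
final class PropertyXmlSketch {

    /** Returns the text of <v> in the first <p> whose <n> equals the given name, or null. */
    static String valueOf(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList properties = doc.getElementsByTagName("p");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                NodeList names = property.getElementsByTagName("n");
                NodeList values = property.getElementsByTagName("v");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the properties xml", e);
        }
    }

    public static void main(String[] args) {
        // The root element name "properties" and the property shown are made up.
        String xml = "<properties><p><n>Number of clusters</n><v>5</v>"
                + "<d>How many clusters to build</d></p></properties>";
        System.out.println(valueOf(xml, "Number of clusters"));
    }
}
```

Reading values this way does not validate them. As with manual editing, an inappropriate value only shows up when the function that uses it is applied.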
