ACRES 3 User Guide

Contents

1. could not be calculated. Both Precision and Sensitivity are necessary to calculate these values. Most common [...] of one of the classes (check the confusion matrix above). Try selecting fewer variables in the group(s) to produce more general rules.

Expert System Interface

[Figure: Expert System Interface window with Input Fact, Expert System and Output panels]

Browse for an expert system file previously created by ACRES. Give a value for each input variable and assert the fact to get a prediction.
2. 0)) (data (3_menopause ge40) (5_inv-nodes 3-5) (7_deg-malig 3)) => (assert (1_class (no-recurrence 0.64) (recurrence 0.64))))

A simple example of a generated rule

Certainty Factor Combination

If we repeat the above procedure more than once, for different sets of variables, we can create a rule set that, given a new instance, can provide more than one conclusion about the output variable. According to the model of certainty factors used in MYCIN, two certainty factors about the same fact can be combined using suitable formulas, depending on the signs of the certainty factors combined. For example, if we have two rules with the same conclusion, and the certainty factors associated with them, CF1 and CF2, are both positive numbers, the combined certainty factor CF for the conclusion according to MYCIN theory is given by the formula:

$CF = CF_1 + CF_2 (1 - CF_1) = CF_1 + CF_2 - CF_1 \cdot CF_2$   (3)

In the expert system PASS [4], the remark was made that in formula (3) both certainty factors contribute equally to the final result. In practice, rules are often not equally reliable, since their certainty factors are either bound to an expert's judgment or based on data containing noise, so the authors proposed a generalized version of the formula [1]:

$CF = w_1 \cdot CF_1 + w_2 \cdot CF_2 + w \cdot CF_1 \cdot CF_2$   (4)

where $w_1$, $w_2$ and $w$ are numeric weights that should satisfy the following equation:

$w_1 + w_2 + w = 1$   (5)

to assure that $0 \le CF \le 1$. To use for
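The two combination formulas, (3) and (4), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `combine_mycin` and `combine_weighted` are hypothetical and are not part of ACRES, and the sketch covers only the case of two non-negative certainty factors described above.

```python
def combine_mycin(cf1: float, cf2: float) -> float:
    """Formula (3): MYCIN combination of two non-negative certainty factors."""
    if not (0.0 <= cf1 <= 1.0 and 0.0 <= cf2 <= 1.0):
        # Assumption: negative or mixed-sign CFs need other MYCIN formulas,
        # which this sketch does not cover.
        raise ValueError("this sketch only covers certainty factors in [0, 1]")
    return cf1 + cf2 * (1.0 - cf1)


def combine_weighted(cf1: float, cf2: float, w1: float, w2: float, w: float) -> float:
    """Formula (4): weighted generalization; the weights must satisfy formula (5)."""
    if abs(w1 + w2 + w - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1 + w2 + w = 1")
    return w1 * cf1 + w2 * cf2 + w * cf1 * cf2


if __name__ == "__main__":
    # With w1 = w2 = 1 and w = -1 the weighted formula reduces to formula (3).
    print(combine_mycin(0.5, 0.5))
    print(combine_weighted(0.5, 0.5, 1.0, 1.0, -1.0))
```

Choosing w1 = w2 = 1 and w = -1 satisfies formula (5) and reproduces the MYCIN result, which is a quick sanity check for any set of weights found by a search procedure.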
3. ACRES 3 User Guide

Konstantinos Kovas
Department of Computer Engineering and Informatics, University of Patras
kobas@ceid.upatras.gr
Version 3.0.3, 12/06/2014

[Figure: ACRES v3 start screen (Artificial Intelligence Group) with the Create Expert System and Load Expert System options]

ACRES (Automatic Creator of Expert Systems) is a tool initially developed as a way to test and compare different methods of combining Certainty Factors in expert systems. In its second version we extended the architecture to address the problem of multiclass classification, but the overall architecture remained simple, focusing on the goal of comparing certainty factor combination methods.

The third version is our attempt towards a more generalized tool for generating expert systems. More specifically, an extension of the system made it possible to generate classification rules for additional variables, apart from the output variable, for which the final user of the expert system cannot provide values. This gives the ability to design more complex rule hierarchies, which are represented in an easy-to-interpret tree structure. Feature ranking and subset selection techniques help achieve the generation task in a more automatic and efficient way. Other enhancements include the ability to produce expert systems that dynamically update the certainty factors in their rules, the generation of rules and functions for interaction with the end user, and a graphical int
4. erface for the produced expert system.

a) Dataset & Variables Settings

[Figure: Dataset and Variables Settings window, showing the Dataset Import (1), Dataset Edit (2) and Variables (3) areas]

Dataset Import

Dataset Name: Specify a name for the expert system that will be created.

Variables File: A file containing a name for each variable in the dataset.

1_class 2_age 3_menopause 4_tumor-size 5_inv-nodes 6_node-caps 7_deg-malig 8_breast 9_breast-quad 10_irradiat
(example variables file)

Dataset File: The dataset file containing known instances about a problem (comma-delimited format).

no-recurrence-events,50-59,ge40,15-19,0-2,yes,2,left,central,yes
no-recurrence-events,50-59,premeno,25-29,0-2,no,1,left,left_low,no
no-recurrence-events,60-69,ge40,25-29,0-2,no,3,right,left_low,no
recurrence-events,50-59,premeno,15-19,0-2,no,2,left,left_low,no
recurrence-events,40-49,premeno,40-44,0-2,no,1,left,left_low,no
recurrence-events,50-59,ge40,35-39,0-2,no,2,left,left_low,no
recurrence-events,50-59,premeno,25-29,0-2,no,2,left,right_up,no
recurre
5. h the expert system will provide predictions.

(Optional) You can also specify one or more Intermediate Variables. Values for these variables will not be given directly by the end user. Rules will be created for predicting them.

[Figure: Variables panel listing the dataset variables and the classes of the selected variable]

The Continue button loads the Expert System Creation frame.

b) Expert System Creation

[Figure: Expert System Creation window for the BreastCancer dataset]

For each intermediate variable specified, and then for the output variable:
> Select a variable (1).
> Specify a subset of variables for creating prediction rules (2).
> Add the variable as a node to the architecture tree (3).

Figure 2: Main Loop

(Optional) A second subset can be specified by checking Two Predictions.
o There are two alternative methods for combining these two predictions about the same conclusion: the method used in MYCIN (MYCIN) and a generalized version using weights (WEIGHTED).

To help the user in choosing a subset, three facilities are offered:
o Feature Ranking (5): automatically produced when selecting a variable.
o Subset Selection (4): by clicking Find Subset.
o Selected Subset Evaluation (6): by clicking Test.
6. he certainty factors are saved in a separate file, in the form of CLIPS facts. Thus, for each prediction rule of the expert system there is a corresponding rule that updates the corresponding frequencies. The expert system consists of two files. The main expert system file, which remains constant, contains all the rules, functions and templates. The secondary file contains facts that store the frequencies required in the rules for computing the certainty factors, which can change during runtime if the end user provides new instances.

Evaluation

Evaluation Metrics for Classification Problems

Evaluation of a classification model is usually based on the following metrics: accuracy, precision, sensitivity and specificity, which for two classes (positive and negative) are defined as follows:

$acc = \frac{TP + TN}{TP + FP + FN + TN}, \quad prec = \frac{TP}{TP + FP}, \quad sens = \frac{TP}{TP + FN}, \quad spec = \frac{TN}{TN + FP}$

where TP is the number of cases classified correctly as positive, FP is the number of cases that were incorrectly classified as positive, TN is the number of cases correctly classified as not positive, and FN is the number of cases that were incorrectly classified as not positive.

In case of more than two classes, one can view each class as a separate binary classification problem, where positive are the cases of that class and negative are the cases of all other classes. This way one can produce a confusion matrix for each class. Unlike the binary classificat
7. in the new version combines the above probability with the a priori probability, found from the general frequency of class $C_i$ in the entire dataset. This probability can be easily computed with the formula:

$P(C_i) = \frac{f(C_i, N)}{|N|}$

Using the definition of certainty factors in the expert system MYCIN, we can combine these two probabilities to produce the measures of Belief $MB(C_i, E)$ and Disbelief $MD(C_i, E)$:

$MB(C_i, E) = 1$ if $P(C_i) = 1$, otherwise $MB(C_i, E) = \max\left(0, \frac{P(C_i \mid E) - P(C_i)}{1 - P(C_i)}\right)$

$MD(C_i, E) = 1$ if $P(C_i) = 0$, otherwise $MD(C_i, E) = \max\left(0, \frac{P(C_i) - P(C_i \mid E)}{P(C_i)}\right)$

Finally, we can estimate the Certainty Factor using these measures of Belief and Disbelief:

$CF(C_i, E) = \frac{MB(C_i, E) - MD(C_i, E)}{1 - \min(MB(C_i, E), MD(C_i, E))}$

It is important to point out the underlying characteristic of this method: the certainty factor produced is not a measure of our confidence in $C_i$, but rather a measure of the change of our confidence in $C_i$ given the evidence E. This means that a positive value represents an increase of our confidence, whereas a negative value represents a decrease of our confidence.

Dynamic CFs

Another new feature is the ability to generate expert systems that can update the Certainty Factors of their rules when new instances of the problem become available. To accomplish this, the certainty factors are not hard-coded inside the generated rules but are instead dynamically computed at run time. The required frequencies for computing t
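The MYCIN-style estimate described above, combining the prior and conditional probabilities through the measures of Belief and Disbelief, can be sketched as follows. The function names `mycin_cf` and `cf_from_frequencies` are hypothetical and chosen for illustration only; the generated expert systems compute these values in CLIPS, not Python.

```python
def mycin_cf(p_prior: float, p_given_e: float) -> float:
    """Certainty factor from the prior P(Ci) and the conditional P(Ci|E)."""
    # Measure of Belief: relative increase of the probability given the evidence.
    if p_prior == 1.0:
        mb = 1.0
    else:
        mb = max(0.0, (p_given_e - p_prior) / (1.0 - p_prior))
    # Measure of Disbelief: relative decrease of the probability given the evidence.
    if p_prior == 0.0:
        md = 1.0
    else:
        md = max(0.0, (p_prior - p_given_e) / p_prior)
    if min(mb, md) == 1.0:
        # Assumption: inconsistent inputs (e.g. prior 1 but conditional 0) are rejected.
        raise ValueError("MB and MD cannot both be 1")
    return (mb - md) / (1.0 - min(mb, md))


def cf_from_frequencies(f_c_d: int, size_d: int, f_c_n: int, size_n: int) -> float:
    """Same estimate, starting from the absolute frequencies f(Ci, D) and f(Ci, N)."""
    if size_d <= 0 or size_n <= 0:
        raise ValueError("both D and N must contain at least one instance")
    return mycin_cf(f_c_n / size_n, f_c_d / size_d)


if __name__ == "__main__":
    # Class seen in 5 of 10 instances overall, but in 3 of the 4 matching the evidence.
    print(cf_from_frequencies(f_c_d=3, size_d=4, f_c_n=5, size_n=10))
```

Because the function takes frequencies rather than fixed probabilities, recomputing it after the counts change mirrors the idea of certainty factors that are evaluated at run time instead of being hard-coded.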
8. ion problem, with this approach a case correctly classified as negative does not necessarily mean that the case was classified to the correct class. For this reason the value of TN is not credible and therefore cannot be used for estimating evaluation metrics. The metrics used are Precision, as defined above, and Recall (which corresponds to Sensitivity). For a possible class A, Precision is the fraction of instances classified to class A that actually belong to that class, while Recall is the fraction of instances belonging to class A that were correctly classified to that class. Since TP + FN is the sum of all cases that truly belong to the positive class, the Recall metric is also referred to as TP rate. Another useful metric is the F-measure, which combines the recall and precision values:

$recall = \frac{TP}{TP + FN}, \quad precision = \frac{TP}{TP + FP}, \quad F\text{-}measure = \frac{2 \times precision \times recall}{precision + recall}$

Finally, the weighted average of these metrics for all classes can be calculated, taking into account the number of occurrences of each class in the dataset. These metrics are widely used in classification performance evaluations and in corresponding tools, like the data mining tool Weka, so using them allows direct comparison with various classification models.

Evaluation Report in ACRES

The dataset is partitioned in two sets: a training set and a testing set. The expert system is generated using the training set and then it is evaluated
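A minimal sketch of the per-class Precision, Recall and F-measure computation, treating each class in turn as the positive class, might look like this. The function name `per_class_metrics` and the choice of returning `None` for a metric whose denominator is zero are assumptions made for this illustration, not a description of the ACRES code.

```python
def per_class_metrics(actual, predicted):
    """One-vs-rest precision, recall and F-measure for every class label."""
    metrics = {}
    pairs = list(zip(actual, predicted))
    for cls in sorted(set(actual) | set(predicted)):
        tp = sum(1 for a, p in pairs if a == cls and p == cls)
        fp = sum(1 for a, p in pairs if a != cls and p == cls)
        fn = sum(1 for a, p in pairs if a == cls and p != cls)
        # TN is deliberately not used, as explained in the text above.
        precision = tp / (tp + fp) if tp + fp else None
        recall = tp / (tp + fn) if tp + fn else None
        if precision is None or recall is None or precision + recall == 0:
            f_measure = None
        else:
            f_measure = 2 * precision * recall / (precision + recall)
        metrics[cls] = {"precision": precision, "recall": recall, "f_measure": f_measure}
    return metrics


if __name__ == "__main__":
    actual = ["a", "a", "a", "b", "b", "c"]
    predicted = ["a", "a", "b", "b", "b", "a"]
    for cls, values in per_class_metrics(actual, predicted).items():
        print(cls, values)
```

In the toy data, class "c" is never predicted, so its precision has a zero denominator and is reported as `None` rather than as 0.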
9. mula (4), however, the weights $w_1$, $w_2$ and $w$ should first be determined. In PASS, statistical data about the problem were used as a training data set to determine the weights by hand. In ACRES we offer both combination methods when multiple rule sets are specified for the output variable. The system produces the necessary weights for the generalized formula automatically, utilizing a genetic algorithm to search the space of possible weight combinations for an optimum one.

CF Models

The system offers two alternative methods for estimating Certainty Factors. Consider an output variable C associated with n possible classes $C_1, \dots, C_n$ and a dataset N containing $|N|$ instances. Evidence E is a certain pattern of values for a set of variables of the dataset, and D is the set of instances in the dataset in which this pattern occurs. We represent the absolute frequency of class $C_i$ in D as $f(C_i, D)$ and the absolute frequency of class $C_i$ in N as $f(C_i, N)$.

P(H|E)

Our initial approach, used in previous versions, relied solely on the probability found from the frequency of a class in D. For a class $C_i$, the certainty factor is estimated using the conditional probability that an instance is classified in class $C_i$ given that evidence E is true:

$P(C_i \mid E) = \frac{f(C_i, D)}{|D|}$

The above value lies between 0 and 1, so we use the following formula to produce a value in the interval [-1, 1]:

$CF(C_i, E) = 2 \times P(C_i \mid E) - 1$

MYCIN CFs

An alternative method added
10. nce-events,30-39,premeno,0-4,0-2,no,2,right,central,no
recurrence-events,50-59,premeno,25-29,0-2,no,2,left,right_up,no
(example dataset file)

Dataset Edit (Optional)

After importing the variables and dataset files, the dataset is loaded into a grid. The user can manually edit the values in the grid.

Figure 1: Dataset as a grid

Additionally, the user can perform the following operations:
> Delete Variable: Specify a variable and the corresponding column will be deleted.
> Merge Variables: Specify two variables. The corresponding cells will be merged. The values will be separated with "_".
> Merge Classes: Specify a variable and two of its classes. Then press Merge to merge these classes as a new one with the name specified.
> Discretize Variable: Choose a variable with real values. Specify the number of classes and a discretization method.

[Figure: Dataset Edit panel with the Save Changes button]

The Reset button will undo all changes made and reload the dataset you initially imported. The Save Changes button will save all modifications as a new dataset file. You must manually edit the variables file if necessary. Then import both files again to continue with the expert system creation.

Variables

The user must specify an output (prediction) variable. This is the variable for whic
11. Figure 3: Facilities

When nodes for all intermediate variables and for the output variable have been added, the expert system can be created. Create ES will create the expert system as a CLIPS file. Evaluate will create an expert system using a training set and evaluate it with a testing set.

[Figure: rule hierarchy tree with nodes for 7_deg-malig and 5_inv-nodes]

Rule Generation and CF estimation

Given a variable for which we want predictions made, and a subset of variables to be used for the prediction, we can generate a set of rules from a training set with the following steps:
1. Cluster instances in groups, so that each group contains instances that have identical values in the variables of the subset.
2. From each such group, produce one rule that has as conditions the common attribute-value pairs of the instances and as conclusion the possible classes of the output variable.
3. Associate each possible class i with a certainty factor using the formula:

$CF_i = \frac{n_i}{N}$   (1)

where $n_i$ is the number of instances of class i in the group and N the number of all instances in it. That is, the CF for a class is defined as the frequency of the class in the group, so it is a value between 0 and 1. We can easily convert this value to the interval [-1, 1] with the formula:

$CF'_i = 2 \cdot CF_i - 1$   (2)

We give a simple example of a rule created with this method:

(defrule group_1_class_16 (declare (salience 7
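The three rule generation steps and formulas (1) and (2) can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function name `generate_rules`, the dictionary-based representation of instances and rules, and the toy variable names are hypothetical, and ACRES itself emits CLIPS rules rather than Python objects.

```python
from collections import Counter, defaultdict


def generate_rules(instances, target, subset):
    """Group instances by their values on `subset` and attach a CF to each class."""
    classes = sorted({inst[target] for inst in instances})

    # Step 1: cluster instances that share identical values on the subset.
    groups = defaultdict(list)
    for inst in instances:
        key = tuple(inst[var] for var in subset)
        groups[key].append(inst[target])

    rules = []
    for key, labels in groups.items():
        counts = Counter(labels)
        total = len(labels)
        # Step 3: formula (1) gives n_i / N, formula (2) rescales it to [-1, 1].
        conclusions = {cls: 2 * counts[cls] / total - 1 for cls in classes}
        # Step 2: one rule per group, conditions are the shared attribute-value pairs.
        rules.append({"conditions": dict(zip(subset, key)), "conclusions": conclusions})
    return rules


if __name__ == "__main__":
    data = [
        {"menopause": "ge40", "deg_malig": "3", "cls": "recurrence"},
        {"menopause": "ge40", "deg_malig": "3", "cls": "recurrence"},
        {"menopause": "ge40", "deg_malig": "3", "cls": "recurrence"},
        {"menopause": "ge40", "deg_malig": "3", "cls": "no-recurrence"},
        {"menopause": "premeno", "deg_malig": "1", "cls": "no-recurrence"},
    ]
    for rule in generate_rules(data, "cls", ["menopause", "deg_malig"]):
        print(rule)
```

For the first group, three of four instances are "recurrence", so formula (1) gives 0.75 and formula (2) rescales it to 0.5, while "no-recurrence" receives -0.5.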
12. on the data of the testing set. The procedure is repeated for different partitions of the dataset (cross validation) and the average values of the metrics are presented.

[Figure: Evaluation report for the BreastCancer dataset (predicting variable 1_class, 2 classes, 0 intermediate variables, MYCIN certainty factors), showing the rule hierarchy tree and a per-class table with TP rate, FP rate, Recall (r), Precision (p), F-Measure, Sqrt(p·r) and Predictive Accuracy]

The produced expert system does not simply classify an instance to the predicted class. It provides an uncertainty value for each possible class. In order to make the evaluation of a produced expert system easier, we consider that the system classifies an instance to the class for which the uncertainty factor is the highest.

As described above, to evaluate an expert system for more than two classes we evaluate the performance for each class separately. For each class i we treat the problem as binary, with the first class being i and the second class being a class consisting of all other classes. We then form a confusion matrix and compute the metrics for each class i. We are mostly interested in
13. the Sensitivity and Precision metrics. We also combine these two metrics, producing their mean proportional SQRT(p·r) as a more general metric, and the F-Measure metric that we defined previously. For a measure of the general classifying performance of the expert system we use the Predictive Accuracy metric, which shows the percentage of instances in the testing set that were correctly classified to the class they belong to.

It is common, when the rules produced are very specific (having many conditions), that some or even all instances of a class in the testing set cannot be classified (are not covered by the rules). This can result in a zero value for TP, FN and FP, which results in a zero value in the denominator of some of the above metrics. In these cases we cannot calculate the metrics for that class and the system informs the user accordingly.

[Figure: Confusion matrix results and evaluation report for a three-class example (c1, c2, c3) in which class c3 is not covered by the rules]

* For class c3, the Sensitivity denominator (TP + FN) was always 0, so it could not be calculated. ** For class c3, the Precision denominator (TP + FP) was always 0, so it
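The evaluation conventions described above (predicting the class with the highest certainty factor, Predictive Accuracy, and the Sqrt(p·r) metric) can be sketched as follows. The helper names are hypothetical, and two choices are assumptions of this sketch rather than documented ACRES behavior: an instance not covered by any rule is represented by `None` and counted as not correctly classified, and an undefined metric is propagated as `None`.

```python
import math


def predicted_class(cfs):
    """Class with the highest certainty factor, or None if no rule covered the instance."""
    if not cfs:
        return None
    return max(cfs, key=cfs.get)


def predictive_accuracy(actual, predictions):
    """Fraction of test instances classified to the class they belong to."""
    if not actual:
        raise ValueError("the testing set is empty")
    correct = sum(1 for a, p in zip(actual, predictions) if p is not None and a == p)
    return correct / len(actual)


def sqrt_pr(precision, recall):
    """Mean proportional of precision and recall; None if either is undefined."""
    if precision is None or recall is None:
        return None
    return math.sqrt(precision * recall)


if __name__ == "__main__":
    outputs = [{"c1": 0.6, "c2": -0.2}, {"c1": 0.1, "c2": 0.7}, {}, {"c1": -0.3, "c2": 0.4}]
    predictions = [predicted_class(cfs) for cfs in outputs]
    print(predictions)
    print(predictive_accuracy(["c1", "c2", "c3", "c1"], predictions))
```

Returning `None` instead of 0 for an undefined metric keeps "could not be calculated" distinguishable from a genuinely poor score.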
