Algorithms from Machine Learning

1. [Screenshot, continued: table "Supply details, cluster view" with revenues, income, yield and crop share or animal density for cereals and oilseeds, rows from the European Union aggregates to single Member States]

In order to start the clustering or classification, we click in the table to open its popup menu and then select "Classification".

[Screenshot: CAPRI GUI with the table "Supply details, cluster view" for 2020, scenario MTR_RD]
2. [Screenshot, continued: table "Supply details, cluster view" with the columns revenues, income, yield and crop share or animal density for cereals, oilseeds and other arable crops, rows from European Union 27 to the Netherlands; the popup menu entries "Reload" and "Copy to Clipboard" are visible]
3. [Screenshot, continued: WEKA Classify panel with the test options (Cross-validation with 10 folds, Percentage split, More options), the Start button, the result list (12:02:01 rules.ZeroR, 12:02:28 functions.LinearRegression) and the classifier output for LinearRegression evaluated on the training set: time taken to build model 0.04 seconds, correlation coefficient 0.807, mean absolute error 2.4774, root mean squared error 3.369, relative absolute error 61.4055 %, root relative squared error 59.051 %, total number of instances 269, ignored class unknown instances 31]

• The "Choose" button gives access to a wide range of different classifiers, many of which have additional options which can be edited by the user. A multiple linear regression using the Akaike criterion for model selection is used as the default, assuming that most people will start with numerical values as class attributes. Please note that switching between nominal and numerical class attributes might trigger error messages if the currently selected classifier cannot handle the newly selected class attribute type.
• It is recommende
4. colors; it defines the class attribute (dependent variable) of the data to classify. For classification algorithms which require nominal values, the class assigned by the color classification is used.
2. A table with the explanatory attributes.

Both tables must carry, as is conventional in the exploitation tools, the observations in the rows. In the case of maps, each observation carries the data for a region. But one might also work with two tables where the observations are not strictly geo-referenced entities, such as farm types. The CAPRI GUI will automatically send new data to the WEKA GUI if either the map or table using classification colors, or the table with the explanatory attributes, is updated by a user action. The basic data flow is shown in the graphic below.

[Diagram labels: class attribute (numeric or nominal); additional attributes; Preprocess (select attributes, remove obs.); Filter (select attributes); Visualize; Classify]

Interaction between CAPRI GUI and WEKA

Let's construct an example: we want to check whether the income change in cereals in a simulation depends on the crop shares of cereals and the yields. In order to do so, we first render our map as usual (table "Farm details, mapping view") and use the option dialogue to show percentage changes against the baseline.

[Screenshot: CAPRI GUI]

The regions shown are our instances, and the val
5. docs endog yields pdf
• Sensitivity analysis for endogenous features with the supply model: http www capri model org docs Sensitivity analysis for model features in the CAPRI supplymodel pdf
• Decomposition of changes in behavioral functions of the market part: http www capri model org docs Decomposing market model results pdf

All these approaches build on known structural features of the model. The now added machine learning package aims to add a more data-driven approach, applicable also with less a priori knowledge.

Machine learning

Wikipedia gives the following definition: Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. Machine learning is concerned with the development of algorithms allowing the machine to learn via inductive inference based on observation data that represent incomplete information about a statistical phenomenon. Classification, which is also referred to as pattern recognition, is an important task in machine learning, by which machines learn to automatically recognize complex patterns, to distinguish between exemplars based on their different patterns, and to make intelligent decisions.

That is naturally a very general description. Machine learning has been applied in a wide range of application fields.
6. [Screenshot, continued: table "Supply details, cluster view", rows from France to Estonia, with the popup menu open and showing the entries "Export Data", "Pivoting", "Statistics", "Classify" (with the sub-entries "Classify numeric", "Classify nominal" and "Do not classify") and "Table View"; status bar: CAPRI GUI Version 4.0, September 2011]

By clicking on one of the options, we can then decide to:
1. Use numerical classification methods, such as different regression methods. The obs
7. A typical example is the analysis of which clients of a bank have been granted credit. We have many observations with "credit granted" or "credit refused", and probably a longer list of attributes of the clients: age, sex, income, amount of credit asked for, time since becoming a customer of the bank, past bookings. Machine learning could be applied to define a set of rules which, based on past decisions, predict whether credit would be granted for a new application. Machine learning will in many cases also be able to tell something about the possible error range linked with the decision. That could, e.g., allow the bank to make fast decisions in many cases and spend more time on the tricky ones. The book by Witten et al. (2011) gives many interesting examples.

Now we can, e.g., see the income change of each farm type in a simulation compared to the baseline as the outcome we want to predict, and its production program and the changes in prices and premiums as the attributes used to explain that outcome. Some farm types might exhibit very large income changes, others small ones. What are common characteristics of the one and the other group? Machine learning might then come up with a pattern, e.g. based on a regression model, which determines the most important attributes impacting income.

[Diagram labels: possible structural drivers of changes in a given simulation, e.g. crop shares; simulation results]

Machine learning has thus a lot of similari
8. Algorithms from Machine Learning interesting for CAPRI
by Wolfgang Britz, September 2011

Contents
Background
Using the CAPRI exploitation tools for systematic results analysis
Machine learning
Implementation in CAPRI
Interaction between CAPRI GUI and WEKA
The WEKA GUI
Classification
Filtering
Attribute viewing and selection
Summary
References

Background

A serious challenge for large-scale economic models is the dimensionality of the results generated by model runs. These reflect the high level of disaggregation in different dimensions and the many aspects dealt with in these tools, such as economic, social and environmental indicators. A single simulation run with CAPRI based on the farm type modules produces over 20 million non-zero values. Clearly, any of these numbers is generated by a computer-based model and should he
9. d for our purposes to use "Use training set" under "Test options" (the default in our implementation), as we are typically not interested in an out-of-sample test of the prediction quality.
• The actual classification can be started with the "Start" button. If the data in the background are updated, the currently chosen classifier with the chosen options will be restarted on the new data set automatically. In the absence of errors, the "Classifier output" on the right-hand side will hence typically show results based on the latest selected data.
• The results can be visualized by clicking with the mouse on an item in the result list, the last one in the list always being the newest. If one has tried several classifiers, the old results remain available. However, if the data in the background change, the old results are automatically removed.

The reader should note that all the functionality described is from the standard WEKA GUI, so that the WEKA user manual can be used for further information.

PS: The cluster panel is not described; it works quite similarly. Note, however, that filters are not applied to the cluster, see below.

Filtering

[Screenshot: WEKA Explorer GUI for "Supply details, mapping view", "Filter, view and select" panel with attribute evaluator CfsSubsetEval, search method BestFirst -D 1 -N 5, the attribute selection mode and the attribute selection output]
10. ervations in the map define the dependent variable.
2. Use the class assigned by the map as input into nominal classification.
3. Switch classification off.

A new window will be opened which shows the WEKA GUI, see below.

The WEKA GUI

The classification is based on the complete functionality of the WEKA GUI regarding attribute selection, visualization, filtering and classification, see http://www.cs.waikato.ac.nz/ml/index.html. There are very good manuals available from that site; the latest user manual is also available from http www capri model org docs WekaManual 3 6 5 pdf, so that only a few major tips for a fast start are given below. The tabs "Classify", "Cluster", "Filter" and "View and select" allow the user to access specific parts of the WEKA functionality. The result set from the current classification run is shown in the lower left panel ("Result list"). For each result set, a popup menu offers options, e.g. to show a graph with the prediction errors.

Classification

[Screenshot: WEKA Explorer GUI for "Supply details, mapping view", all Cereals Income, 2020, MTR_RD, with the tabs Classify, Cluster, Filter, View and select; classifier LinearRegression -S 0 -R 1.0E-8; test options "Use training set" and "Supplied test set"; the classifier output lists regression coefficients for attributes such as fodder activities and set-aside and fallow land]
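The summary statistics which WEKA reports for a numeric class attribute can be reproduced in a few lines. The sketch below is written in Python and is not taken from WEKA's Java code: it fits a regression with a single explanatory attribute and evaluates it on the training set. The function names and the data are hypothetical. The relative errors are assumed to be measured against a predictor that always returns the mean of the class attribute, which is what the ZeroR rule does for a numeric class.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with one attribute."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def evaluate(actual, predicted):
    """Error measures of the kind shown in the classifier output."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean (assumption)
    base_abs = sum(abs(mean_a - a) for a in actual)
    base_sq = sum((mean_a - a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # A constant prediction has no variance; report a correlation of 0 then
    corr = cov / sqrt(base_sq * var_p) if base_sq > 0 and var_p > 0 else 0.0
    return {
        "correlation": corr,
        "mae": abs_err / n,
        "rmse": sqrt(sq_err / n),
        "rae_percent": 100 * abs_err / base_abs,
        "rrse_percent": 100 * sqrt(sq_err / base_sq),
    }


# Hypothetical data: crop share (attribute) and income change (class)
crop_share = [12.5, 24.8, 30.1, 39.3, 53.8]
income_change = [0.4, 1.9, 2.2, 3.8, 4.6]
a, b = fit_line(crop_share, income_change)
fitted = [a + b * x for x in crop_share]
print(evaluate(income_change, fitted))
```

A relative absolute error or root relative squared error below 100 % indicates that the model does better than the mean predictor on the data used for the evaluation; since that is the training set here, the figures say nothing about out-of-sample quality.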
11. m income, GHG emissions or the nitrogen balance, are calculated from thousands of simulated items. How can we discover the story behind the results, i.e. which regions, activities, price or policy changes etc. are most important for the aggregate changes communicated? The exploitation tools developed for CAPRI, with a flexible on-the-fly approach to produce tables, graphs and maps, have been an important step to improve the efficiency in exploiting and analyzing results. But in parallel, CAPRI has grown in scope and scale. It might be time now to consider new approaches to analyze model outcomes. Before discussing the integration of machine learning in the exploitation tools, we will quickly review the current approaches based on the existing exploitation tools.

Using the CAPRI exploitation tools for systematic results analysis

A basic idea when using the CAPRI exploitation tools is to go top-down from key aggregate results to the underlying drivers. The starting point of the analysis can be, e.g., changes in farm management (crop shares, stocking densities), a welfare analysis or environmental impacts at aggregate level. From there, one can track the changes down to specific sectors, activities or regions by using more detailed tables or maps. These approaches have been presented in several training sessions. Recent additions to the GAMS code further support result analysis:

• Decomposition of aggregate yield changes: http www capri model org
12. nce be a non-probabilistic outcome depending on the input and the code used. Specifically, the relation between the input and any single number output is determined by the model structure and parameterization, and by the pre- and post-processing code. It must hence be possible to track any change quantitatively back to the shock analyzed. But that rather theoretical point of view has very little to do with the task at hand when one has to distill an analysis from a set of model outcomes. The questions here are: what are the most important results, i.e. salient to the questions underlying the analysis and large enough to matter, and how can they be explained? For the client, the story behind the results is often at least as important as the results themselves. If the story is well told, the black-box character of the tool is removed and its usefulness in depicting major cause-effect relations becomes evident. Telling a good and right story, however, often requires quite some time spent analyzing results in a systematic way. The user will hence have to decide for which items of the huge data set a thorough analysis of underlying drivers is advisable. Limited time and human resources will set tight limits to the extent of such systematic analysis. Typically, any report will present only a few dozen key results, perhaps complemented with a few maps showing several hundred numbers. But these key results, such as changes in aggregate welfare, far
13. [Screenshot, continued: attribute selection output for the relation "Supply details, mapping view" with 300 instances and 5 attributes (Revenues, Income, Yield, Crop share or animal density, Class numeric), evaluated on all training data. Search method: best first; start set: no attributes; search direction: forward; stale search after 5 node expansions; total number of subsets evaluated: 13; merit of best subset found: 0.388. Attribute subset evaluator: CFS Subset Evaluator, including locally predictive attributes. Selected attributes: 1, 4 (Revenues; Crop share or animal density)]

The filter panel allows running different types of filters which remove attributes, in many cases based on the correlation between attributes. In order to use the result from a filter run, click on the result set and choose "Use output for classification".

[Screenshot: WEKA Explorer GUI for "Supply details, mapping view", all Cereals Income, 2020, MTR_RD, Filter panel with attribute evaluator CfsSubsetEval (CFS Subset Evaluator)]
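To illustrate what a "merit" of an attribute subset expresses, the following Python sketch computes a correlation-based merit of the CFS type: a subset scores high if its attributes correlate with the class but not with each other. It is a simplified illustration under stated assumptions and not WEKA's implementation: absolute Pearson correlations are assumed as the correlation measure, a plain greedy forward search replaces the best-first search, and the handling of locally predictive attributes is left out. All function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if one series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cfs_merit(subset, attributes, target):
    """Merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff) for k attributes."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean attribute-class correlation
    r_cf = sum(abs(pearson(attributes[a], target)) for a in subset) / k
    # Mean correlation between the attributes themselves
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(attributes[a], attributes[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def forward_select(attributes, target, tol=1e-12):
    """Greedy forward search: add the best attribute while the merit rises."""
    selected, best = [], 0.0
    remaining = sorted(attributes)
    while remaining:
        merit, name = max((cfs_merit(selected + [a], attributes, target), a)
                          for a in remaining)
        if merit <= best + tol:
            break  # no candidate improves the merit
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Hypothetical data: "revenues_eur" duplicates the information in "revenues"
attributes = {
    "revenues": [1.0, 2.0, 3.0, 4.0, 5.0],
    "revenues_eur": [2.0, 4.0, 6.0, 8.0, 10.0],
    "other": [3.0, 1.0, 4.0, 1.0, 5.0],
}
class_values = [1.1, 2.0, 2.9, 4.2, 5.0]
print(forward_select(attributes, class_values))
```

In this example, the second revenue attribute is perfectly correlated with the first one and therefore does not raise the merit, which is the kind of redundancy such a filter is meant to remove.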
14. ry

The integration of algorithms from machine learning, based on the WEKA library and GUI, offers new possibilities for the systematic analysis of result sets. Thanks to the open source policy of WEKA, it was possible to integrate these powerful tools transparently into the CAPRI GUI. Depending on the experience gained over the next months, further links might be included, e.g. rendering clusters in maps.

References

Ian H. Witten, Eibe Frank, Mark A. Hall (2011): Data Mining: Practical Machine Learning Tools and Techniques. Third edition, Elsevier, Amsterdam, 630 pages.
Remco R. Bouckaert, Eibe Frank, Mark Hall, Richard Kirkby, Peter Reutemann, Alex Seewald, David Scuse (2011): WEKA Manual for Version 3-6-5. June 28, 2011, University of Waikato, Hamilton, New Zealand.
15. th the explanatory results. In order to switch off the use of the filter, select "Do not longer use output for classification".

Attribute viewing and selection

The last panel available is especially interesting to quickly analyze statistics of the underlying data.

[Screenshot: WEKA Explorer GUI, "View and select" panel for the relation "Supply details, mapping view" with 300 instances and 5 attributes. Selected attribute Revenues (numeric): missing 0 (0 %), distinct 300, unique 300 (100 %), minimum 2.076, maximum 3190.78, mean 877.02, standard deviation 352.113. Buttons: All, None, Invert, Pattern, Visualize All, Remove]

The reader can manually remove attributes, and the reduced set of attributes will then be passed to the filter and classifier. However, the attribute selection is not maintained when new data are loaded. The "Visualize All" button produces graphs of all current attributes.

[Screenshot: histograms of the attributes Revenues, Income, Yield, Crop share or animal density and Class numeric]

Summa
16. ties with statistics (indeed, many methods can also be found in statistical packages), but the focus of deciding which attributes and relations matter is shifted to a certain extent from the human being to the computer. And the toolbox used in machine learning differs to a certain degree from classical statistics. And, not least, many of the algorithms have also been developed keeping computing time in mind.

[Diagram labels, continued: in baseline; machine learning; rules; correlations]

Implementation in CAPRI

The implementation in CAPRI is based on the existing exploitation tools of the CAPRI GUI and the WEKA machine learning library (Witten et al. 2011), which is also integrated into other well-known packages such as RapidMiner. Thanks to the GNU license, including full access to the underlying Java source code, it was possible to integrate the functionality of WEKA into the CAPRI exploitation tools. Only a few code changes were necessary to pass data from the tables and maps shown in the CAPRI GUI to the WEKA library, see below. That is done automatically in the background, with the aim of reducing user input in the process. As a consequence, a very powerful set of filtering and classification tools, as well as related visualization tools from machine learning, can be applied to the result sets from CAPRI inside the existing exploitation tools. The current implementation is based on the interaction of two views:

1. A map or a table using classification
17. [Screenshot, continued: Filter panel with the search method GeneticSearch -Z 20 -G 20 -C 0.6 -M 0.033 -R 20 -S 1, the attribute selection output listing the selected attributes (among them revenues and income of cereals and oilseeds, production per UAAR, set-aside and fallow land income, all cattle activities revenues and beef meat activities income), and the result list popup menu with the entries "Use output for classification", "Do not longer use output for classification", "View in main window", "View in separate window", "Save result buffer", "Delete result buffer" and "Visualize reduced data"]

The last selected filter will be automatically restarted if a new data set is implicitly loaded, i.e. on a change of the map or of the data in the cluster table wi
18. ue plotted for a region defines the class attribute we want to analyze. Any one instance consists of a vector of attributes, of which one is the class value, i.e. the value to classify, which can be numeric or nominal. The other attributes are used for classification or clustering and stem from a second table, see below. Classification methods which use nominal values can also be used. In that case, the class chosen for the region, as seen from the color in which it is drawn, defines the class attribute. In our example above, each region would fall into one of five classes.

Next, we open a second table with the data we want to use as explanatory attributes. The latest trunk comprises the table "Supply details, cluster view", which offers promising attributes that are possible candidates to explain many changes in a simulation: for all activity aggregates, crop shares or stocking densities, revenues, income and yields.

[Screenshot: table "Supply details, cluster view" for 2020, scenario MTR_RD, with the columns revenues (Euro/ha or head), income (Euro/ha or head), yield (kg or 1/1000 head per ha or head) and crop share or animal density for cereals, oilseeds and other arable crops, by region, starting with European Union 27]
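As a minimal sketch of how instances are formed from the two views described above, the following Python fragment joins the class value of each region with its explanatory attributes and, for nominal classification, assigns each region to one of five quantile classes. The function names, the attribute names and all numbers are hypothetical and chosen for illustration only; this is not the code used in the CAPRI GUI or in WEKA, and the quantile rule is only one possible way to form five classes.

```python
from statistics import quantiles


def build_instances(class_values, attributes):
    """Join each region's class value with its explanatory attributes."""
    instances = {}
    for region, attrs in attributes.items():
        if region in class_values:  # skip regions without a class value
            instances[region] = {**attrs, "class": class_values[region]}
    return instances


def quantile_classes(values, n_classes=5):
    """Return a class index (0 = lowest) per value, based on quantile cuts."""
    cuts = quantiles(values, n=n_classes)  # n_classes - 1 cut points
    return [sum(v > cut for cut in cuts) for v in values]


# Hypothetical percentage changes of cereal income against the baseline
income_change = {"Belgium": -3.1, "Denmark": 1.4, "Germany": 0.2,
                 "France": -0.8, "Spain": 4.5}

# Hypothetical explanatory attributes from the second table
explanatory = {
    "Belgium": {"crop_share": 24.8, "yield": 8640.0},
    "Denmark": {"crop_share": 53.8, "yield": 6855.0},
    "Germany": {"crop_share": 39.3, "yield": 7528.0},
    "France": {"crop_share": 30.0, "yield": 7480.0},
    "Spain": {"crop_share": 20.4, "yield": 3499.0},
    "Italy": {"crop_share": 28.5, "yield": 5755.0},  # no class value
}

instances = build_instances(income_change, explanatory)
regions = sorted(instances)
classes = quantile_classes([instances[r]["class"] for r in regions])
for region, cls in zip(regions, classes):
    print(region, instances[region], "class", cls)
```

With a numeric class, the joined value is used directly as the dependent variable; with a nominal class, the class index takes its place.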
