In some cases one may also be interested in the uncertainty on the prediction for a new observation Xt. This type of requirement is fulfilled by the construction of a prediction interval. As before, pointwise and simultaneous prediction intervals can be found by

>> pi = predlssvm(model, Xt, alpha, 'pointwise');

and

>> pi = predlssvm(model, Xt, alpha, 'simultaneous');

respectively. We illustrate both types of prediction intervals on the following example. Note that the software can also handle heteroscedastic data. Also, cilssvm and predlssvm can be called by the functional interface (see A.3.9 and A.3.27).

Figure 3.8: (a) Fossil data with two pointwise 95% confidence intervals. (b) Simultaneous and pointwise 95% confidence intervals. The outer (inner) region corresponds to simultaneous (pointwise) confidence intervals. The full line in the middle is the estimated LS-SVM model. For illustration purposes the 95% pointwise confidence intervals are connected.

>> X = linspace(-5, 5, 200)';
>> Y = sin(X) + sqrt(0.05*X.^2 + 0.01).*randn(200,1);
>> model = initlssvm(X, Y, 'f', [], [], 'RBF_kernel', 'o');
3.2.1 Hello world

A simple example shows how to start using the toolbox for a classification task. We start with constructing a simple example dataset according to the correct formatting. Data are represented as matrices where each row of the matrix contains one datapoint:

>> X = 2.*rand(100,2)-1;
>> Y = sign(sin(X(:,1)) + X(:,2));
>> X

X =
    0.9003    0.9695
    0.5377    0.4936
    0.2137    0.1098
    0.0280    0.8636
    0.7826    0.0680
    0.5242    0.1627
    0.4556    0.7073
    0.6024    0.1871

>> Y

Y =
     1
     1
     1
     1
     1
     1
     1
     1

In order to make an LS-SVM model with the Gaussian RBF kernel, we need two tuning parameters: γ (gam) is the regularization parameter, determining the trade-off between the training error minimization and smoothness. In the common case of the Gaussian RBF kernel, σ² (sig2) is the squared bandwidth:

>> gam = 10;
>> sig2 = 0.4;
>> type = 'classification';
>> [alpha, b] = trainlssvm({X, Y, type, gam, sig2, 'RBF_kernel'});

The parameters and the variables relevant for the LS-SVM are passed as one cell. This cell allows for consistent default handling of LS-SVM parameters and syntactical grouping of related arguments. This definition should be used consistently throughout the use of that LS-SVM model. The corresponding object oriented interface to LS-SVMlab leads to shorter function calls (see demomodel). By default, the data are preprocessed (see prelssvm and postlssvm, Section A.3.29).
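Once trained, the classifier can be evaluated on new data and visualized. A minimal sketch, assuming the variables defined above; the test inputs Xt are generated here the same way as X:

>> Xt = 2.*rand(10,2)-1;                                   % new inputs, same range as X
>> Ytest = simlssvm({X,Y,type,gam,sig2,'RBF_kernel'}, {alpha,b}, Xt);
>> plotlssvm({X,Y,type,gam,sig2,'RBF_kernel'}, {alpha,b}); % plot the decision boundary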
% encode for training
>> model = initlssvm(X, Y, 'classifier', gam, sig2);
>> model = changelssvm(model, 'codetype', 'code_MOC');
>> model = changelssvm(model, 'codedist_fct', 'codedist_hamming');
>> model = codelssvm(model);     % implicitly called by next command
>> model = trainlssvm(model);
>> plotlssvm(model);

% decode for simulating
>> model = changelssvm(model, 'codedist_fct', 'codedist_bay');
>> model = changelssvm(model, 'codedist_args', {bay_modoutClass(model, Xt)});
>> Yt = simlssvm(model, Xt);

Full syntax

We denote the number of used binary classifiers by nbits and the number of different represented classes by nc.

• For encoding:

>> [Yc, codebook, old_codebook] = code(Y, codefct)
>> [Yc, codebook, old_codebook] = code(Y, codefct, codefct_args)
>> Yc = code(Y, given_codebook)

Outputs
  Yc : N x nbits encoded output classifier
  codebook(*) : nbits x nc matrix representing the used encoding
  old_codebook(*) : d x nc matrix representing the original encoding
Inputs
  Y : N x d matrix representing the original classifier
  codefct(*) : Function to generate a new codebook (e.g. code_MOC)
  codefct_args(*) : Extra arguments for codefct
  given_codebook(*) : nbits x nc matrix representing the encoding to use

• For decoding:

>> Yd = code(Yc, codebook, old_codebook)
>> Yd = code(Yc, codebook, old_...
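To make the encoding call above concrete, a small hedged sketch; the three-class label vector is hypothetical:

>> Y3 = [1; 2; 3; 2; 1];                                % hypothetical 3-class labels
>> [Yc, codebook, old_codebook] = code(Y3, 'code_MOC'); % minimum output coding
>> size(Yc)                                             % N x nbits matrix of +/-1 sub-problems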
>> [cost, costs] = rcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, L)
>> [cost, costs] = rcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, L, wfun, estfct)
>> [cost, costs] = rcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, L, wfun, estfct, combinefct)

Outputs
  cost : Cost estimation of the robust L-fold cross-validation
  costs(*) : L x 1 vector with costs estimated on the L different folds
Inputs
  X : Training input data used for defining the LS-SVM and the preprocessing
  Y : Training output data used for defining the LS-SVM and the preprocessing
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel(*) : Kernel type (by default 'RBF_kernel')
  preprocess(*) : 'preprocess' or 'original'
  L(*) : Number of folds (by default 10)
  wfun(*) : weighting scheme (by default 'whuber')
  estfct(*) : Function estimating the cost based on the residuals (by default 'mse')
  combinefct(*) : Function combining the estimated costs on the different folds (by default 'mean')

• Using the object oriented interface:

>> [cost, costs] = rcrossvalidate(model)
>> [cost, costs] = rcrossvalidate(model, L)
>> [cost, costs] = rcrossvalidate(model, L, wfun)
>> [cost, costs] = rcrossvalidate(model, L, wfun, estfct)
>> [cost, costs] = rcrossvalidate(model, L, wfun, estfct, combinefct)
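A hedged usage sketch of the functional calls above; the data, tuning parameters and the weighting choice 'whampel' (one of the schemes of Table 3.1) are illustrative assumptions:

>> X = (1:100)'./10;
>> Y = sinc(X) + 0.1.*randn(100,1);
>> [cost, costs] = rcrossvalidate({X, Y, 'f', 10, 0.5, 'RBF_kernel'}, 10, 'whampel', 'mae');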
Figure A.1: This figure shows the grid which is optimized, given the limit values.

Full syntax

• Optimization by exhaustive search over a two-dimensional grid:

>> [Xopt, Yopt, Evaluations, fig] = gridsearch(fun, startvalues, funargs, option1, value1, ...)

Outputs
  Xopt : Optimal parameter set
  Yopt : Criterion evaluated at Xopt
  Evaluations : Used number of iterations
  fig : Handle to the figure of the optimization
Inputs
  CostFunction : Function implementing the cost criterion
  StartingValues : 2 x d matrix with the limit values of the widest grid
  FunArgs(*) : Cell with optional extra function arguments of fun
  option : The name of the option one wants to change
  value : The new value of the option one wants to change

The different options:
  Nofigure : 'figure' or 'nofigure'
  MaxFunEvals : Maximum number of function evaluations (default 100)
  GridReduction : Grid reduction parameter (e.g. 2: small reduction; 10: heavy reduction; default 5)
  TolFun : Minimal tolerance of improvement on the function value (default 0.0001)
  TolX : Minimal tolerance of improvement on the X value (default 0.0001)
  Grain : Square root of the number of function evaluations in one grid (default 10)

• Optimization by exhaustive search over a line (linesearch):

>> [Xopt, Yopt, Evaluations, fig] = linesearch(fun, startvalues, funargs, option1, value1, ...)

Outputs
  Xopt : Optimal parameter set
(the remaining outputs and inputs are analogous to those of gridsearch)
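To make the calling convention concrete, a minimal sketch with a toy cost criterion; the quadratic fun and its extra argument are illustrative assumptions (not part of the toolbox), and it assumes a function handle is accepted as the cost function:

>> fun = @(x, c) sum((x - c).^2);     % toy cost criterion with optimum at c
>> startvalues = [-3 -3; 3 3];        % 2 x d matrix: limit values of the widest grid
>> [Xopt, Yopt] = gridsearch(fun, startvalues, {[1 2]}, 'MaxFunEvals', 200);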
Estimate the performance of a trained model with leave-one-out crossvalidation.

CAUTION: Use this function only to obtain the value of the leave-one-out crossvalidation score function given the tuning parameters. Do not use this function together with tunelssvm, but use leaveoneoutlssvm instead. The latter is a faster implementation based on one full matrix inverse.

Basic syntax

>> leaveoneout({X, Y, type, gam, sig2})
>> leaveoneout(model)

Description

In each iteration, one leaves out one point and fits a model on the other data points. The performance of the model is estimated based on the point left out. This procedure is repeated for each data point. Finally, all the different estimates of the performance are combined (by default by computing the mean). The assumption is made that the input data are distributed independently and identically over the input space.

Full syntax

• Using the functional interface for the LS-SVMs:

>> cost = leaveoneout({X,Y,type,gam,sig2,kernel,preprocess})
>> cost = leaveoneout({X,Y,type,gam,sig2,kernel,preprocess}, estfct, combinefct)

Outputs
  cost : Cost estimated by leave-one-out crossvalidation
Inputs
  X : Training input data used for defining the LS-SVM and the preprocessing
  Y : Training output data used for defining the LS-SVM and the preprocessing
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
>> model = tunelssvm(model, 'simplex', 'crossvalidatelssvm', {10, 'mae'});
>> Xt = linspace(-4.5, 4.7, 200)';

Figure 3.9 shows the 95% pointwise and simultaneous prediction intervals on the test set Xt. As expected, the simultaneous intervals are again much wider than the pointwise intervals.

Figure 3.9: Pointwise and simultaneous 95% prediction intervals for the above given data. The outer (inner) region corresponds to simultaneous (pointwise) prediction intervals. The full line in the middle is the estimated LS-SVM model. For illustration purposes the 95% pointwise prediction intervals are connected.

As a final example, consider the Boston Housing data set (multivariate example). We selected randomly 338 training data points and 168 test data points. The corresponding simultaneous confidence and prediction intervals are shown in the two figures that follow. The outputs on training as well as on test data are sorted and plotted against their corresponding index. Also, the respective intervals are sorted accordingly. For illustration purposes, the simultaneous confidence/prediction intervals are not connected.

>> % load full data set X and Y
>> sel = randperm(506);
>>
>> % Construct test data
>> Xt = X(sel(1:168), :);
>> Yt = Y(sel(1:168));
>>
>> % Construct training data
[27] Schölkopf B., Burges C., Smola A. (Eds.) (1998), Advances in Kernel Methods: Support Vector Learning, MIT Press.

[28] Schölkopf B., Smola A.J., Müller K.-R. (1998), "Nonlinear component analysis as a kernel eigenvalue problem", Neural Computation, 10, 1299-1319.

[29] Schölkopf B., Smola A. (2002), Learning with Kernels, MIT Press.

[30] Smola A.J., Schölkopf B. (2000), "Sparse greedy matrix approximation for machine learning", Proc. 17th International Conference on Machine Learning, 911-918, San Francisco, Morgan Kaufman.

[31] Stone M. (1974), "Cross-validatory choice and assessment of statistical predictions", J. Royal Statist. Soc. Ser. B, 36, 111-147.

[32] Suykens J.A.K., Vandewalle J. (1999), "Least squares support vector machine classifiers", Neural Processing Letters, 9(3), 293-300.

[33] Suykens J.A.K., Vandewalle J. (2000), "Recurrent least squares support vector machines", IEEE Transactions on Circuits and Systems-I, 47(7), 1109-1114.

[34] Suykens J.A.K., De Brabanter J., Lukas L., Vandewalle J. (2002), "Weighted least squares support vector machines: robustness and sparse approximation", Neurocomputing, Special issue on fundamental and information processing aspects of neurocomputing, 48(1-4), 85-105.

[35] Suykens J.A.K., Vandewalle J. & De Moor B. (2001), "Intelligence and cooperative ...
  xdelays : Number of lags of X in the new input
  ydelays : Number of lags of Y in the new input
  steps(*) : Number of future steps of Y in the new output (by default 1)

See also:
  windowizeNARX, predict, trainlssvm, simlssvm

Bibliography

[1] Alzate C. and Suykens J.A.K. (2008), "Kernel Component Analysis using an Epsilon-Insensitive Robust Loss Function", IEEE Transactions on Neural Networks, 19(9), 1583-1598.

[2] Alzate C. and Suykens J.A.K. (2010), "Multiway Spectral Clustering with Out-of-Sample Extensions through Weighted Kernel PCA", IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2), 335-347.

[3] Baudat G., Anouar F. (2001), "Kernel-based methods and function approximation", in International Joint Conference on Neural Networks (IJCNN 2001), Washington DC, USA, 1244-1249.

[4] Cawley G.C., Talbot N.L.C. (2002), "Efficient formation of a basis in a kernel induced feature space", in Proc. European Symposium on Artificial Neural Networks (ESANN 2002), Brugge, Belgium, 1-6.

[5] Cristianini N., Shawe-Taylor J. (2000), An Introduction to Support Vector Machines, Cambridge University Press.

[6] De Brabanter J., Pelckmans K., Suykens J.A.K., Vandewalle J. (2002), "Robust cross-validation score function for LS-SVM non-linear function estimation", International Conference on Artificial Neural Networks (ICANN 2002), Madrid, Spain, Aug. 2002, 713-719.

[7] De Brabanter K., Pelckmans K., ...
Outputs
  sig_e : Nt x 1 vector with the σ² error bars of the test data
  model(*) : Object oriented representation of the LS-SVM model
  bay(*) : Object oriented representation of the results of the Bayesian inference
Inputs
  model : Object oriented representation of the LS-SVM model
  Xt : Nt x d matrix with the inputs of the test data
  etype(*) : 'svd', 'eig', 'eigs' or 'eign'
  nb(*) : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also:
  bay_lssvm, bay_optimize, bay_modoutClass, plotlssvm

A.3.3 bay_initlssvm

Purpose

Initialize the tuning parameters γ and σ² before optimization with bay_optimize.

Basic syntax

>> [gam, sig2] = bay_initlssvm({X, Y, type})
>> model = bay_initlssvm(model)

Description

A starting value for sig2 is only given if the model has kernel type 'RBF_kernel'.

Full syntax

• Using the functional interface:

>> [gam, sig2] = bay_initlssvm({X, Y, type, [], [], kernel})

Outputs
  gam : Proposed initial regularization parameter
  sig2 : Proposed initial RBF_kernel parameter
Inputs
  X : N x d matrix with the inputs of the training data
  Y : N x 1 vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  kernel(*) : Kernel type (by default 'RBF_kernel')

• Using the object oriented interface:

>> model = bay_initlssvm(model)

Outputs
  model : Object oriented representation of the LS-SVM model
Figure 3.16: An example of a binary classifier (Ripley data set, support vectors highlighted) obtained by application of a fixed-size LS-SVM (20 support vectors) on a classification task.

3.4 Unsupervised learning using kernel principal component analysis

A simple example shows the idea of denoising in the input space by means of kernel PCA. The demo can be called by:

>> demo_yinyang

and uses the routine preimage_rbf.m, which is a fixed-point iteration algorithm for computing pre-images in the case of RBF kernels. The pseudo-code is shown as follows:

>> % load training data in Xtrain and test data in Xtest
>> dim = size(Xtrain, 2);
>> nb_pcs = 4;
>> factor = 0.25;
>> sig2 = factor*dim*mean(var(Xtrain));   % A rule of thumb for sig2
>> [lam, U] = kpca(Xtrain, 'RBF_kernel', sig2, Xtest, 'eigs', nb_pcs);

The whole dataset is denoised by computing approximate pre-images:

>> Xd = preimage_rbf(X, sig2, U, [Xtrain; Xtest], 'd');

Figure 3.17 shows the original dataset (in gray) and the denoised data (in blue, 'o'). Note that the denoised data points preserve the underlying nonlinear structure of the data, which is not the case in linear PCA.

Figure 3.17: Denoised data ('o') obtained by reconstructing the data points, using 4 kernel principal components
>> ... = simplex(fun, X_opt, ...)

• The different options:

  opts.Chi : Parameter governing expansion steps (default 2)
  opts.Delta : Parameter governing the size of the initial simplex (default 1.2)
  opts.Gamma : Parameter governing contraction steps (default 0.5)
  opts.Rho : Parameter governing reflection steps (default 1)
  opts.Sigma : Parameter governing shrinkage steps (default 0.5)
  opts.MaxIter : Maximum number of optimization steps (default 15)
  opts.MaxFunEvals : Maximum number of function evaluations (default 25)
  opts.TolFun : Stopping criterion based on the relative change in value of the function in each step (default 1e-6)
  opts.TolX : Stopping criterion based on the change in the minimizer in each step (default 1e-6)

See also:
  trainlssvm, crossvalidate

A.3.37 windowize & windowizeNARX

Purpose

Re-arrange the data points into a block Hankel matrix for (N)AR(X) time-series modeling.

Basic syntax

>> w = windowize(A, window)
>> [Xw, Yw] = windowizeNARX(X, Y, xdelays, ydelays, steps)

Description

Use the windowize function to make a nonlinear AR predictor with a nonlinear regressor. The last elements of the resulting matrix will contain the future values of the time series; the others will contain the past inputs. window is the relative index of data points in matrix A that are selected to make a window.
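The following hedged sketch puts windowize to work for one-step-ahead autoregressive modeling; the toy series, the lag order and the exact argument layout of predict (starting values followed by the number of recurrent steps) are assumptions for illustration:

>> Z = sin((1:400)'./10) + 0.05.*randn(400,1);   % toy time series
>> lag = 5;
>> W = windowize(Z, 1:(lag+1));    % block Hankel matrix: one window per row
>> Xw = W(:, 1:lag);               % past values ...
>> Yw = W(:, end);                 % ... and the future value to predict
>> [gam, sig2] = tunelssvm({Xw,Yw,'f',[],[],'RBF_kernel'}, 'simplex', 'crossvalidatelssvm', {10,'mse'});
>> [alpha, b] = trainlssvm({Xw, Yw, 'f', gam, sig2, 'RBF_kernel'});
>> Zpt = predict({Xw,Yw,'f',gam,sig2,'RBF_kernel'}, Z(end-lag+1:end), 50);  % 50 recurrent steps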
>> pi = predlssvm({X, Y, type, gam, kernel_par, kernel, preprocess}, Xt, alpha, conftype)
>> pi = predlssvm(model, Xt, alpha, conftype)

Description

This function calculates bias-corrected 100(1−α)% pointwise or simultaneous prediction intervals. The procedure supports homoscedastic as well as heteroscedastic data sets. The construction of the prediction intervals is based on the central limit theorem for linear smoothers, combined with bias correction and variance estimation.

Full syntax

• Using the functional interface:

>> pi = predlssvm({X,Y,type,gam,kernel_par,kernel,preprocess}, Xt)
>> pi = predlssvm({X,Y,type,gam,kernel_par,kernel,preprocess}, Xt, alpha)
>> pi = predlssvm({X,Y,type,gam,kernel_par,kernel,preprocess}, Xt, alpha, conftype)

Outputs
  pi : N x 2 matrix containing the lower and upper prediction intervals
Inputs
  X : Training input data used for defining the LS-SVM and the preprocessing
  Y : Training output data used for defining the LS-SVM and the preprocessing
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel(*) : Kernel type (by default 'RBF_kernel')
  preprocess(*) : 'preprocess' or 'original'
  Xt : Test points where the prediction intervals are calculated
  alpha(*) : Significance level (by default 5%)
  conftype(*) : Type of prediction interval: 'pointwise' or 'simultaneous' (by default 'simultaneous')

• Using the object oriented interface:
The corresponding ROC curve is shown in Figure 3.3(b).

Figure 3.3: ROC curve of the Ripley classification task. (a) Original LS-SVM classifier (RBF LS-SVM, γ = 44.7704, σ² = 1.2557, with 2 different classes; legend: class 1, class 2). (b) Receiver Operating Characteristic curve (area = 0.96403, std = 0.009585); axes: Sensitivity versus 1−Specificity.

3.2.3 Using the object oriented interface: initlssvm

Another possibility to obtain the same results is by using the object oriented interface. This goes as follows:

>> % load dataset
>> % gateway to the object oriented interface
>> model = initlssvm(X, Y, type, [], [], 'RBF_kernel');
>> model = tunelssvm(model, 'simplex', 'crossvalidatelssvm', {L_fold, 'misclass'});
>> model = trainlssvm(model);
>> plotlssvm(model);
>> % latent variables are needed to make the ROC curve
>> Y_latent = latentlssvm(model, X);
>> [area, se, thresholds, oneMinusSpec, Sens] = roc(Y_latent, Y);

3.2.4 LS-SVM classification: only one command line away!

The simplest way to obtain an LS-SVM model goes as follows (binary classification problems and one-versus-one encoding for multiclass):

>> % load dataset
>> type = 'classification';
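By analogy with the one-line regression call of Subsection 3.3.2 (lssvm, see A.3.24), a hedged sketch of the corresponding classification one-liner:

>> Yp = lssvm(X, Y, type);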
    pre_xmean: 0.0690
    pre_xstd: 1.8282
    pre_ymean: 0.2259
    pre_ystd: 0.3977
    status: 'changed'
    weights: []

Training, simulation and making a plot is executed by the following calls:

>> model = trainlssvm(model);
>> Xt = normrnd(0, 2, 150, 1);
>> Yt = simlssvm(model, Xt);
>> plotlssvm(model);

The second level of inference of the Bayesian framework can be used to optimize the regularization parameter gam. For this case, a Nyström approximation of the 20 principal eigenvectors is used:

>> model = bay_optimize(model, 2, 'eign', 50);

Optimization of the cost associated with the third level of inference gives an optimal kernel parameter. For this procedure, it is recommended to initiate the starting points of the kernel parameter. This optimization is based on Matlab's optimization toolbox. It can take a while.

>> model = bay_initlssvm(model);
>> model = bay_optimize(model, 3, 'eign', 50);

3.3.5 Confidence/Prediction Intervals for Regression

Consider the following example: Fossil data set.

>> % load data set X and Y
>> % Initializing and tuning the parameters
>> model = initlssvm(X, Y, 'f', [], [], 'RBF_kernel', 'o');
>> model = tunelssvm(model, 'simplex', 'crossvalidatelssvm', {10, 'mse'});

Bias-corrected approximate 100(1−α)% pointwise confidence intervals on the estimated LS-SVM model can then be obtained by using the command cilssvm.
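Following the cilssvm syntax of A.3.9, the intervals themselves are then obtained by a call of the following form (alpha = 0.05 is the default significance level):

>> ci = cilssvm(model, 0.05, 'pointwise');     % 95% pointwise intervals
>> ci = cilssvm(model, 0.05, 'simultaneous');  % 95% simultaneous intervals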
A.3.5 bay_lssvmARD

Purpose

Bayesian Automatic Relevance Determination of the inputs of an LS-SVM.

Basic syntax

>> dimensions = bay_lssvmARD({X, Y, type, gam, sig2})
>> dimensions = bay_lssvmARD(model)

Description

For a given problem, one can determine the most relevant inputs for the LS-SVM within the Bayesian evidence framework. To do so, one assigns a different weighting parameter to each dimension in the kernel and optimizes this using the third level of inference. According to the used kernel, one can remove inputs based on the larger or smaller kernel parameters. This routine only works with the 'RBF_kernel' with a sig2 per input. In each step, the input with the largest optimal sig2 is removed (backward selection). For every step, the generalization performance is approximated by the cost associated with the third level of Bayesian inference.

The ARD is based on backward selection of the inputs, based on the sig2s corresponding in each step with a minimal cost criterion. Minimizing this criterion can be done by 'continuous' or by 'discrete'. The former uses in each step continuously varying kernel parameter optimization; the latter decides which one to remove in each step by binary variables for each component (this can only be applied for rather low-dimensional inputs, as the number of possible combinations grows exponentially with the number of inputs). If working with the 'RBF_kernel', the kernel par...
... De Brabanter J., Debruyne M., Suykens J.A.K., Hubert M., De Moor B. (2009), "Robustness of Kernel Based Regression: a Comparison of Iterative Weighting Schemes", Proc. of the 19th International Conference on Artificial Neural Networks (ICANN), Limassol, Cyprus, September, 100-110.

[8] De Brabanter K., De Brabanter J., Suykens J.A.K., De Moor B. (2010), "Optimized Fixed-Size Kernel Models for Large Data Sets", Computational Statistics & Data Analysis, 54(6), 1484-1504.

[9] De Brabanter K., De Brabanter J., Suykens J.A.K., De Moor B. (2010), "Approximate Confidence and Prediction Intervals for Least Squares Support Vector Regression", IEEE Transactions on Neural Networks, 22(1), 110-120.

[10] Evgeniou T., Pontil M., Poggio T. (2000), "Regularization networks and support vector machines", Advances in Computational Mathematics, 13(1), 1-50.

[11] Fawcett T. (2006), "An Introduction to ROC analysis", Pattern Recognition Letters, 27, 861-874.

[12] Girolami M. (2002), "Orthogonal series density estimation and the kernel eigenvalue problem", Neural Computation, 14(3), 669-688.

[13] Golub G.H. and Van Loan C.F. (1989), Matrix Computations, Johns Hopkins University Press, Baltimore MD.

[14] Györfi L., Kohler M., Krzyżak A., Walk H. (2002), A Distribution-Free Theory of Nonparametric ...
>> model = tunelssvm(model, 'simplex', 'rcrossvalidatelssvm', {L_fold, 'mae'}, 'wmyriad');
>> model = robustlssvm(model);
>> plotlssvm(model);

Figure 3.12: Experiments on a noisy sinc dataset with extreme outliers. (a) Application of the standard training and tuning parameter selection techniques (RBF kernel, γ = 35.1583, σ² = 0.090211). (b) Application of an iteratively reweighted LS-SVM training (myriad weights) together with a robust cross-validation score function, which enhances the test set performance (RBF kernel, γ = 9.18012, σ² = 0.93345). Legend: LS-SVM estimate, data, real function.

Table 3.1: Definitions for the Huber, Hampel, Logistic and Myriad (with parameter δ²) weight functions W(·); the corresponding loss L(·) and score function ψ(·) are also given, with ψ(r) = r·W(r) and L'(r) = ψ(r):

  Huber:    W(r) = 1 if |r| < β;  β/|r| if |r| ≥ β
  Hampel:   W(r) = 1 if |r| < b1;  (b2 − |r|)/(b2 − b1) if b1 ≤ |r| ≤ b2;  0 if |r| > b2
  Logistic: W(r) = tanh(r)/r;  L(r) = r·tanh(r)
  Myriad:   W(r) = δ²/(δ² + r²);  L(r) = log(δ² + r²)

3.3.7 Multiple output regression

In the case of multiple output data, one can ...
  Xt : Nt x d matrix with the inputs of the test data
  prior(*) : Prior knowledge of the balancing of the training data (or [])
  etype(*) : 'svd', 'eig', 'eigs' or 'eign'
  nb(*) : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also:
  bay_lssvm, bay_optimize, bay_errorbar, ROC

A.3.7 bay_optimize

Purpose

Optimize the posterior probabilities of model (hyper-) parameters with respect to the different levels in Bayesian inference.

Basic syntax

One can optimize on the three different inference levels as described in Section 2.1.3:

• First level: in the first level one optimizes the support values α's and the bias b.
• Second level: in the second level one optimizes the regularization parameter gam.
• Third level: in the third level one optimizes the kernel parameter. In the case of the common 'RBF_kernel', the kernel parameter is the bandwidth sig2.

This routine is only tested with Matlab R2008a, R2008b, R2009a, R2009b and R2010a using the corresponding optimization toolbox.

Full syntax

• Outputs on the first level:

>> [model, alpha, b] = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, 1)
>> [model, alpha, b] = bay_optimize(model, 1)

With
  model : Object oriented representation of the LS-SVM model, optimized on the first level of inference
  alpha(*) : Support values optimized on the first level of inference
  b(*) : Bias term optimized on the first level of inference
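Putting the three levels together, a minimal hedged sketch, assuming a model object initialized as elsewhere in this guide:

>> model = bay_initlssvm(model);    % starting values for gam and sig2
>> model = bay_optimize(model, 1);  % level 1: support values and bias term
>> model = bay_optimize(model, 2);  % level 2: regularization parameter gam
>> model = bay_optimize(model, 3);  % level 3: kernel parameter sig2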
as follows. First, for every kernel, Coupled Simulated Annealing (CSA) determines suitable starting points for every method. The search limits of the CSA method are set to [exp(−10), exp(10)]. Second, these starting points are then given to one of the three optimization routines above. These routines have to be explicitly specified by the user. CSA has already proven to be more effective than multi-start gradient descent optimization. Another advantage of CSA is that it uses the acceptance temperature to control the variance of the acceptance probabilities with a control scheme. This leads to an improved optimization efficiency, because it reduces the sensitivity of the algorithm to the initialization parameters, while guiding the optimization process to quasi-optimal runs. By default, CSA uses five multiple starters.

The tuning parameters are the regularization parameter gam and the squared kernel parameter (or sig2 in the case of the 'RBF_kernel'). costfun gives an estimate of the performance of the model. Possible functions for costfun are: crossvalidatelssvm, leaveoneoutlssvm, rcrossvalidatelssvm and gcrossvalidatelssvm. Possible combinations are:

>> model = tunelssvm(model, 'simplex', 'crossvalidatelssvm', {10, 'mse'});
>> model = tunelssvm(model, 'gridsearch', 'crossvalidatelssvm', {10, 'mse'});
>> model = tunelssvm(model, 'linesearch', 'crossvalidatelssvm', {10, 'mse'});

In the robust cross...
... case of non-Gaussian noise or outliers.

Basic syntax

>> model = robustlssvm(model)

Robustness towards outliers can be achieved by reducing the influence of support values corresponding to large errors. One should first use the function tunelssvm, so all the necessary parameters are optimally tuned before calling this routine. Note that the function robustlssvm only works with the object oriented interface.

Full syntax

• Using the object oriented interface:

>> model = robustlssvm(model)

Outputs
  model : Robustly trained object oriented representation of the LS-SVM model
Inputs
  model : Object oriented representation of the LS-SVM model

See also:
  trainlssvm, tunelssvm, rcrossvalidate

A.3.33 roc

Purpose

Receiver Operating Characteristic (ROC) curve of a binary classifier.

Basic syntax

>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(Zt, Y)

Description

The ROC curve [11] shows the separation abilities of a binary classifier: by setting different possible classifier thresholds, the data set is tested on misclassifications [16]. As a result, a plot is shown where the various outcomes are described. If the plot has an area under the curve of 1 on test data, a perfectly separating classifier is found (on that particular dataset); if the area equals 0.5, the classifier has no discriminative power at all. In general, this function can be called with the latent
>> plotlssvm({X, Y, type, gam, sig2, 'RBF_kernel', 'preprocess'}, {alpha, b});

All plotting is done with this simple command. It looks for the best way of displaying the result (Figure 3.6).

3.3.2 LS-SVM regression: only one command line away!

As an alternative, one can use the one-line lssvm command:

>> type = 'function estimation';
>> Yp = lssvm(X, Y, type);

By default, the Gaussian RBF kernel is used. Further information can be found in A.3.24.

Figure 3.6: Simple regression problem (RBF kernel, function estimation using LS-SVM, γ = 21.1552, σ² = 0.17818). The solid line indicates the estimated outputs; the dotted line represents the true underlying function. The dots indicate the training data points.

3.3.3 Bayesian Inference for Regression

An example on the sinc data is given:

>> type = 'function approximation';
>> X = linspace(-2.2, 2.2, 250)';
>> Y = sinc(X) + normrnd(0, 0.1, size(X,1), 1);
>> [Yp, alpha, b, gam, sig2] = lssvm(X, Y, type);

The errorbars on the training data are computed using Bayesian inference:

>> sig2e = bay_errorbar({X, Y, type, gam, sig2}, 'figure');

See Figure 3.7 for the resulting error bars.

Figure 3.7 (RBF kernel, γ = 79.9993, σ² = 1.3096; LS-SVM and its 68% (1σ) and 95% (2σ) error bands): This figure gives the 68% errorbars (green dotted and green dashed-dotted line) and the
introduced within the context of statistical learning theory and structural risk minimization. In the methods one solves convex optimization problems, typically quadratic programs. Least Squares Support Vector Machines (LS-SVM) are reformulations to standard SVMs which lead to solving linear KKT systems. LS-SVMs are closely related to regularization networks and Gaussian processes, but additionally emphasize and exploit primal-dual interpretations. Links between kernel versions of classical pattern recognition algorithms, such as kernel Fisher discriminant analysis, and extensions to unsupervised learning, recurrent networks and control are available. Robustness, sparseness and weightings can be imposed to LS-SVMs where needed, and a Bayesian framework with three levels of inference has been developed [44]. LS-SVM alike primal-dual formulations are given to kernel PCA, kernel CCA and kernel PLS [38]. For very large scale problems and on-line learning, a method of Fixed-Size LS-SVM is proposed [8], based on the Nyström approximation with active selection of support vectors and estimation in the primal space. The methods with primal-dual representations have also been developed for kernel spectral clustering [2], data visualization [39], dimensionality reduction and survival analysis.

The present LS-SVMlab toolbox User's Guide contains Matlab implementations for a number of LS-SVM algorithms related to classification, regression, time-series prediction and unsupervised learning.
is discretised for making the final decisions. The Receiver Operating Characteristic curve (roc) can be used to measure the performance of a classifier. Multiclass classification problems are decomposed into multiple binary classification tasks [45]. Several coding schemes can be used at this point: minimum output, one-versus-one, one-versus-all and error correcting coding schemes. To decode a given result, the Hamming distance, loss function distance and Bayesian decoding can be applied. A correction of the bias term can be done, which is especially interesting for small data sets.

2.1.2 Tuning and robustness

Function calls: tunelssvm, crossvalidatelssvm, leaveoneoutlssvm, robustlssvm
Demos: Subsections 3.2.2, 3.2.6, 3.3.6, 3.3.8; demofun, democlass, demomodel

A number of methods to estimate the generalization performance of the trained model are included. For classification, the rate of misclassifications (misclass) can be used. Estimates based on repeated training and validation are given by crossvalidatelssvm and leaveoneoutlssvm. A robust crossvalidation (based on iteratively reweighted LS-SVM) score function [7, 6] is called by rcrossvalidatelssvm. In the case of outliers in the data, corrections to the support values will improve the model (robustlssvm) [34]. These performance measures can be used to determine the tuning parameters (e.g. the regularization and kernel parameters) of the LS-SVM (tunelssvm). In this version, the tu...
  U(*) : N x nb matrix with principal eigenvectors
Inputs
  X : N x d matrix with data points used for finding the principal components
  kernel : Kernel type (e.g. 'RBF_kernel')
  kernel_par : Kernel parameter(s) (for linear kernel, use [])
  Xt(*) : Nt x d matrix with noisy points (if not specified, X is denoised instead)
  etype(*) : 'eig', 'svd', 'eigs', 'eign'
  nb(*) : Number of principal components used in the approximation

• >> Xd = denoise_kpca(X, U, lam, kernel, kernel_par, Xt);

Outputs
  Xd : N x d (Nt x d) matrix with denoised data X (Xt)
Inputs
  X : N x d matrix with data points used for finding the principal components
  U : N x nb (Nt x d) matrix with principal eigenvectors
  lam : nb x 1 vector with eigenvalues of principal components
  kernel : Kernel type (e.g. 'RBF_kernel')
  kernel_par : Kernel parameter(s) (for linear kernel, use [])
  Xt(*) : Nt x d matrix with noisy points (if not specified, X is denoised instead)

See also:
  kpca, kernel_matrix, RBF_kernel

A.3.14 eign

Purpose

Find the principal eigenvalues and eigenvectors of a matrix with Nyström's low-rank approximation method.

Basic syntax

>> D = eign(A, nb)
>> [V, D] = eign(A, nb)

Description

In the case of using this method for low-rank approximation and decomposing the kernel matrix, one can call the function without explicit construction of the matrix A:

>> D = eign(X, kernel, kernel_par, nb)
>> [V, D] = eign(X, kernel, kernel_par, nb)
  ... of the LS-SVM model
  field : Field of the model that one wants to change (e.g. 'preprocess')
  value : New value of the field of the model that one wants to change

See also:
  trainlssvm, initlssvm, simlssvm, plotlssvm

A.3.17 kentropy

Purpose

Quadratic Renyi Entropy for a kernel based estimator.

Basic syntax

Given the eigenvectors and the eigenvalues of the kernel matrix, the entropy is computed by:

>> H = kentropy(X, U, lam)

The eigenvalue decomposition can also be computed (or approximated) implicitly:

>> H = kentropy(X, kernel, sig2)

Full syntax

• >> H = kentropy(X, kernel, kernel_par)
  >> H = kentropy(X, kernel, kernel_par, etype)
  >> H = kentropy(X, kernel, kernel_par, etype, nb)

Outputs
  H : Quadratic Renyi entropy of the kernel matrix
Inputs
  X : N x d matrix with the training data
  kernel : Kernel type (e.g. 'RBF_kernel')
  kernel_par : Kernel parameter(s) (for linear kernel, use [])
  etype(*) : 'eig', 'eigs', 'eign'
  nb(*) : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

• >> H = kentropy(X, U, lam)

Outputs
  H : Quadratic Renyi entropy of the kernel matrix
Inputs
  X : N x d matrix with the training data
  U : N x nb matrix with principal eigenvectors
  lam : nb x 1 vector with eigenvalues of principal components

See also:
  kernel_matrix, demo_fixedsize, RBF_kernel

A.3.18 kernel_matrix
  ... 'preprocess' or 'original'
  alpha : Support values obtained from training
  b : Bias term obtained from training
  Xt : Nt x d inputs of the test data

• Using the object oriented interface:

>> [Yt, Zt, model] = simlssvm(model, Xt)

Outputs
  Yt : Nt x m matrix with predicted output of the test data
  Zt(*) : Nt x m matrix with predicted latent variables of a classifier
  model(*) : Object oriented representation of the LS-SVM model
Inputs
  model : Object oriented representation of the LS-SVM model
  Xt : Nt x d matrix with the inputs of the test data

See also:
  trainlssvm, initlssvm, plotlssvm, code, changelssvm

A.3.35 trainlssvm

Purpose

Train the support values and the bias term of an LS-SVM for classification or function approximation.

Basic syntax

>> [alpha, b] = trainlssvm({X, Y, type, gam, kernel_par, kernel, preprocess})
>> model = trainlssvm(model)

Description

type can be 'classifier' or 'function estimation' (these strings can be abbreviated into 'c' or 'f', respectively). X and Y are matrices holding the training input and training output. The i-th data point is represented by the i-th rows X(i,:) and Y(i,:). gam is the regularization parameter: for gam low, minimizing of the complexity of the model is emphasized; for gam high, fitting of the training data points is stressed. kernel_par is the parameter of the kernel; in the common case of an RBF kernel, a large sig2 indicates a stronger smoothing.
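The role of gam can be made concrete with a small hedged experiment; the data and the parameter values are illustrative assumptions:

>> X = linspace(-5, 5, 100)';
>> Y = sinc(X) + 0.1.*randn(100,1);
>> [alpha1, b1] = trainlssvm({X, Y, 'f', 0.01, 0.5, 'RBF_kernel'});  % low gam: smoother, simpler fit
>> [alpha2, b2] = trainlssvm({X, Y, 'f', 1000, 0.5, 'RBF_kernel'});  % high gam: closer fit to the data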
>> pi = predlssvm(model, Xt)
>> pi = predlssvm(model, Xt, alpha)
>> pi = predlssvm(model, Xt, alpha, conftype)

Outputs
  pi : N x 2 matrix containing the lower and upper prediction intervals
Inputs
  model : Object oriented representation of the LS-SVM model
  alpha(*) : Significance level (by default 5%)
  conftype(*) : Type of prediction interval: 'pointwise' or 'simultaneous' (by default 'simultaneous')

See also:
  trainlssvm, simlssvm, cilssvm

A.3.28 preimage_rbf

Purpose

Reconstruction or denoising after kernel PCA with RBF kernels, i.e. to find the approximate pre-image in the input space of the corresponding feature space expansions.

Basic syntax

>> Xdtr = preimage_rbf(Xtr, sig2, U);   % denoising on training data

Description

This method uses a fixed-point iteration scheme to obtain approximate pre-images for RBF kernels only. Denoising a test set Xnoisy can be done using:

>> Xd = preimage_rbf(Xtr, sig2, U, Xnoisy, 'd');

and for reconstructing feature space expansions:

>> Xr = preimage_rbf(Xtr, sig2, U, projections, 'r');

Full syntax

• >> Ximg = preimage_rbf(Xtr, sig2, U, B, type)
  >> Ximg = preimage_rbf(Xtr, sig2, U, B, type, npcs)
  >> Ximg = preimage_rbf(Xtr, sig2, U, B, type, npcs, maxIts)

Outputs
  Ximg : N x d (Nt x d) matrix with reconstructed or denoised data
Inputs
  Xtr : N x d matrix with training data points used for finding the principal components
  ... representation of the results of the Bayesian inference

• Outputs on the third level:

>> [costL3, bay] = bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 3, etype, nb)
>> [costL3, bay] = bay_lssvm(model, 3, etype, nb)

With
  costL3 : Cost proportional to the posterior on the third level
  bay(*) : Object oriented representation of the results of the Bayesian inference

• Inputs using the functional interface:

>> bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, level, etype, nb)

  X : N x d matrix with the inputs of the training data
  Y : N x 1 vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel(*) : Kernel type (by default 'RBF_kernel')
  preprocess(*) : 'preprocess' or 'original'
  level : 1, 2, 3
  etype(*) : 'svd', 'eig', 'eigs', 'eign'
  nb(*) : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

• Inputs using the object oriented interface:

>> bay_lssvm(model, level, etype, nb)

  model : Object oriented representation of the LS-SVM model
  level : 1, 2, 3
  etype(*) : 'svd', 'eig', 'eigs', 'eign'
  nb(*) : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also:
  bay_lssvmARD, bay_optimize, bay_modoutClass, bay_errorbar
  sig2 : parameter of the RBF kernel
  U : N x npcs matrix of principal eigenvectors
  B(*) : for reconstruction, B are the projections; for denoising, B is the Nt x d matrix of noisy data; if B is not specified, then Xtr is denoised instead
  type(*) : 'reconstruct' or 'denoise'
  npcs(*) : number of PCs used for the approximation
  maxIts(*) : maximum iterations allowed (1000 by default)

See also:
  denoise_kpca, kpca, kernel_matrix, RBF_kernel

A.3.29 prelssvm, postlssvm

Purpose

Pre- and postprocessing of the LS-SVM.

Description

These functions should only be called by trainlssvm or by simlssvm. At first, the preprocessing assigns a label to each input and output component: a for categorical, b for binary variables or c for continuous. According to this label, each dimension is rescaled:

• continuous: zero mean and unit variance
• categorical: no preprocessing
• binary: labels −1 and +1

Full syntax

Using the object oriented interface:

• Preprocessing:

>> model = prelssvm(model)
>> Xp = prelssvm(model, Xt)
>> [empty, Yp] = prelssvm(model, [], Yt)
>> [Xp, Yp] = prelssvm(model, Xt, Yt)

Outputs
  model : Preprocessed object oriented representation of the LS-SVM model
  Xp : Nt x d matrix with the preprocessed inputs of the test data
  Yp : Nt x d matrix with the preprocessed outputs of the test data
Inputs
  model : Object oriented representation of the LS-SVM model
  Xt : Nt x d matrix with the inputs of the test data
... is taken over this dimension, while the other inputs remain constant (0).

Full syntax

• Using the functional interface:

>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, {alpha,b})
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, {alpha,b}, grain)
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, {alpha,b}, grain, seldims)
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess})
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, grain)
>> plotlssvm({X,Y,type,gam,sig2,kernel,preprocess}, grain, seldims)

Inputs
  X : N x d matrix with the inputs of the training data
  Y : N x 1 vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel(*) : Kernel type (by default 'RBF_kernel')
  preprocess(*) : 'preprocess' or 'original'
  alpha(*) : Support values obtained from training
  b(*) : Bias term obtained from training
  grain(*) : The grain of the grid evaluated to compose the surface (by default 50)
  seldims(*) : The principal inputs one wants to span a grid over (by default [1 2])

• Using the object oriented interface:

>> model = plotlssvm(model)
>> model = plotlssvm(model, grain)
>> model = plotlssvm(model, grain, seldims)

Outputs
  model : Trained object oriented representation of the LS-SVM model
Inputs
  model : Object oriented representation of the LS-SVM model
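For instance, for a classifier with more than two inputs, a hedged call selecting inputs 1 and 3 on a finer grid could look like (variables as in the syntax above):

>> plotlssvm({X, Y, 'c', gam, sig2, 'RBF_kernel'}, {alpha, b}, 100, [1 3]);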
  estfct(*) : Function estimating the cost based on the residuals (by default 'mse')
  combinefct(*) : Function combining the estimated costs on the different folds (by default 'mean')

• Using the object oriented interface:

>> [cost, costs] = crossvalidate(model)
>> [cost, costs] = crossvalidate(model, L)
>> [cost, costs] = crossvalidate(model, L, estfct)
>> [cost, costs] = crossvalidate(model, L, estfct, combinefct)

Outputs
  cost : Cost estimation of the L-fold cross-validation
  costs(*) : L x 1 vector with costs estimated on the L different folds
Inputs
  model : Object oriented representation of the LS-SVM model
  L(*) : Number of folds (by default 10)
  estfct(*) : Function estimating the cost based on the residuals (by default 'mse')
  combinefct(*) : Function combining the estimated costs on the different folds (by default 'mean')

See also:
  leaveoneout, gcrossvalidate, trainlssvm, simlssvm

A.3.12 deltablssvm

Purpose

Bias term correction for the LS-SVM classifier.

Basic syntax

>> model = deltablssvm(model, b_new)

Description

This function is only useful in the object oriented function interface. Set explicitly the bias term b_new of the LS-SVM model.

Full syntax

>> model = deltablssvm(model, b_new)

Outputs
  model : Object oriented representation of the LS-SVM model with the updated bias term
Inputs
  model : Object oriented representation of the LS-SVM model
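A small hedged sketch of such a bias correction, as mentioned in Section 2.1.1 for small or unbalanced data sets; the bias value is illustrative:

>> model = trainlssvm(model);
>> model = deltablssvm(model, -0.1);   % explicitly set the classifier bias term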
the matrix grows with the number of data points. Hence, one needs approximation techniques to handle large datasets. It is known that mainly the principal eigenvalues and corresponding eigenvectors are relevant. Therefore, iterative approximation methods such as the Nyström method are included, which is also frequently used in Gaussian processes. Input selection can be done by Automatic Relevance Determination (bay_lssvmARD) [42]. In a backward variable selection, the third level of inference of the Bayesian framework is used to infer the most relevant inputs of the problem.

2.2 NARX models and prediction

Function calls: predict, windowize
Demo: Subsection 3.3.8

Extensions towards nonlinear NARX systems for time-series applications are available [38]. A NARX model can be built based on a nonlinear regressor by estimating in each iteration the next output value given the past output and input measurements. A dataset is converted into a new input (the past measurements) and output set (the future output) by windowize and windowizeNARX for, respectively, the time-series case and in general the NARX case with exogenous input. Iteratively predicting (in recurrent mode) the next output based on the previous predictions and starting values is done by predict.

2.3 Unsupervised learning

Function calls: kpca, denoise_kpca, preimage_rbf
Demo: Subsection 3.4

Unsupervised learning can be done by kernel based PCA (kpca) ...
>> ci = cilssvm({X, Y, type, gam, kernel_par, kernel, preprocess}, alpha, conftype)
>> ci = cilssvm(model, alpha, conftype)

Description

This function calculates bias-corrected 100(1−α)% pointwise or simultaneous confidence intervals. The procedure supports homoscedastic as well as heteroscedastic data sets. The construction of the confidence intervals is based on the central limit theorem for linear smoothers, combined with bias correction and variance estimation.

Full syntax

• Using the functional interface:

>> ci = cilssvm({X,Y,type,gam,kernel_par,kernel,preprocess})
>> ci = cilssvm({X,Y,type,gam,kernel_par,kernel,preprocess}, alpha)
>> ci = cilssvm({X,Y,type,gam,kernel_par,kernel,preprocess}, alpha, conftype)

Outputs
  ci : N x 2 matrix containing the lower and upper confidence intervals
Inputs
  X : Training input data used for defining the LS-SVM and the preprocessing
  Y : Training output data used for defining the LS-SVM and the preprocessing
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel(*) : Kernel type (by default 'RBF_kernel')
  preprocess(*) : 'preprocess' or 'original'
  alpha(*) : Significance level (by default 5%)
  conftype(*) : Type of confidence interval: 'pointwise' or 'simultaneous' (by default 'simultaneous')

• Using the object oriented interface:

>> ci = cilssvm(model)
>> ci = cilssvm(model, alpha)
>> ci = cilssvm(model, alpha, conftype)
with the RBF kernel.

Appendix A

MATLAB functions

A.1 General notation

In the full syntax description of the function calls, a star (*) indicates that the argument is optional. In the description of the arguments, a (*) denotes the default value.

In this extended help of the function calls of LS-SVMlab, a number of symbols and notations return in the explanation and the examples. These are defined as follows:

Variables — Explanation:
  d : Dimension of the input vectors
  empty : Empty matrix
  m : Dimension of the output vectors
  N : Number of training data
  Nt : Number of test data
  nb : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation
  X : N x d matrix with the inputs of the training data
  Xt : Nt x d matrix with the inputs of the test data
  Y : N x m matrix with the outputs of the training data
  Yt : Nt x m matrix with the outputs of the test data
  Zt : Nt x m matrix with the predicted latent variables of a classifier

This toolbox supports a classical functional interface as well as an object oriented interface. The latter has a few dedicated structures, which will appear many times:

Structures — Explanation:
  bay : Object oriented representation of the results of the Bayesian inference
  model : Object oriented representation of the LS-SVM model
95% error bars (red dotted and red dashed-dotted line) of the LS-SVM estimate (solid line) of a simple sinc function.

In the next example, the procedure of the automatic relevance determination is illustrated:

>> X = normrnd(0, 2, 100, 3);
>> Y = sinc(X(:,1)) + 0.05.*X(:,2) + normrnd(0, 0.1, size(X,1), 1);

Automatic relevance determination is used to determine the subset of the most relevant inputs for the proposed model:

>> inputs = bay_lssvmARD({X, Y, type, 10, 3});
>> [alpha, b] = trainlssvm({X(:,inputs), Y, type, 10, 1});

3.3.4 Using the object oriented model interface

This case illustrates how one can use the model interface. Here, regression is considered, but the extension towards classification is analogous.

>> type = 'function approximation';
>> X = normrnd(0, 2, 100, 1);
>> Y = sinc(X) + normrnd(0, 0.1, size(X,1), 1);
>> kernel = 'RBF_kernel';
>> gam = 10;
>> sig2 = 0.2;

A model is defined:

>> model = initlssvm(X, Y, type, gam, sig2, kernel);
>> model

model =
    type: 'f'
    x_dim: 1
    y_dim: 1
    nb_data: 100
    kernel_type: 'RBF_kernel'
    preprocess: 'preprocess'
    prestatus: 'ok'
    xtrain: [100x1 double]
    ytrain: [100x1 double]
    selector: [1x100 double]
    gam: 10
    kernel_pars: 0.2000
    x_delays: 0
    y_delays: 0
    steps: 1
    latent: 'no'
    code: 'original'
    codetype: 'none'
    pre_xscheme: 'c'
    pre_yscheme: 'c'
    pre_xmean:
Cost proportional to the posterior of gam:

>> [costL2, DcostL2, Deff, mu, ksi, eigval, eigvec] = bay_rr(X, Y, gam, 2)

With
  costL2 : Cost proportional to the posterior on the second level
  DcostL2(*) : Derivative of the cost proportional to the posterior
  Deff(*) : Effective number of parameters
  mu(*) : Relative importance of the fitting error term
  ksi(*) : Relative importance of the regularization parameter
  eigval(*) : Eigenvalues of the covariance matrix
  eigvec(*) : Eigenvectors of the covariance matrix

• Outputs on the third level:

The following commands can be used to compute the level 3 cost function for different models (e.g. models with different selected sets of inputs). The best model can then be chosen as the model with the best level 3 cost (costL3).

>> [costL3, gam_optimal] = bay_rr(X, Y, gam, 3)

With
  costL3 : Cost proportional to the posterior on the third inference level
  gam_optimal(*) : Optimal regularization parameter obtained from optimizing the second level

• Inputs:

>> cost = bay_rr(X, Y, gam, level)

  X : N x d matrix with the inputs of the training data
  Y : N x 1 vector with the outputs of the training data
  gam : Regularization parameter
  level : 1, 2, 3

See also:
  ridgeregress, bay_lssvm

A.3.9 cilssvm

Purpose

Construction of bias-corrected 100(1−α)% pointwise or simultaneous confidence intervals.

Basic syntax

>> ci = cilssvm({X, Y, ...
Full syntax

We denote the size of a positive definite matrix A as a x a.

• Given the full matrix:

>> D = eign(A, nb)
>> [V, D] = eign(A, nb)

Outputs
  V(*) : a x nb matrix with estimated principal eigenvectors of A
  D : nb x 1 vector with principal estimated eigenvalues of A
Inputs
  A : a x a positive definite symmetric matrix
  nb(*) : Number of approximated principal eigenvalues/eigenvectors

• Given the function to calculate the matrix elements:

>> D = eign(X, kernel, kernel_par, nb)
>> [V, D] = eign(X, kernel, kernel_par, nb)

Outputs
  V(*) : a x nb matrix with estimated principal eigenvectors of A
  D : nb x 1 vector with estimated principal eigenvalues of A
Inputs
  X : N x d matrix with the training data
  kernel : Kernel type (e.g. 'RBF_kernel')
  kernel_par : Kernel parameter(s) (for linear kernel, use [])
  nb(*) : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also:
  eig, eigs, kpca, bay_lssvm

A.3.15 gcrossvalidate

Purpose

Estimate the model performance of a model with generalized crossvalidation.

CAUTION: Use this function only to obtain the value of the generalized crossvalidation score function given the tuning parameters. Do not use this function together with tunelssvm, but use gcrossvalidatelssvm instead. The latter is a faster implementation which uses previously computed results.

Basic syntax

>> cost = gcrossvalidate({Xtrain, Ytrain, type, gam, sig2})
LS-SVMlab Toolbox User's Guide
version 1.8

K. De Brabanter, P. Karsmakers, F. Ojeda, C. Alzate, J. De Brabanter, K. Pelckmans, B. De Moor, J. Vandewalle, J.A.K. Suykens

Katholieke Universiteit Leuven
Department of Electrical Engineering, ESAT-SCD-SISTA
Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
{kris.debrabanter, johan.suykens}@esat.kuleuven.be
http://www.esat.kuleuven.be/sista/lssvmlab

ESAT-SISTA Technical Report 10-146
August 2011

Acknowledgements

Research supported by Research Council KUL: GOA AMBioRICS, GOA MaNet, CoE EF/05/006 Optimization in Engineering (OPTEC), IOF-SCORES4CHEM, several PhD/post-doc & fellow grants; Flemish Government: FWO: PhD/postdoc grants, projects G.0452.04 (new quantum algorithms), G.0499.04 (Statistics), G.0211.05 (Nonlinear), G.0226.06 (cooperative systems and optimization), G.0321.06 (Tensors), G.0302.07 (SVM/Kernel), G.0320.08 (convex MPC), G.0558.08 (Robust MHE), G.0557.08 (Glycemia2), G.0588.09 (Brain-machine), research communities (ICCoS, ANMMM, MLDM), G.0377.09 (Mechatronics MPC); IWT: PhD Grants, McKnow-E, Eureka-Flite, SBO LeCoPro, SBO Climaqs, POM; Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO, Dynamical systems, control and optimization, 2007-2011); EU: ERNSI, FP7-HD-MPC (INFSO-ICT-223854), COST intelliCIS, EMBOCOM; Contract Research: AMINAL; Other: Helmholtz, viCERP, ACCM, Bauknecht, Hoerbiger. JS is a professor at K.U. Leuven, Belgium. BDM and JWDW
Outputs
  cost : Cost estimation of the generalized cross-validation
Inputs
  model : Object oriented representation of the LS-SVM model
  estfct(*) : Function estimating the cost based on the residuals (by default 'mse')

See also:
  leaveoneout, crossvalidatelssvm, trainlssvm, simlssvm

A.3.16 initlssvm, changelssvm

Purpose

Only for use with the object oriented model interface.

Description

The Matlab toolbox interface is organized in two equivalent ways. In the functional way, function calls need explicit input and output arguments. An advantage is their similarity with the mathematical equations. An alternative syntax is based on the concept of a model, gathering all the relevant signals, parameters and algorithm choices. The model is initialized by model = initlssvm(...), or will be initiated implicitly by passing the arguments of initlssvm in one cell as the argument of the LS-SVM specific functions, e.g. for training:

>> model = trainlssvm({X, Y, type, gam, sig2});
>> model = changelssvm(model, field, value);

After training, the model contains the solution of the training, including the used default values. All contents of the model can be requested (model.<contenttype>) or changed (changelssvm) at each moment. The user is advised not to change the fields of the model by model.<field> = <value>, as the toolbox cannot guarantee consistency.
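A brief hedged example of inspecting and changing model contents through the advised interface; the parameter values are illustrative:

>> model = initlssvm(X, Y, 'f', 10, 0.4, 'RBF_kernel');
>> model.gam                                % request a field of the model
>> model = changelssvm(model, 'gam', 100);  % change it through changelssvm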
3.2.6 Multi-class coding

The following example shows how to use an encoding scheme for multi-class problems. The encoding and decoding are considered as a separate and independent preprocessing and postprocessing step, respectively (Figure 3.5(a) and 3.5(b)). A demo file demomulticlass is included in the toolbox.

>> % load multiclass data
>> [Ycode, codebook, old_codebook] = code(Y, 'code_MOC');
>>
>> [alpha, b] = trainlssvm({X, Ycode, 'classifier', gam, sig2});
>> Yh = simlssvm({X, Ycode, 'classifier', gam, sig2}, {alpha, b}, Xtest);
>>
>> Yhc = code(Yh, old_codebook, codebook, 'codedist_hamming');

In multiclass classification problems, it is easiest to use the object oriented interface, which integrates the encoding in the LS-SVM training and simulation calls:

>> % load multiclass data
>> model = initlssvm(X, Y, 'classifier', [], [], 'RBF_kernel');
>> model = tunelssvm(model, 'simplex', 'leaveoneoutlssvm', {'misclass'}, 'code_OneVsOne');
>> model = trainlssvm(model);
>> plotlssvm(model);

The last argument of the tunelssvm routine can be set to:

• 'code_OneVsOne' : One versus one coding
• 'code_MOC' : Minimum output coding
• 'code_ECOC' : Error correcting output code
• 'code_OneVsAll' : One versus all coding

[Figure 3.5: plots of the original and the encoded classifier; legend: class 1, class 2]
>> model = trainlssvm({X, Y, type, gam, sig2, kernel, preprocess})

Outputs
  model : Trained object oriented representation of the LS-SVM model
Inputs
  model : Object oriented representation of the LS-SVM model
  X : N x d matrix with the inputs of the training data
  Y : N x m vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel(*) : Kernel type (by default 'RBF_kernel')
  preprocess(*) : 'preprocess' or 'original'

See also:
  simlssvm, initlssvm, changelssvm, plotlssvm, prelssvm, codelssvm

A.3.36 tunelssvm, linesearch & gridsearch

Purpose

Tune the tuning parameters of the model with respect to the given performance measure.

Basic syntax

>> [gam, sig2, cost] = tunelssvm({X, Y, type, [], []}, optfun, costfun, costargs)

where the values for the tuning parameters (fourth and fifth argument) are set to the status of empty ([]). Using the object oriented interface this becomes:

>> model = tunelssvm(model, optfun, costfun, costargs)

where model is the object oriented interface of the LS-SVM. This is created by the command initlssvm:

>> model = initlssvm(X, Y, type, [], []);

Description

There are three optimization algorithms: simplex, which works for all kernels; gridsearch, which is restricted to two-dimensional tuning parameter optimization; and linesearch, used with the linear kernel. The complete tuning process goes
The latent variables of a binary classifier are the continuous simulated values of the test or training data, which are used to make the final classifications. The classification of a test point depends on whether the latent value exceeds the model's threshold (b). If appropriate, the model is trained by the standard procedure (trainlssvm) first.

Full syntax

• Using the functional interface:

>> Zt = latentlssvm({X, Y, 'classifier', gam, sig2, kernel}, {alpha, b}, Xt)
>> Zt = latentlssvm({X, Y, type, gam, sig2, kernel, preprocess}, Xt)

Outputs
  Zt : Nt x m matrix with predicted latent simulated outputs
Inputs
  X : N x d matrix with the inputs of the training data
  Y : N x m vector with the outputs of the training data
  type : 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel(*) : Kernel type (by default 'RBF_kernel')
  preprocess(*) : 'preprocess' or 'original'
  alpha : N x 1 matrix with the support values
  b : the bias terms
  Xt : Nt x d matrix with the inputs of the test data

• Using the object oriented interface:

>> [Zt, model] = latentlssvm(model, Xt)

Outputs
  Zt : Nt x m matrix with continuous latent simulated outputs
  model(*) : Trained object oriented representation of the LS-SVM model
Inputs
  model : Object oriented representation of the LS-SVM model
  Xt : Nt x d matrix with the inputs of the test data

See also:
  trainlssvm, simlssvm

A.3.21 leaveoneout

Purpose
The kernel parameter is rescaled appropriately after removing an input variable. The computation of the Bayesian cost criterion can be based on the singular value decomposition (svd) of the full kernel matrix, or on an approximation of these eigenvalues and vectors by the 'eigs' or 'eign' approximation based on nb data points.

Full syntax

• Using the functional interface:

>> [dimensions, ordered, costs, sig2s] = bay_lssvmARD({X,Y,type,gam,sig2,kernel,preprocess}, method, etype, nb)

Outputs
  dimensions : r x 1 vector of the relevant inputs
  ordered(*) : d x 1 vector with the inputs in decreasing order of relevance
  costs(*) : Costs associated with the third level of inference in every selection step
  sig2s(*) : Optimal kernel parameters in each selection step
Inputs
  X : N x d matrix with the inputs of the training data
  Y : N x 1 vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel(*) : Kernel type (by default 'RBF_kernel')
  preprocess(*) : 'preprocess' or 'original'
  method(*) : 'discrete' or 'continuous'
  etype(*) : 'svd', 'eig', 'eigs', 'eign'
  nb(*) : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

• Using the object oriented interface:

>> [dimensions, ordered, costs, sig2s, model] = bay_lssvmARD(model, method, etype, nb)
45. are full professors at K.U. Leuven, Belgium.

Preface to LS-SVMLab v1.8
LS-SVMLab v1.8 contains some bug fixes with respect to the previous version:
• When using the preprocessing option, class labels are no longer treated as real variables. This problem occurred when the number of dimensions was larger than the number of data points.
• The error "matrix is not positive definite" in the crossvalidatelssvm command has been solved.
• The error in the robustlssvm command with the functional interface has been solved: robustlssvm now only works with the object oriented interface. This is also adapted in the manual at pages 33 and following.
• The error "Reference to non-existent field implementation" has been solved in the bay_optimize command.
The LS-SVMLab Team
Heverlee, Belgium, June 2011

Preface to LS-SVMLab v1.7
We have added new functions to the toolbox and updated some of the existing commands with respect to the previous version v1.6. Because many readers are familiar with the layout of versions 1.5 and 1.6, we have tried to change it as little as possible. Here is a summary of the main changes:
• The major difference with the previous version is the optimization routine used to find the minimum of the cross-validation score function. The tuning procedure consists of two steps: (1) Coupled Simulated Annealing determines suitable tuning parameters, and (2) a simplex method uses these values as starting values in order to perform a
46. ariables Zt and the corresponding class labels Yclass are given. The ROC curve is then computed by
>> roc(Zt, Yclass);
For use in LS-SVMlab, a shorthand notation allows making the ROC curve on the training data; implicit training and simulation of the latent values simplifies the call:
>> roc({X,Y,'classifier',gam,sig2,kernel})
>> roc(model)
Full syntax
• Standard call (LS-SVMlab independent):
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(Zt, Y)
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(Zt, Y, figure)
Outputs
  area : Area under the ROC curve
  se : Standard deviation of the residuals
  thresholds : Nx1 vector with the different threshold values
  oneMinusSpec : 1 - specificity for each threshold value
  sens : Sensitivity for each threshold value
  TN, TP, FN, FP : Number of true negative, true positive, false negative and false positive predictions
Inputs
  Zt : Nx1 latent values of the predicted outputs
  Y : Nx1 vector of true class labels
  figure : 'figure' or 'nofigure'
• Using the functional interface for the LS-SVMs:
>> [area, se, thr
47. ass labels
  codelssvm : Encoding the LS-SVM model
  deltablssvm : Bias term correction for the LS-SVM classifier
  roc : Receiver Operating Characteristic curve of a binary classifier

A.2.7 Bayesian framework
Function Call : Short Explanation
  bay_errorbar : Compute the error bars for a one-dimensional regression problem
  bay_initlssvm : Initialize the tuning parameters for Bayesian inference
  bay_lssvm : Compute the posterior cost for the different levels in Bayesian inference
  bay_lssvmARD : Automatic Relevance Determination of the inputs of the LS-SVM
  bay_modoutClass : Estimate the posterior class probabilities of a binary classifier using Bayesian inference
  bay_optimize : Optimize model or tuning parameters with respect to the different inference levels
  bay_rr : Bayesian inference for linear ridge regression
  eign : Find the principal eigenvalues and eigenvectors of a matrix with Nyström's low-rank approximation method
  kernel_matrix : Construct the positive semi-definite kernel matrix
  kpca : Kernel Principal Component Analysis
  ridgeregress : Linear ridge regression

A.2.8 NARX models and prediction
Function Call : Short Explanation
  predict : Iterative prediction of a trained LS-SVM NARX model (in recurrent mode)
  windowize : Rearrange the data points into a Hankel matrix for (N)AR t
48. ation parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' or 'original'
  Xt : Ntxd matrix with the inputs of the test data
  prior : Prior knowledge of the balancing of the training data (or [])
  etype : 'svd', 'eig', 'eigs' or 'eign'
  nb : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation
• Using the object oriented interface:
>> [Ppos, Pneg, bay, model] = bay_modoutClass(model, Xt)
>> [Ppos, Pneg, bay, model] = bay_modoutClass(model, Xt, prior)
>> [Ppos, Pneg, bay, model] = bay_modoutClass(model, Xt, prior, etype)
>> [Ppos, Pneg, bay, model] = bay_modoutClass(model, Xt, prior, etype, nb)
>> bay_modoutClass(model, 'figure')
>> bay_modoutClass(model, 'figure', prior)
>> bay_modoutClass(model, 'figure', prior, etype)
>> bay_modoutClass(model, 'figure', prior, etype, nb)
Outputs
  Ppos : Ntx1 vector with the probabilities that the test data Xt belong to the positive class
  Pneg : Ntx1 vector with the probabilities that the test data Xt belong to the negative (zero) class
  bay : Object oriented representation of the results of the Bayesian inference
  model : Object oriented representation of the LS-SVM model
Inputs
  model : Object oriented representation of the LS-SVM model
  Xt : Ntxd matrix with the inputs of the test data
  prior : Prior knowledge of the balancing of the training data (or [])
  etype : 'svd', 'eig', 'eigs' or 'eign'
  nb : Number of eigenvalues/eigenvectors used in the approximation
See also:
49. b, gam, sig2, model] = lssvm(X, Y, type, kernel)
Inputs
  X : Nxd matrix with the inputs of the training data
  Y : Nx1 vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  kernel : Kernel type (by default 'RBF_kernel')
Outputs
  Yp : N x m matrix with the output of the training data
  alpha : N x m matrix with the support values of the LS-SVM
  b : 1 x m vector with the bias term(s) of the LS-SVM
  gam : Regularization parameter determined by cross-validation
  sig2 : Squared bandwidth determined by cross-validation (for the linear kernel, sig2 = 0)
  model : Trained object oriented representation of the LS-SVM model
See also: trainlssvm, simlssvm, crossvalidate, leaveoneout, plotlssvm

A.3.25 plotlssvm
Purpose: Plot the LS-SVM results in the environment of the training data.
Basic syntax:
>> plotlssvm({X,Y,type,gam,sig2,kernel})
>> plotlssvm({X,Y,type,gam,sig2,kernel}, {alpha,b})
>> model = plotlssvm(model)
Description: The first argument specifies the LS-SVM; the second specifies the results of the training, if already known. Otherwise, the training algorithm is called first. One can specify the precision of the plot by specifying the grain of the grid; by default this value is 50. In case of higher-dimensional inputs (> 2), the dimensions (seldims) of the input data to display can be selected as an optional argument. A grid will be
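For illustration, a classifier plot on invented toy data could be obtained as follows. This is a sketch; the data and the values gam = 10, sig2 = 0.4 are arbitrary choices:
>> X = randn(100,2);
>> Y = sign(X(:,1).^2 + X(:,2) - 0.5);                 % invented two-class problem
>> plotlssvm({X,Y,'classifier',10,0.4,'RBF_kernel'});  % trains implicitly, then plots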
50. c Regression. Springer.
Hall, P. (1992). On bootstrap confidence intervals in nonparametric regression. Annals of Statistics, 20(2), 695-711.
Hanley, J.A., McNeil, B.J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29-36.
Huber, P.J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics, 35, 73-101.
Loader, C. (1999). Local Regression and Likelihood. Springer-Verlag.
MacKay, D.J.C. (1992). Bayesian interpolation. Neural Computation, 4(3), 415-447.
Mika, S., Schölkopf, B., Smola, A., Müller, K.-R., Scholz, M., Rätsch, G. (1999). Kernel PCA and de-noising in feature spaces. Advances in Neural Information Processing Systems 11, 536-542, MIT Press.
Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Müller, K.-R. (1999). Fisher discriminant analysis with kernels. In Neural Networks for Signal Processing IX, 41-48, IEEE.
Nabney, I.T. (2002). Netlab: Algorithms for Pattern Recognition. Springer.
Nelder, J.A., Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308-313.
Poggio, T., Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78, 1481-1497.
Rice, S.O. (1939). The distribution of the maxima of a random curve. American Journal of Mathematics, 61(2), 409-416.
Ruppert, D., Wand, M.P., Carroll, R.J. (2003). Semiparametric Regression. Cambridge University Press.
51. [Figure 3.5: LS-SVM multi-class example: (a) one versus one encoding; (b) error correcting output code; (c) minimum output code; (d) one versus all encoding. Each panel shows the trained classifier with the samples of class 1, class 2 and class 3.]

3.3 Regression
3.3.1 A simple example
This is a simple demo solving a simple regression task using LS-SVMlab. A dataset is constructed in the correct formatting: the data are represented as matrices, where each row contains one datapoint:
>> X = linspace(-1,1,50)';
>> Y = (15*(X.^2 - 1).^2 .* X.^4).*exp(-X) + normrnd(0, 0.1, length(X), 1);
>> X
X =
   -1.0000
   -0.9592
   -0.9184
   -0.8776
   -0.8367
   -0.7959
    ...
    0.9592
    1.0000
>> Y
Y =
    0.0138
    0.2953
    0.6847
    1.1572
    1.5844
    1.9935
    ...
    0.0613
    0.0298
In order to obtain an LS-SVM model with the RBF kernel, we need two extra tuning parameters: γ (gam) is the regularization parameter, determining the trade-off between training error minimization and smoothness of the estimated function, and σ² (sig2) is the kernel function parameter. In this case we use leave-one-out CV to determine the tuning parameters:
>> type = 'function estimation';
>> [gam, sig2] = tunelssvm({X,Y,type,[],[],'RBF_kernel'}, 'simplex', 'leaveoneoutlssvm', {'mse'});
>> [alpha, b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel'});
>> plotlssvm({X,Y,type,gam,sig2,'RBF_kernel'}, {alpha,b});
52. >> Yd = code(Yc, codebook, old_codebook, codedist_fct)
>> Yd = code(Yc, codebook, old_codebook, codedist_fct, codedist_args)
Outputs
  Yd : Nxnc decoded output classifier
Inputs
  Yc : Nxd matrix representing the original encoded classifier
  codebook : d*nc matrix representing the original encoding
  old_codebook : bits*nc matrix representing the encoding of the given classifier
  codedist_fct : Function to calculate the distance between two encoded classifiers (e.g. codedist_hamming)
  codedist_args : Extra arguments of codedist_fct
See also: code_ECOC, code_MOC, code_OneVsAll, code_OneVsOne, codedist_hamming

A.3.11 crossvalidate
Purpose: Estimate the model performance of a model with l-fold cross-validation.
CAUTION: Use this function only to obtain the value of the cross-validation score function given the tuning parameters. Do not use this function together with tunelssvm, but use crossvalidatelssvm instead; the latter is a faster implementation which uses previously computed results.
Basic syntax:
>> cost = crossvalidate({Xtrain,Ytrain,type,gam,sig2})
>> cost = crossvalidate(model)
Description: The data is once permuted randomly and then divided into l (by default 10) disjoint sets. In the i-th (i = 1, ..., l) iteration, the i-th set is used as validation set to estimate the performance of the model trained on the other l-1 sets (training set). Finally, the l (denoted by L) different estimates of the performa
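For illustration, evaluating the 10-fold cross-validation score for fixed tuning parameters could look as follows. This is a sketch; the data and the values gam = 10, sig2 = 0.3 are invented:
>> X = (-3:0.1:3)';
>> Y = sin(X) + 0.1*randn(length(X),1);
>> cost = crossvalidate({X,Y,'function estimation',10,0.3,'RBF_kernel'}, 10, 'mse', 'mean');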
53. data
  Yw : Nx1 matrix with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' (by default) or 'original'
  Xt : nbx1 matrix of the starting points for the prediction
  nb : Number of outputs to predict
• Using the object oriented interface with LS-SVMs:
>> Yp = predict(model, Xt)
>> Yp = predict(model, Xt, nb)
Outputs
  Yp : nbx1 matrix with the predictions
Inputs
  model : Object oriented representation of the LS-SVM model
  Xt : nbx1 matrix of the starting points for the prediction
  nb : Number of outputs to predict
• Using another model:
>> Yp = predict(model, Xt, nb, simfct, arguments)
Outputs
  Yp : nbx1 matrix with the predictions
Inputs
  model : Object oriented representation of the model
  Xt : nbx1 matrix of the starting points for the prediction
  nb : Number of outputs to predict
  simfct : Function used to evaluate a test point
  arguments : Cell with the extra arguments passed to simfct
See also: windowize, trainlssvm, simlssvm

A.3.27 predlssvm
Purpose: Construction of bias-corrected 100(1-α)% pointwise or simultaneous prediction intervals.
Description:
>> pi = predlssvm({X,Y,type,gam,sig2,kernel,preprocess}, Xt)
54. ded to be used with iteratively reweighted LS-SVM.
4. LS-SVM solver: all CMEX and/or C files have been removed; the linear system is solved by using the Matlab command backslash (\).
The LS-SVMLab Team
Heverlee, Belgium, June 2010

Contents
1 Introduction
2.4 Solving large scale problems with fixed size LS-SVM
3 LS-SVMlab toolbox examples
  3.1 Roadmap to LS-SVM
  3.2 Classification
    3.2.1 Hello world
    3.2.4 LS-SVM classification: only one command line away
  3.3 Regression
    3.3.1 A simple example
    3.3.2 LS-SVM regression: only one command line away
  ...
A.2.8 NARX models and prediction
A.3 Alphabetical list of function calls
  A.3.20 latentlssvm
  A.3.21 leaveoneout
  A.3.36 tunelssvm, linesearch & gridsearch
  A.3.37 windowize & windowizeNARX

Chapter 1
Introduction
Support Vector Machines (SVM) is a powerful methodology for solving problems in nonlinear classification, function estimation and density estimation, which has also led to many other recent developments in kernel based learning methods in general [4, 5, 27, 28, 48, 47]. SVMs have been
55. del
  figure : 'figure' or 'nofigure'
See also: deltablssvm, trainlssvm

A.3.34 simlssvm
Purpose: Evaluate the LS-SVM at given points.
Basic syntax:
>> Yt = simlssvm({X,Y,type,gam,sig2,kernel}, {alpha,b}, Xt)
>> Yt = simlssvm({X,Y,type,gam,sig2,kernel}, Xt)
>> Yt = simlssvm(model, Xt)
Description: The matrix Xt represents the points at which one wants to predict. The first cell contains all arguments needed for defining the LS-SVM (see also trainlssvm, initlssvm); the second cell contains the results of training this LS-SVM model. The cell syntax allows for flexible and consistent default handling.
Full syntax
• Using the functional interface:
>> [Yt, Zt] = simlssvm({X,Y,type,gam,sig2}, Xt)
>> [Yt, Zt] = simlssvm({X,Y,type,gam,sig2,kernel}, Xt)
>> [Yt, Zt] = simlssvm({X,Y,type,gam,sig2,kernel,preprocess}, Xt)
>> [Yt, Zt] = simlssvm({X,Y,type,gam,sig2,kernel}, {alpha,b}, Xt)
Outputs
  Yt : Ntxm matrix with the predicted output of the test data
  Zt : Ntxm matrix with the predicted latent variables of a classifier
Inputs
  X : Nxd matrix with the inputs of the training data
  Y : Nxm vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' or
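For illustration, training and evaluating at new points could be combined as follows. This is a sketch with invented data and parameter values:
>> X = (-3:0.2:3)';
>> Y = sin(X) + 0.1*randn(length(X),1);
>> [alpha, b] = trainlssvm({X,Y,'function estimation',10,0.5,'RBF_kernel'});
>> Xt = (-3:0.01:3)';
>> Yt = simlssvm({X,Y,'function estimation',10,0.5,'RBF_kernel'}, {alpha,b}, Xt);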
56. e kernel type, kernel parameters and regularization parameter, this is straightforward. If not, one can specify the different types and/or parameters as a row vector in the appropriate argument; each dimension will be trained with the corresponding column in this vector:
>> [alpha, b] = trainlssvm({X, [Y_1 ... Y_d], type, [gam_1 ... gam_d], [sig2_1 ... sig2_d], {kernel_1, ..., kernel_d}})
Full syntax
• Using the functional interface:
>> [alpha, b] = trainlssvm({X,Y,type,gam,sig2})
>> [alpha, b] = trainlssvm({X,Y,type,gam,sig2,kernel})
>> [alpha, b] = trainlssvm({X,Y,type,gam,sig2,kernel,preprocess})
Outputs
  alpha : Nxm matrix with the support values of the LS-SVM
  b : 1xm vector with the bias term(s) of the LS-SVM
Inputs
  X : Nxd matrix with the inputs of the training data
  Y : Nxm vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' or 'original'
• Using the object oriented interface:
>> model = trainlssvm(model)
>> model = trainlssvm({X,Y,type,gam,sig2})
>> model = trainlssvm({X,Y,type,gam,sig2,kernel})
>> model = trainlssvm({X,Y,type,gam,sig2,kernel,preprocess})
Outputs
  model : Trained object oriented representation of the LS-SVM model
Inputs
  model : Object oriented representation of the LS-SVM model
  X
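For illustration, a two-output problem trained with per-output parameters could look as follows. This is a sketch; the data and the row vectors of parameters are invented:
>> X = (-2:0.1:2)';
>> Y = [sin(X) cos(X)] + 0.05*randn(length(X),2);      % two output dimensions
>> [alpha, b] = trainlssvm({X,Y,'function estimation',[10 10],[0.5 0.8],'RBF_kernel'});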
57. e_ECOC. Additional arguments to this function can be passed as a cell in codefct_args:
>> Yc = code(Y, codefct, codefct_args)
>> Yc = code(Y, codefct, codefct_args, old_codebook)
>> [Yc, codebook, old_codebook] = code(Y, codefct, codefct_args)
To detect the classes of a disturbed encoded signal given the corresponding codebook, one needs a distance function (fctdist), with optional arguments given as a cell (fctdist_args). By default, the Hamming distance (function codedist_hamming) is used:
>> Yc = code(Y, codefct, codefct_args, old_codebook, fctdist, fctdist_args)
A simple example is given here; a more elaborate example is given in Section 3.2.6. A short categorical signal Y is encoded in Yc using Minimum Output Coding, and decoded again to its original form:
>> Y = [1; 2; 3; 2; 1];
>> [Yc, codebook, old_codebook] = code(Y, 'code_MOC') % encode
Yc =
    -1    -1
    -1     1
     1    -1
    -1     1
    -1    -1
codebook =
    -1    -1     1
    -1     1    -1
old_codebook =
     1     2     3
>> code(Yc, old_codebook, [], codebook, 'codedist_hamming') % decode
ans =
     1
     2
     3
     2
     1
Different encoding schemes are available:
• Minimum Output Coding (code_MOC): here the minimal number of bits nb is used to encode the nc classes, nb = ceil(log2(nc)).
• Error Correcting Output Code (code_ECOC): this coding scheme uses redundant bits. Typically, one bounds the number of b
58. ear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' or 'original'
  estfct : Function estimating the cost based on the residuals (by default mse)
  combinefct : Function combining the estimated costs on the different folds (by default mean)
• Using the object oriented interface for the LS-SVMs:
>> cost = leaveoneout(model)
>> cost = leaveoneout(model, estfct)
>> cost = leaveoneout(model, estfct, combinefct)
Outputs
  cost : Cost estimated by leave-one-out cross-validation
Inputs
  model : Object oriented representation of the model
  estfct : Function estimating the cost based on the residuals (by default mse)
  combinefct : Function combining the estimated costs on the different folds (by default mean)
See also: crossvalidate, trainlssvm, simlssvm

A.3.22 lin_kernel, poly_kernel, RBF_kernel
Purpose: Kernel implementations used with the Matlab training and simulation procedure.
Description
lin_kernel, the linear kernel:
  K(x_i, x_j) = x_i' * x_j
poly_kernel, the polynomial kernel:
  K(x_i, x_j) = (x_i' * x_j + t)^d,  t >= 0,
with t the intercept and d the degree of the polynomial.
RBF_kernel, the radial basis function kernel:
  K(x_i, x_j) = exp(-||x_i - x_j||^2 / sig2),
with sig2 the variance of the Gaussian kernel.
Full syntax:
>> v = RBF_kernel(x1, X2, sig2)
Outputs
  v : Nx1 vector with kernel values
Calls
  RBF_ker
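For illustration, kernel values between one point and a set of points can be computed directly with the documented signature. This is a sketch; the data and the value sig2 = 0.5 are invented:
>> x1 = [0 0];                      % a single 1xd data point
>> X2 = randn(10,2);                % ten 2-dimensional points
>> v = RBF_kernel(x1, X2, 0.5);     % 10x1 vector with exp(-||x1 - x2||^2/0.5)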
59. ed representation of the LS-SVM model with initial tuning parameters
Inputs
  model : Object oriented representation of the LS-SVM model
See also: bay_lssvm, bay_optimize

A.3.4 bay_lssvm
Purpose: Compute the posterior cost for the 3 levels in Bayesian inference.
Basic syntax:
>> cost = bay_lssvm({X,Y,type,gam,sig2}, level, etype)
>> cost = bay_lssvm(model, level, etype)
Description: Estimate the posterior probabilities of model (tuning) parameters on the different inference levels. By taking the negative logarithm of the posterior and neglecting all constants, one obtains the corresponding cost. Computation is only feasible for one-dimensional output regression and binary classification problems. Each level has its own input and output syntax.
• First level: the cost associated with the posterior of the model parameters (support values and bias term) is determined. The type can be:
  - 'train': do a training of the support values using trainlssvm. The total cost, the cost of the residuals (Ed) and the regularization parameter (Ew) are determined by the solution of the support values;
  - 'retrain': do a retraining of the support values using trainlssvm;
  - the cost terms can also be calculated from an approximate eigenvalue decomposition of the kernel matrix: 'svd', 'eig', 'eigs' or Nyström's 'eign'.
• Second level: the cost associated with the p
60. ence.
• Outputs on the second level:
>> [model, gam] = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, 2)
>> [model, gam] = bay_optimize(model, 2)
With
  model : Object oriented representation of the LS-SVM model, optimized on the second level of inference
  gam : Regularization parameter, optimized on the second level of inference
• Outputs on the third level:
>> [model, sig2] = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, 3)
>> [model, sig2] = bay_optimize(model, 3)
With
  model : Object oriented representation of the LS-SVM model, optimized on the third level of inference
  sig2 : Kernel parameter, optimized on the third level of inference
• Inputs using the functional interface:
>> model = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, level)
>> model = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, level, etype)
>> model = bay_optimize({X,Y,type,gam,sig2,kernel,preprocess}, level, etype, nb)
  X : Nxd matrix with the inputs of the training data
  Y : Nx1 vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' or 'original'
  level : 1, 2 or 3
  etype : 'eig', 'svd', 'eigs' or 'eign'
  nb : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation
61. ensional input space, one can make a surface plot by replacing Xt by the string 'figure'.
Full syntax
• Using the functional interface:
>> [Ppos, Pneg] = bay_modoutClass({X,Y,'classifier',gam,sig2,kernel,preprocess}, Xt)
>> [Ppos, Pneg] = bay_modoutClass({X,Y,'classifier',gam,sig2,kernel,preprocess}, Xt, prior)
>> [Ppos, Pneg] = bay_modoutClass({X,Y,'classifier',gam,sig2,kernel,preprocess}, Xt, prior, etype)
>> [Ppos, Pneg] = bay_modoutClass({X,Y,'classifier',gam,sig2,kernel,preprocess}, Xt, prior, etype, nb)
>> bay_modoutClass({X,Y,'classifier',gam,sig2,kernel,preprocess}, 'figure')
>> bay_modoutClass({X,Y,'classifier',gam,sig2,kernel,preprocess}, 'figure', prior)
>> bay_modoutClass({X,Y,'classifier',gam,sig2,kernel,preprocess}, 'figure', prior, etype)
>> bay_modoutClass({X,Y,'classifier',gam,sig2,kernel,preprocess}, 'figure', prior, etype, nb)
Outputs
  Ppos : Ntx1 vector with the probabilities that the test data Xt belong to the positive class
  Pneg : Ntx1 vector with the probabilities that the test data Xt belong to the negative (zero) class
Inputs
  X : Nxd matrix with the inputs of the training data
  Y : Nx1 vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regulariz
62. es, U, lam]:
>> [features, U, lam] = AFEm(X, kernel, sig2, Xt)
>> [features, U, lam] = AFEm(X, kernel, sig2, Xt, etype)
>> [features, U, lam] = AFEm(X, kernel, sig2, Xt, etype, nb)
>> features = AFEm(X, kernel, sig2, Xt, etype, nb, U, lam)
Outputs
  features : Ntxnb matrix with the extracted features
  U : Nxnb matrix with the eigenvectors
  lam : nbx1 vector with the eigenvalues
Inputs
  X : Nxd matrix with the input data
  kernel : Name of the used kernel (e.g. 'RBF_kernel')
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  Xt : Ntxd data from which the features are extracted
  etype : 'eig', 'eigs' or 'eign'
  nb : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation
  U : Nxnb matrix with the eigenvectors
  lam : nbx1 vector with the eigenvalues
See also: kernel_matrix, RBF_kernel, demo_fixedsize

A.3.2 bay_errorbar
Purpose: Compute the error bars for a one-dimensional regression problem.
Basic syntax:
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2}, Xt)
>> sig_e = bay_errorbar(model, Xt)
Description: The computation takes into account the estimated noise variance and the uncertainty of the model parameters, estimated by Bayesian inference. sig_e is the estimated standard deviation of the error bars at the points Xt. A plot is obtained by replacing Xt by the string 'figure'.
Full syntax
• Using the functional interface:
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, Xt)
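For illustration, error bars on a toy regression problem could be requested as follows. This is a sketch; the data and the values gam = 10, sig2 = 0.3 are invented:
>> X = (-3:0.1:3)';
>> Y = sin(X) + 0.1*randn(length(X),1);
>> sig_e = bay_errorbar({X,Y,'function',10,0.3,'RBF_kernel'}, X);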
63. es(i,j) = mean(performances);
     end
   end
The kernel parameter and capacity corresponding to a good performance are searched:
>> [minp, ic] = min(minimal_performances, [], 1);
>> [minminp, is] = min(minp);
>> capacity = caps(ic);
>> sig2 = sig2s(is);
The following approach optimizes the selection of support vectors according to the quadratic Renyi entropy:
>> % load data X and Y, the capacity and the kernel parameter sig2
>> sv = 1:capacity;
>> max_c = -inf;
>> for i = 1:size(X,1)
     replace = ceil(rand*capacity);
     subset = [sv([1:replace-1 replace+1:end]) i];  % swap one support vector for candidate i
     crit = kentropy(X(subset,:), 'RBF_kernel', sig2);
     if max_c < crit
       max_c = crit;
       sv = subset;
     end
   end
This selected subset of support vectors is used to construct the final model (Figure 3.15a):
>> features = AFEm(svX, 'RBF_kernel', sig2, X);
>> [Cl3, gam_optimal] = bay_rr(features, Y, 1, 3);
>> [W, b, Yh] = ridgeregress(features, Y, gam_optimal);

[Figure 3.15, panel (a): "Fixed Size LS-SVM on 20,000 noisy sinc data points", showing the support vectors, the real function and the estimated function; panel (b): "Estimated cost surface of fixed size LS-SVM on repeated i.i.d. subsampling", over subset capacity and kernel parameter.]
Figure 3.15: Illustration of fixed size LS-SVM on a noisy sinc function with 20,000 data points: (a) fixed size LS-SVM selects a subset of the data after Nyström approximation. The regularization
64. esholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc({X,Y,'classifier',gam,sig2,kernel,preprocess}, figure)
Outputs
  area : Area under the ROC curve
  se : Standard deviation of the residuals
  thresholds : Different threshold values
  oneMinusSpec : 1 - specificity for each threshold value
  sens : Sensitivity for each threshold value
  TN, TP, FN, FP : Number of true negative, true positive, false negative and false positive predictions
Inputs
  X : Nxd matrix with the inputs of the training data
  Y : Nx1 vector with the outputs of the training data
  type : 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' or 'original'
  figure : 'figure' or 'nofigure'
• Using the object oriented interface for the LS-SVMs:
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(model)
>> [area, se, thresholds, oneMinusSpec, sens, TN, TP, FN, FP] = roc(model, figure)
Outputs
  area : Area under the ROC curve
  se : Standard deviation of the residuals
  thresholds : Nx1 vector with the different threshold values
  oneMinusSpec : 1 - specificity for each threshold value
  sens : Sensitivity for each threshold value
  TN, TP, FN, FP : Number of true negative, true positive, false negative and false positive predictions
Inputs
  model : Object oriented representation of the LS-SVM mo
65. estfct, combinefct)
Outputs
  cost : Cost estimation of the robust L-fold cross-validation
  costs : Lx1 vector with the costs estimated on the L different folds
  ec : Nx1 vector with the residuals of all data
Inputs
  model : Object oriented representation of the LS-SVM model
  L : Number of folds (by default 10)
  wfun : Weighting scheme (by default 'whuber')
  estfct : Function estimating the cost based on the residuals (by default mse)
  combinefct : Function combining the estimated costs on the different folds (by default mean)
See also: mae, weightingscheme, crossvalidate, trainlssvm, robustlssvm

A.3.31 ridgeregress
Purpose: Linear ridge regression.
Basic syntax:
>> [w, b] = ridgeregress(X, Y, gam)
>> [w, b, Yt] = ridgeregress(X, Y, gam, Xt)
Description: Ordinary least squares on the training errors, together with minimization controlled by a regularization parameter (gam).
Full syntax:
>> [w, b] = ridgeregress(X, Y, gam)
>> [w, b, Yt] = ridgeregress(X, Y, gam, Xt)
Outputs
  w : dx1 vector with the regression coefficients
  b : Bias term
  Yt : Ntx1 vector with the predicted outputs of the test data
Inputs
  X : Nxd matrix with the inputs of the training data
  Y : Nx1 vector with the outputs of the training data
  gam : Regularization parameter
  Xt : Ntxd matrix with the inputs of the test data
See also: bay_rr, bay_lssvm

A.3.32 robustlssvm
Purpose: Robust training in the
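For illustration, a ridge regression fit with predictions at new inputs could look as follows. This is a sketch; the data, the linear relation and gam = 10 are invented:
>> X = randn(100,3);
>> Y = X*[1; -2; 0.5] + 0.1*randn(100,1);   % invented linear relation
>> Xt = randn(20,3);
>> [w, b, Yt] = ridgeregress(X, Y, 10, Xt);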
66. f the test data to preprocess
  Yt : Ntxd matrix with the outputs of the test data to preprocess
• Postprocessing:
>> model = postlssvm(model)
>> Xt = postlssvm(model, Xp)
>> [empty, Yt] = postlssvm(model, [], Yp)
>> [Xt, Yt] = postlssvm(model, Xp, Yp)
Outputs
  model : Postprocessed object oriented representation of the LS-SVM model
  Xt : Ntxd matrix with the postprocessed inputs of the test data
  Yt : Ntxd matrix with the postprocessed outputs of the test data
Inputs
  model : Object oriented representation of the LS-SVM model
  Xp : Ntxd matrix with the inputs of the test data to postprocess
  Yp : Ntxd matrix with the outputs of the test data to postprocess

A.3.30 rcrossvalidate
Purpose: Estimate the model performance with robust L-fold cross-validation (regression only).
CAUTION: Use this function only to obtain the value of the robust L-fold cross-validation score function given the tuning parameters. Do not use this function together with tunelssvm, but use rcrossvalidatelssvm instead.
Basic syntax:
>> cost = rcrossvalidate(model)
>> cost = rcrossvalidate({X,Y,'function',gam,sig2})
Description: Robustness in the l-fold cross-validation score function is obtained by iteratively reweighting schemes. This routine is ONLY valid for regression.
Full syntax
• Using LS-SVMlab with the functional interface:
>> [cost, costs] = rcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess})
67. fct, sig2, Xt)
>> [eigval, eigvec, scores, omega, recErrors, optOut] = kpca(X, kernel_fct, sig2, Xt, etype, nb, rescaling)
Outputs
  eigval : N(nb)x1 vector with the eigenvalues
  eigvec : NxN (Nxnb) matrix with the principal directions
  scores : Ntxnb matrix of the scores of the test data (or [])
  omega : NxN centered kernel matrix
  recErrors : Ntx1 vector with the reconstruction errors of the test data
  optOut : 1x2 cell array with the centered test kernel matrix in optOut{1} and the squared norms of the test points in the feature space in optOut{2}
Inputs
  X : Nxd matrix with the inputs of the training data
  kernel : Kernel type (e.g. 'RBF_kernel')
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  Xt : Ntxd matrix with the inputs of the test data (or [])
  etype : 'svd', 'eig', 'eigs' or 'eign'
  nb : Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation
  rescaling : 'original size' ('o') or 'rescaling' ('r')
See also: bay_lssvm, bay_optimize, eign

A.3.20 latentlssvm
Purpose: Calculate the latent variables of the LS-SVM classifier at the given test data.
Basic syntax:
>> Zt = latentlssvm({X,Y,'classifier',gam,sig2,kernel}, {alpha,b}, Xt)
>> Zt = latentlssvm({X,Y,'classifier',gam,sig2,kernel}, Xt)
>> [Zt, model] = latentlssvm(model, Xt)
Description: The latent variables of
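For illustration, the first five kernel principal components and the scores of some test points could be computed as follows. This is a sketch; the data and the values sig2 = 0.5 and nb = 5 are invented:
>> X = randn(50,2);
>> Xt = randn(10,2);
>> [eigval, eigvec, scores] = kpca(X, 'RBF_kernel', 0.5, Xt, 'eig', 5);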
68. fine tuning of the parameters. The major advantage is speed: the number of function evaluations needed to find optimal parameters is reduced from about 200 in v1.6 to 50 in this version.
• The construction of bias-corrected approximate 100(1-α)% pointwise/simultaneous confidence and prediction intervals has been added to this version.
• Some bug fixes are performed in the function roc. The class labels do not need to be -1 or +1, but can also be 0 and 1; the conversion is done automatically.
The LS-SVMLab Team
Heverlee, Belgium, September 2010

Preface to LS-SVMLab v1.6
We have added new functions to the toolbox and updated some of the existing commands with respect to the previous version v1.5. Because many readers are familiar with the layout of version 1.5, we have tried to change it as little as possible. The major difference is the speed-up of several methods. Here is a summary of the main changes:

Chapter / solver / function : What's new
  1. A bird's eye view on LS-SVMLab
  2. LS-SVMLab toolbox examples : Roadmap to LS-SVM; addition of more regression and classification examples; easier interface for multi-class classification; changed implementation for robust LS-SVM
  3. Matlab functions : Possibility of regression or classification using only one command; the function validate has been deleted; faster robust training and robust model selection criteria are provided; in case of robust regression, different weight functions are provi
69. fy a prior class probability in the moderated output in order to compensate for an unbalanced number of training data points in the two classes. When the training set contains N+ positive instances and N- negative ones, the moderated output is calculated with the prior probability
  prior = N+ / (N+ + N-):
>> Np = 10;
>> Nn = 50;
>> prior = Np / (Nn + Np);
>> Posterior_class_P = bay_modoutClass({X,Y,type,10,1,'RBF_kernel'}, 'figure', prior);
The results are shown in Figure 3.4.

[Figure 3.4, panels (a)-(c): each plots the probability of occurrence of class 1 over the input plane, with the class 1 and class 2 samples marked.]
Figure 3.4: (a) Moderated output of the LS-SVM classifier on the Ripley data set; the colors indicate the probability of belonging to a certain class. (b) This example shows the moderated output of an unbalanced subset of the Ripley data. (c) One can compensate for unbalanced data in the calculation of the moderated output. Notice that the area of the blue zone with the positive samples increases by the compensation; the red zone shrinks accordingly.
70. g2})
>> cost = gcrossvalidate(model)
Description: Instead of dividing the data into L disjoint sets, one takes the complete data and the effective degrees of freedom (effective number of parameters) into account. The assumption is made that the input data are distributed independently and identically over the input space.
>> cost = gcrossvalidate(model)
Some commonly used criteria are:
>> cost = gcrossvalidate(model, 'misclass')
>> cost = gcrossvalidate(model, 'mse')
>> cost = gcrossvalidate(model, 'mae')
Full syntax
• Using LS-SVMlab with the functional interface:
>> cost = gcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess})
>> cost = gcrossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, estfct)
Outputs
  cost : Cost estimation of the generalized cross-validation
Inputs
  X : Training input data used for defining the LS-SVM and the preprocessing
  Y : Training output data used for defining the LS-SVM and the preprocessing
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' or 'original'
  estfct : Function estimating the cost based on the residuals (by default mse)
• Using the object oriented interface:
>> cost = gcrossvalidate(model)
>> cost = gcrossvalidate(model, estfct)
71. idation is based upon feedforward simulation on the validation set, using the feedforwardly trained model:
>> [gam, sig2] = tunelssvm({Xtra,Ytra,'f',[],[],'RBF_kernel'}, 'simplex', 'crossvalidatelssvm', {10,'mae'});
Prediction of the next 100 points is done in a recurrent way:
>> [alpha, b] = trainlssvm({Xtra,Ytra,'f',gam,sig2,'RBF_kernel'});
>> % predict the next 100 points
>> prediction = predict({Xtra,Ytra,'f',gam,sig2,'RBF_kernel'}, Xs, 100);
>> plot([prediction Xt]);
In Figure 3.13, results are shown for the Santa Fe laser data.

[Figure 3.13: iterative prediction on the Santa Fe laser data; amplitude versus discrete time index.]
Figure 3.13: The solid line denotes the Santa Fe chaotic laser data; the dashed line shows the iterative prediction using LS-SVM with the RBF kernel, with optimal hyperparameters obtained by tuning.

3.3.9 Fixed size LS-SVM
The fixed size LS-SVM is based on two ideas (see also Section 2.4). The first is to exploit the primal-dual formulations of the LS-SVM in view of a Nyström approximation (Figure 3.14).
Figure 3.14: Fixed size LS-SVM is a method for solving large scale regression and classification problems. The number of support vectors is pre-fixed beforehand, and the support vectors are selected from a po
72. ilssvm(model, alpha)
>> ci = cilssvm(model, alpha, conftype)
Outputs
  ci : N x 2 matrix containing the lower and upper confidence intervals
Inputs
  model : Object oriented representation of the LS-SVM model
  alpha : Significance level (by default 5%)
  conftype : Type of confidence interval, 'pointwise' or 'simultaneous' (by default 'simultaneous')
See also: trainlssvm, simlssvm, predlssvm

A.3.10 code & codelssvm
Purpose: Encode and decode a multi-class classification task into multiple binary classifiers.
Basic syntax:
>> Yc = code(Y, codebook)
Description: The coding is defined by the codebook. The codebook is represented by a matrix, where the columns represent all different classes and the rows indicate the result of the binary classifiers. An example: the 3 classes with original labels [1 2 3] can be encoded in the following codebook (using Minimum Output Coding):
>> codebook = [-1 -1  1
                1 -1 -1]
For this codebook, a member of the first class is found if the first binary classifier is negative and the second classifier is positive. A "don't care" is represented by NaN. By default it is assumed that the original classes are represented as different numerical labels. One can overrule this by passing the old_codebook, which contains information about the old representation. A codebook can be created by one of the functions (codefct) code_MOC, code_OneVsOne, code_OneVsAll, cod
73. ime series modeling
  windowize_NARX : Rearrange the input and output data into a block Hankel matrix for (N)AR(X) time series modeling

A.2.9 Unsupervised learning
Function Call : Short Explanation
  AFEm : Automatic Feature Extraction from Nyström method
  denoise_kpca : Reconstruct the data mapped on the principal components
  kentropy : Quadratic Renyi entropy for a kernel based estimator
  kpca : Compute the nonlinear kernel principal components of the data
  preimage_rbf : Compute an approximate pre-image in the input space (for RBF kernels)

A.2.10 Fixed size LS-SVM
The idea of fixed size LS-SVM is still under development. However, in order to enable the user to explore this technique, a number of related functions are included in the toolbox. A demo illustrates how to combine these in order to build a fixed size LS-SVM.
Function Call : Short Explanation
  AFEm : Automatic Feature Extraction from Nyström method (A.3.1)
  bay_rr : Bayesian inference of the cost on the 3 levels of linear ridge regression
  demo_fixedsize : Demo illustrating the use of fixed size LS-SVMs for regression
  demo_fixedclass : Demo illustrating the use of fixed size LS-SVMs for classification
  kentropy : Quadratic Renyi entropy for a kernel based estimator
  ridgeregress : Linear ridge regression

A.2.11 Demos
Name of the demo : Short Explanat
74. inary classifiers nb by nb < 15*log2(nc). However, it is not guaranteed that a valid nb-bit representation of the nc classes exists for all combinations; this routine, based on backtracking, can take some memory and time.
• One versus All Coding (code_OneVsAll): each binary classifier k = 1, ..., nc is trained to discriminate between class k and the union of the others.
• One versus One Coding (code_OneVsOne): each of the nb binary classifiers is used to discriminate between a specific pair of the nc classes, nb = nc(nc-1)/2.
Different decoding schemes are implemented:
• Hamming distance (codedist_hamming): this measure equals the number of differing bits between the binary result and the codeword. Typically, it is used for the Error Correcting Code.
• Bayesian distance measure (codedist_bay): the Bayesian moderated output of the binary classifiers is used to estimate the posterior probability.
Encoding of the LS-SVM multi-class classifier using the previous algorithms can easily be done by codelssvm. It will be invoked by trainlssvm if an appropriate encoding scheme is defined in a model. An example shows how to use the Bayesian distance measure to extract the estimated class from the simulated encoded signal. Given are input and output data X and Y (of size Ntrain x Din and Ntrain x 1, respectively), a kernel parameter sig2 and a regularization parameter gam. Yt, corresponding to a set of data points Xt (of size Ntest x Din), is to be estimated.
75. ing 15% outliers is constructed:
>> X = (-5:0.07:5)';
>> epsilon = 0.15;
>> sel = rand(length(X),1) > epsilon;
>> Y = sinc(X) + sel.*normrnd(0,0.1,length(X),1) + (1-sel).*normrnd(0,2,length(X),1);
Robust tuning of the tuning parameters is performed by rcrossvalidatelssvm. Also notice that the preferred loss function is the L1 loss (mae). The weighting function in the cost function is chosen to be the Huber weights; other possibilities included in the toolbox are logistic weights, myriad weights and Hampel weights. Note that the function robustlssvm only works with the object oriented interface:
>> model = initlssvm(X, Y, 'f', [], [], 'RBF_kernel');
>> L_fold = 10; % 10-fold CV
>> model = tunelssvm(model, 'simplex', 'rcrossvalidatelssvm', {L_fold,'mae'}, 'whuber');
Robust training is performed by robustlssvm:
>> model = robustlssvm(model);
>> plotlssvm(model);

[Figure 3.11, panels (a) and (b): "function estimation using LS-SVM", with the tuned γ and σ² shown in the panel titles; each panel shows the data, the LS-SVM fit and the real function.]
Figure 3.11: Experiments on a noisy sinc dataset with 15% outliers: (a) application of the standard training and hyperparameter selection tech
76. ing the input data
  pre_yscheme : Scheme used for preprocessing the output data
  pre_xmean : Mean of the input data
  pre_xstd : Standard deviation of the input data
  pre_ymean : Mean of the responses
  pre_ystd : Standard deviation of the responses
• The specifications of the used encoding (only given if appropriate):
  code : Status of the coding ('original', 'changed' or 'encoded')
  codetype : Used function for constructing the encoding for multiclass classification (by default 'none')
  codetype_args : Arguments of the codetype function
  codedist_fct : Function used to calculate to which class a coded result belongs
  codedist_args : Arguments of the codedist function
  codebook2 : Codebook of the new coding
  codebook1 : Codebook of the original coding
Full syntax
• >> model = initlssvm(X, Y, type, gam, sig2, kernel, preprocess)
Outputs
  model : Object oriented representation of the LS-SVM model
Inputs
  X : Nxd matrix with the inputs of the training data
  Y : Nx1 vector with the outputs of the training data
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' or 'original'
• >> model = changelssvm(model, field, value)
Outputs
  model : Obtained object oriented representation of the LS-SVM model
Inputs
  model : Original object oriented representation of the LS-SVM model
77. ining data:
>> X = X(sel(169:end), :);
>> Y = Y(sel(169:end));
>>
>> model = initlssvm(X, Y, 'f', [], [], 'RBF_kernel');
>> model = tunelssvm(model, 'simplex', 'crossvalidatelssvm', {10,'mse'});
>> model = trainlssvm(model);
>> Yhci = simlssvm(model, X);
>> Yhpi = simlssvm(model, Xt);
>> [Yhci, indci] = sort(Yhci, 'descend');
>> [Yhpi, indpi] = sort(Yhpi, 'descend');
>>
>> % simultaneous confidence intervals
>> ci = cilssvm(model, 0.05, 'simultaneous'); ci = ci(indci, :);
>> plot(Yhci); hold all; plot(ci(:,1), 'g'); plot(ci(:,2), 'g');
>>
>> % simultaneous prediction intervals
>> pi = predlssvm(model, Xt, 0.05, 'simultaneous'); pi = pi(indpi, :);
>> plot(Yhpi); hold all; plot(pi(:,1), 'g'); plot(pi(:,2), 'g');

[Figure 3.10: panel (a) plots the sorted m(X) of the training data against the index; panel (b) plots the sorted m(Xt) of the test data against the index.]
Figure 3.10: (a) Simultaneous 95% confidence intervals for the Boston Housing data set (dots); sorted outputs are plotted against their index. (b) Simultaneous 95% prediction intervals for the Boston Housing data set (dots); sorted outputs are plotted against their index.

3.3.6 Robust regression
First, a dataset contain
78. ion. Communications in Statistics, 4, 1-17.
Wahba, G. (1990). Spline Models for Observational Data. SIAM.
Xavier-de-Souza, S., Suykens, J.A.K., Vandewalle, J., Bollé, D. (2010). Coupled simulated annealing. IEEE Transactions on Systems, Man and Cybernetics, Part B, 40(2), 320-335.
79. ion
  demofun : Simple demo illustrating the use of LS-SVMlab for regression
  demo_fixedsize : Demo illustrating the use of fixed size LS-SVMs for regression
  democlass : Simple demo illustrating the use of LS-SVMlab for classification
  demo_fixedclass : Demo illustrating the use of fixed size LS-SVMs for classification
  demomodel : Simple demo illustrating the use of the object oriented interface of LS-SVMlab
  demo_yinyang : Demo illustrating the possibilities of unsupervised learning by kernel PCA
  democonfint : Demo illustrating the construction of confidence intervals for LS-SVMs (regression)

A.3 Alphabetical list of function calls
A.3.1 AFEm
Purpose: Automatic Feature Extraction by the Nyström method.
Basic syntax:
>> features = AFEm(X, kernel, sig2, Xt)
Description: Using the Nyström approximation method, the mapping of data to the feature space can be evaluated explicitly. This gives features that one can use for a parametric regression or classification in the primal space. The decomposition of the mapping to the feature space relies on the eigenvalue decomposition of the kernel matrix. The Matlab eigs or Nyström's eign approximation, using the nb most important eigenvectors/eigenvalues, can be used. The eigenvalue decomposition is not recalculated if it is passed as an extra argument.
Full syntax:
>> [featur
80. matrix
Purpose: Construct the positive semi-definite and symmetric kernel matrix.
Basic syntax:
>> Omega = kernel_matrix(X, kernel_fct, sig2)
Description: This matrix should be positive definite if the kernel function satisfies the Mercer condition. Construct the kernel values for all test data points in the rows of Xt, relative to the points of X:
>> Omega_Xt = kernel_matrix(X, kernel_fct, sig2, Xt)
Full syntax:
>> Omega = kernel_matrix(X, kernel_fct, sig2)
>> Omega = kernel_matrix(X, kernel_fct, sig2, Xt)
Outputs
  Omega : NxN (NxNt) kernel matrix
Inputs
  X : Nxd matrix with the inputs of the training data
  kernel : Kernel type (by default 'RBF_kernel')
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  Xt : Ntxd matrix with the inputs of the test data
See also: RBF_kernel, lin_kernel, kpca, trainlssvm

A.3.19 kpca
Purpose: Kernel Principal Component Analysis (KPCA).
Basic syntax:
>> [eigval, eigvec] = kpca(X, kernel_fct, sig2)
>> [eigval, eigvec, scores] = kpca(X, kernel_fct, sig2, Xt)
Description: Compute the nb largest eigenvalues and the corresponding rescaled eigenvectors corresponding with the principal components in the feature space of the centered kernel matrix. To calculate the eigenvalue decomposition of this NxN matrix, Matlab's eig is called by default. The decomposition can also be approximated by Matlab's eigs or by Nyst
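For illustration, a small RBF kernel matrix and the corresponding test-kernel matrix could be built as follows. This is a sketch; the data and the value sig2 = 0.4 are invented:
>> X = randn(5,2);
>> Omega = kernel_matrix(X, 'RBF_kernel', 0.4);        % 5x5 symmetric kernel matrix
>> Xt = randn(3,2);
>> Omega_Xt = kernel_matrix(X, 'RBF_kernel', 0.4, Xt); % 5x3 matrix of kernel values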
81. mmand cilssvm:
>> ci = cilssvm(model, alpha, 'pointwise');
Typically, the value of the significance level alpha is set to 5%. The confidence intervals obtained by this command are pointwise. For example, by looking at two pointwise confidence intervals in Figure 3.8(a) (fossil data set [26]), we can make the following two statements separately:
• [0.70743, 0.70745] is an approximate 95% pointwise confidence interval for m(105);
• [0.70741, 0.70744] is an approximate 95% pointwise confidence interval for m(120).
However, as is well known in multiple comparison theory, it is wrong to state that m(105) is contained in [0.70743, 0.70745] and simultaneously m(120) is contained in [0.70741, 0.70744] with 95% confidence. Therefore, it is not correct to connect the pointwise confidence intervals to produce a band around the estimated function. In order to make such simultaneous statements, we have to modify the interval. Three major groups of modifications exist: Monte Carlo simulations, Bonferroni and Šidák corrections, and results based on the distribution of maxima and upcrossing theory [18]. The latter is implemented in the software. The figure shows the 95% pointwise and simultaneous confidence intervals on the estimated LS-SVM model; as expected, the simultaneous intervals are much wider than the pointwise intervals. Simultaneous confidence intervals can be obtained by
>> ci = cilssvm(model, alpha, 'simultaneous');
82. nce are combined, by default by the mean. The assumption is made that the input data are distributed independently and identically over the input space. As additional output, the costs in the different folds (costs) of the data are returned:
>> [cost, costs] = crossvalidate(model)
Some commonly used criteria are:
>> cost = crossvalidate(model, 10, 'misclass', 'mean')
>> cost = crossvalidate(model, 10, 'mse', 'mean')
>> cost = crossvalidate(model, 10, 'mae', 'median')
Full syntax
• Using LS-SVMlab with the functional interface:
>> [cost, costs] = crossvalidate({X,Y,type,gam,sig2,kernel,preprocess})
>> [cost, costs] = crossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, L)
>> [cost, costs] = crossvalidate({X,Y,type,gam,sig2,kernel,preprocess}, L, estfct, combinefct)
Outputs
  cost : Cost estimation of the L-fold cross-validation
  costs : Lx1 vector with the costs estimated on the L different folds
Inputs
  X : Training input data used for defining the LS-SVM and the preprocessing
  Y : Training output data used for defining the LS-SVM and the preprocessing
  type : 'function estimation' ('f') or 'classifier' ('c')
  gam : Regularization parameter
  sig2 : Kernel parameter(s) (for linear kernel, use [])
  kernel : Kernel type (by default 'RBF_kernel')
  preprocess : 'preprocess' or 'original'
  L : Number of folds (by default 10)
  estfct : Function estimating the cost based on the residuals (by default mse)
83. nel or lin_kernel, MLP_kernel, poly_kernel
Inputs
  x1 : 1xd matrix with a data point
  X2 : Nxd matrix with data points
  sig2 : Kernel parameters
See also: kernel_matrix, kpca, trainlssvm

A.3.23 linf, mae, medae, misclass, mse
Purpose: Cost measures of residuals.
Description: A variety of global distance measures can be defined:
  mae:      L1(e) = (sum_{i=1..N} |e_i|) / N
  medae:    L1,med(e) = median_i |e_i|
  linf:     Linf(e) = sup_i |e_i|
  misclass: L0(e) = (sum_{i=1..N} I(y_i ~= yh_i)) / N
  mse:      L2(e) = (sum_{i=1..N} e_i^2) / N
Full syntax
• >> C = mse(e)
Outputs
  C : Estimated cost of the residuals
Calls
  mse (or mae, medae, linf)
Inputs
  e : Nxd matrix with residuals
• >> [C, which] = trimmedmse(e, beta, norm)
Outputs
  C : Estimated cost of the residuals
  which : Nxd matrix with the indexes of the used residuals
Inputs
  e : Nxd matrix with residuals
  beta : Trimming factor (by default 0.15)
  norm : Function implementing the norm (by default the squared norm)
• >> [rate, n, which] = misclass(Y, Yh)
Outputs
  rate : Rate of misclassification, between 0 (none misclassified) and 1 (all misclassified)
  n : Number of misclassified data points
  which : Indexes of the misclassified points
Inputs
  Y : Nxd matrix with true class labels
  Yh : Nxd matrix with estimated class labels
See also: crossvalidate, leaveoneout, rcrossvalidate

A.3.24 lssvm
Purpose: Construc
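For illustration, the measures can be evaluated directly on a residual vector. This is a sketch; the numbers are invented:
>> e = [0.5; -1.2; 0.3; 2.0];
>> C1 = mae(e);                                    % (0.5+1.2+0.3+2.0)/4 = 1.0
>> C2 = mse(e);                                    % (0.25+1.44+0.09+4.00)/4 = 1.445
>> [rate, n] = misclass([1;1;-1;-1], [1;-1;-1;1]); % rate = 0.5, n = 2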
84. ning of the parameters is conducted in two steps. First, a state-of-the-art global optimization technique, Coupled Simulated Annealing (CSA) [52], determines suitable parameters according to some criterion. Second, these parameters are given to a second optimization procedure (simplex or gridsearch) to perform a fine-tuning step. CSA has already proven to be more effective than multi-start gradient descent optimization [35]. Another advantage of CSA is that it uses the acceptance temperature to control the variance of the acceptance probabilities with a control scheme. This leads to improved optimization efficiency, because it reduces the sensitivity of the algorithm to the initialization parameters, while guiding the optimization process to quasi-optimal runs. By default, CSA uses five multiple starters.

2.1.3 Bayesian framework
Function calls: bay_lssvm, bay_optimize, bay_lssvmARD, bay_errorbar, bay_modoutClass, kpca, eign; demos: Subsection 3.2.5.
Functions for calculating the posterior probability of the model and hyperparameters at different levels of inference are available (bay_lssvm) [41]. Error bars are obtained by taking into account model and hyperparameter uncertainties (bay_errorbar). For classification [44], one can estimate the posterior class probabilities; this is also called the moderated output (bay_modoutClass). The Bayesian framework makes use of the eigenvalue decomposition of the kernel matrix; the size of
85. niques; (b) application of an iteratively reweighted LS-SVM training together with a robust cross-validation score function, which enhances the test set performance.

In a second, more extreme, example we have taken the contamination distribution to be a cubic standard Cauchy distribution and epsilon = 0.3:
>> X = (-5:0.07:5)';
>> epsilon = 0.3;
>> sel = rand(length(X),1) > epsilon;
>> Y = sinc(X) + sel.*normrnd(0,0.1,length(X),1) + (1-sel).*trnd(1,length(X),1).^3;
As before, we use the robust version of cross-validation. The weight function in the cost function is chosen to be the myriad weights. All weight functions W: R -> [0,1], with W(r) = psi(r)/r satisfying W(0) = 1, are shown in Table 3.1, with corresponding loss function L(r) and score function psi(r) = dL(r)/dr. This type of weighting function is especially designed to handle extreme outliers. The results are shown in Figure 3.12. Three of the four weight functions contain parameters which have to be tuned (see Table 3.1). The software automatically tunes the parameters of the Huber and myriad weight functions according to the best performance for these two weight functions. The two parameters of the Hampel weight function are set to b1 = 2.5 and b2 = 3.
>> model = initlssvm(X, Y, 'f', [], [], 'RBF_kernel');
>> L_fold = 10; % 10-fold CV
>> model = tunelssvm(model, 'simplex', 'rcrossvalidatelssvm'
86. o make a window. Each window is put in a row of matrix W; the matrix W contains as many rows as there are different windows selected in A. Schematically, this becomes:
>> A = [a1 a2 a3; b1 b2 b3; c1 c2 c3; d1 d2 d3; e1 e2 e3; f1 f2 f3; g1 g2 g3];
>> W = windowize(A, [1 2 3])
W =
  a1 a2 a3 b1 b2 b3 c1 c2 c3
  b1 b2 b3 c1 c2 c3 d1 d2 d3
  c1 c2 c3 d1 d2 d3 e1 e2 e3
  d1 d2 d3 e1 e2 e3 f1 f2 f3
  e1 e2 e3 f1 f2 f3 g1 g2 g3
The function windowizeNARX converts the time series and its exogenous variables into a block Hankel format, useful for training a nonlinear function approximation as a nonlinear ARX model.
Full syntax
• >> Xw = windowize(X, window)
The length of the window is denoted by w.
Outputs
  Xw : (N-w+1) x w matrix of the sequences of windows over X
Inputs
  X : Nx1 vector with data points
  window : wx1 vector with the relative indices of one window
• >> [Xw, Yw, xdim, ydim, n] = windowizeNARX(X, Y, xdelays, ydelays)
>> [Xw, Yw, xdim, ydim, n] = windowizeNARX(X, Y, xdelays, ydelays, steps)
Outputs
  Xw : Matrix of the data used for input, including the delays
  Yw : Matrix of the data used for output, including the next steps
  xdim : Number of dimensions in the new input
  ydim : Number of dimensions in the new output
  n : Number of new data points
Inputs
  X : Nxm vector with input data points
  Y : Nxd vector with output data points
  xdelays : Number of delays of X
  ydelays : Number of delays of Y
  steps : Number of steps ahead
See also:
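For illustration, applying windowize to a short numeric series gives the following result; this is a sketch, and the series is invented:
>> X = (1:7)';
>> Xw = windowize(X, 1:3)
Xw =
     1     2     3
     2     3     4
     3     4     5
     4     5     6
     5     6     7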
87. ol of training data. After estimating eigenfunctions in relation to a Nyström approximation, with selection of the support vectors according to an entropy criterion, the LS-SVM model is estimated in the primal space. The second idea is to do active support vector selection, here based on entropy criteria. The first step is implemented as follows:
>> % X,Y contains the dataset; svX is a subset of X
>> sig2 = 1;
>> features = AFEm(svX, 'RBF_kernel', sig2, X);
>> [Cl3, gam_optimal] = bay_rr(features, Y, 1, 3);
>> [W, b] = ridgeregress(features, Y, gam_optimal);
>> Yh = features*W + b;
Optimal values for the kernel parameters and the capacity of the fixed size LS-SVM can be obtained using a simple Monte Carlo experiment. For different kernel parameters and capacities (number of chosen support vectors), the performance on random subsets of support vectors is evaluated. The means of the performances are minimized by an exhaustive search (Figure 3.15b):
>> caps = [10 20 50 100 200];
>> sig2s = [0.1 0.2 0.5 1 2 4 10];
>> nb = 10;
>> for i = 1:length(caps)
     for j = 1:length(sig2s)
       for t = 1:nb
         sel = randperm(size(X,1));
         svX = X(sel(1:caps(i)), :);
         features = AFEm(svX, 'RBF_kernel', sig2s(j), X);
         [Cl3, gam_opt] = bay_rr(features, Y, 1, 3);
         [W, b] = ridgeregress(features, Y, gam_opt);
         Yh = features*W + b;
         performances(t) = mse(Y - Yh);
       end
       minimal_performanc
88. on:
>> Yp = lssvm(X, Y, type);
The lssvm command automatically tunes the tuning parameters via 10-fold cross-validation (CV) or leave-one-out CV, depending on the sample size. This function automatically plots the solution when possible. By default, the Gaussian RBF kernel is taken. Further information can be found in the references.

3.2.5 Bayesian inference for classification
This subsection builds further on the results of Subsection 3.2.2. A Bayesian framework is used to optimize the tuning parameters and to obtain the moderated output. The optimal regularization parameter gam and kernel parameter sig2 can be found by optimizing the cost on the second and the third level of inference, respectively. It is recommended to initialize the model with appropriate starting values:
>> [gam, sig2] = bay_initlssvm({X,Y,type,[],[],'RBF_kernel'});
Optimization on the second level leads to an optimal regularization parameter:
>> [model, gam_opt] = bay_optimize({X,Y,type,gam,sig2,'RBF_kernel'}, 2);
Optimization on the third level leads to an optimal kernel parameter:
>> [cost_L3, sig2_opt] = bay_optimize({X,Y,type,gam_opt,sig2,'RBF_kernel'}, 3);
The posterior class probabilities are found by incorporating the uncertainty of the model parameters:
>> gam = 10;
>> sig2 = 1;
>> Ymodout = bay_modoutClass({X,Y,type,10,1,'RBF_kernel'}, 'figure');
One can speci
89. opean Symposium on Artificial Neural Networks (ESANN 2001), Bruges, Belgium, 13-18.
Van Gestel, T., Suykens, J.A.K., Baesens, B., Viaene, S., Vanthienen, J., Dedene, G., De Moor, B., Vandewalle, J. (2001). Benchmarking least squares support vector machine classifiers. Machine Learning, 54(1), 5-32.
Van Gestel, T., Suykens, J.A.K., Lanckriet, G., Lambrechts, A., De Moor, B., Vandewalle, J. (2002). Bayesian framework for least squares support vector machine classifiers, Gaussian processes and kernel Fisher discriminant analysis. Neural Computation, 15(5), 1115-1148.
Van Gestel, T., Suykens, J.A.K., Lanckriet, G., Lambrechts, A., De Moor, B., Vandewalle, J. (2002). Multiclass LS-SVMs: moderated outputs and coding-decoding schemes. Neural Processing Letters, 15(1), 45-58.
Van Gestel, T., Suykens, J.A.K., De Moor, B., Vandewalle, J. (2002). Bayesian inference for LS-SVMs on large data sets using the Nyström method. International Joint Conference on Neural Networks (WCCI-IJCNN 2002), Honolulu, USA, May 2002, 2779-2784.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Vapnik, V. (1998). Statistical Learning Theory. John Wiley, New York.
Williams, C.K.I., Seeger, M. (2001). Using the Nyström method to speed up kernel machines. Advances in Neural Information Processing Systems 13, 682-688, MIT Press.
Wahba, G., Wold, S. (1975). A completely automatic French curve: fitting spline functions by cross validat
• Second level: The cost associated with the posterior of the regularization parameter is computed. The etype can be 'svd', 'eig', 'eigs' or Nyström's 'eign'.

• Third level: The cost associated with the posterior of the chosen kernel and kernel parameters is computed. The etype can be 'svd', 'eig', 'eigs' or Nyström's 'eign'.

Full syntax

• Outputs on the first level:

>> [costL1, Ed, Ew, bay] = bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 1)
>> [costL1, Ed, Ew, bay] = bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 1, etype)
>> [costL1, Ed, Ew, bay] = bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 1, etype, nb)
>> [costL1, Ed, Ew, bay] = bay_lssvm(model, 1)
>> [costL1, Ed, Ew, bay] = bay_lssvm(model, 1, etype)
>> [costL1, Ed, Ew, bay] = bay_lssvm(model, 1, etype, nb)

With
costL1: Cost proportional to the posterior
Ed: Cost of the training error term
Ew: Cost of the regularization parameter
bay: Object oriented representation of the results of the Bayesian inference

• Outputs on the second level:

>> [costL2, DcostL2, optimal_cost, bay] = bay_lssvm({X,Y,type,gam,sig2,kernel,preprocess}, 2, etype, nb)
>> [costL2, DcostL2, optimal_cost, bay] = bay_lssvm(model, 2, etype, nb)

With
costL2: Cost proportional to the posterior on the second level
DcostL2: Derivative of the cost
optimal_cost: Optimality of the regularization parameter (optimal = 0)
bay: Object oriented representation of the results of the Bayesian inference
The kernel_type indicates the function that is called to compute the kernel value (by default RBF_kernel). Other kernels can be used, for example:

>> [alpha, b] = trainlssvm({X,Y,type,gam,[d p],'poly_kernel'})
>> [alpha, b] = trainlssvm({X,Y,type,gam,[],'lin_kernel'})

The kernel parameter(s) are passed as a column vector; in the case no kernel parameter is needed, pass the empty vector. The training can either be preceded by the preprocessing function ('preprocess', by default) or not ('original'). The training calls the preprocessing (prelssvm, postlssvm) and the encoder (codelssvm) if appropriate.

In the remainder of the text, the content of the cell determining the LS-SVM is given by {X,Y,type,gam,sig2}. However, the additional arguments in this cell can always be added in the calls.

If one uses the object oriented interface (see also A.3.16), the training is done by

>> model = trainlssvm(model)
>> model = trainlssvm(model, X, Y)

The status of the model checks whether a retraining is needed. The extra arguments X, Y allow one to re-initialize the model with this new training data, as long as its dimensions are the same as the old initiation.

One implementation is included:

• The Matlab implementation: a straightforward implementation based on the matrix division \ (lssvmMATLAB.m).

This implementation allows one to train a multidimensional output problem. If each output uses the same kernel and tuning parameters, the outputs can be trained jointly; otherwise the outputs are treated separately (see the multiple output example in Chapter 3).
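A minimal sketch of this object oriented route from start to finish (initlssvm and simlssvm are documented elsewhere in this appendix; the type 'f' and the tuning parameter values are illustrative):

>> model = initlssvm(X, Y, 'f', gam, sig2, 'RBF_kernel');
>> model = trainlssvm(model);          % solves the linear system via lssvmMATLAB.m
>> Yt = simlssvm(model, Xt);           % evaluate the trained model on test data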
(Figure 3.15: (a) the regularization parameter for the regression in the primal space is optimized here using the Bayesian framework; (b) estimated cost surface of the fixed size LS-SVM based on random subsamples of the data, of different subset capacities and kernel parameters.)

The same idea can be used for learning a classifier from a huge data set.

>> % load the input and output of the training data in X and Y
>> cap = 25;

The first step is the same: the selection of the support vectors by optimizing the entropy criterion. Here only the pseudo code is shown; for the working code one can study the code of demo_fixedclass.m (a runnable sketch follows below):

>> % initialise a subset of cap points: Xs
>> for i = 1:1000,
     Xs_old = Xs;
     % substitute a point of Xs by a new one
     crit = kentropy(Xs, kernel, kernel_par);
     % if crit is not larger than in the previous loop,
     % substitute Xs by the old Xs_old
   end

By taking the values -1 and +1 as targets in a linear regression, the Fisher discriminant is obtained:

>> features = AFEm(Xs, kernel, sigma2, X);
>> [w, b] = ridgeregress(features, Y, gamma);

New data points can be simulated as follows:

>> features_t = AFEm(Xs, kernel, sigma2, Xt);
>> Yht = sign(features_t*w + b);

An example of a resulting classifier and the selected support vectors is displayed in Figure 3.16 (see demo_fixedclass).

(Figure 3.16: "Approximation by fixed size LS-SVM based on maximal entropy: 2.3195", with the negative and positive points marked.)
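As announced above, a runnable version of the entropy based selection might look as follows. This is a sketch, not the toolbox code: the random swap strategy, the RBF kernel and sig2 are assumptions, and demo_fixedclass.m remains the authoritative version.

>> sel = randperm(size(X,1)); Xs = X(sel(1:cap),:);    % initialise a subset of cap points
>> crit_old = kentropy(Xs,'RBF_kernel',sig2);
>> for i = 1:1000,
     Xs_old = Xs;
     Xs(ceil(rand*cap),:) = X(ceil(rand*size(X,1)),:); % swap in a random candidate point
     crit = kentropy(Xs,'RBF_kernel',sig2);
     if crit <= crit_old, Xs = Xs_old;                 % no improvement: undo the swap
     else crit_old = crit; end
   end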
model: Object oriented representation of the LS-SVM model
grain: The grain of the grid evaluated to compose the surface (by default 50)
seldims: The principal inputs one wants to span a grid (by default [1 2])

See also: trainlssvm, simlssvm

A.3.26 predict

Purpose

Iterative prediction of a trained LS-SVM NARX model (in recurrent mode).

Basic syntax

>> Yp = predict({Xw,Yw,type,gam,sig2}, Xt, nb)
>> Yp = predict(model, Xt, nb)

Description

The model needs to be trained using Xw, Yw, which is the result of windowize or windowizeNARX. The number of time lags for the model is determined by the dimension of the input or, if not appropriate, by the number of given starting values. By default, the model is evaluated on the past points using simlssvm. However, if one wants to use this procedure for other models, this default can be overwritten by your favorite training function. This function (denoted by simfct) has to follow the following syntax:

>> simfct(model, inputs, arguments)

thus:

>> Yp = predict(model, Xt, nb, simfct)
>> Yp = predict(model, Xt, nb, simfct, arguments)

Full syntax

• Using the functional interface for the LS-SVMs:

>> Yp = predict({Xw,Yw,type,gam,sig2,kernel,preprocess}, Xt)
>> Yp = predict({Xw,Yw,type,gam,sig2,kernel,preprocess}, Xt, nb)

Outputs
Yp: nb x 1 matrix with the predictions

Inputs
Xw: N x d matrix with the inputs of the training data
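A short usage sketch (hedged: the lag of 5, the horizon nb = 20 and pre-tuned gam, sig2 are illustrative assumptions; X is assumed to hold the time series as a column vector):

>> lag = 5; nb = 20;
>> Xu = windowize(X, 1:lag+1);          % window the series into lagged inputs/outputs
>> Xw = Xu(:,1:lag); Yw = Xu(:,end);
>> [alpha, b] = trainlssvm({Xw,Yw,'f',gam,sig2});
>> Yp = predict({Xw,Yw,'f',gam,sig2}, X(end-lag+1:end), nb);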
Nyström's method ('eign') using nb components. In some cases one wants to disable ('original') the rescaling of the principal components in feature space to unit length. The scores of a test set Xt on the principal components are computed by the call:

>> [eigval, eigvec, scores] = kpca(X, kernel_fct, sig2, Xt)

Full syntax

>> [eigval, eigvec, empty, omega] = kpca(X, kernel_fct, sig2)
>> [eigval, eigvec, empty, omega] = kpca(X, kernel_fct, sig2, etype)
>> [eigval, eigvec, empty, omega] = kpca(X, kernel_fct, sig2, etype, nb)
>> [eigval, eigvec, empty, omega] = kpca(X, kernel_fct, sig2, etype, nb, rescaling)
>> [eigval, eigvec, scores, omega] = kpca(X, kernel_fct, sig2, Xt)
>> [eigval, eigvec, scores, omega] = kpca(X, kernel_fct, sig2, Xt, etype)
>> [eigval, eigvec, scores, omega] = kpca(X, kernel_fct, sig2, Xt, etype, nb)
>> [eigval, eigvec, scores, omega] = kpca(X, kernel_fct, sig2, Xt, etype, nb, rescaling)
>> [eigval, eigvec, scores, omega, recErrors] = kpca(X, kernel_fct, sig2, Xt, etype)
>> [eigval, eigvec, scores, omega, recErrors] = kpca(X, kernel_fct, sig2, Xt, etype, nb)
>> [eigval, eigvec, scores, omega, recErrors] = kpca(X, kernel_fct, sig2, Xt, etype, nb, rescaling)
>> [eigval, eigvec, scores, omega, recErrors, optOut] = kpca(X, kernel_fct, sig2, Xt, etype)
>> [eigval, eigvec, scores, omega, recErrors, optOut] = kpca(X, kernel_fct, sig2, Xt, etype, nb)
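A short usage sketch (the RBF kernel, sig2 = 0.5 and nb = 6 components are illustrative choices):

>> sig2 = 0.5; nb = 6;
>> [eigval, eigvec] = kpca(X, 'RBF_kernel', sig2, 'eig', nb);           % principal components of X
>> [eigval, eigvec, scores] = kpca(X, 'RBF_kernel', sig2, Xt, 'eig', nb); % scores of a test set Xt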
The method of fixed size LS-SVM is suitable for handling very large data sets. An alternative criterion for subset selection was presented by [3, 4], which is closely related to [30]. It measures the quality of approximation of the feature space and the space induced by the subset (see Automatic Feature Extraction, AFEm). In that work, the subset was taken as a random subsample from the data (subsample).

Chapter 3

LS-SVMlab toolbox examples

3.1 Roadmap to LS-SVM

In this Section we briefly sketch how to obtain an LS-SVM model (valid for classification and regression), see Figure 3.1.

1. Choose between the functional or object oriented interface (initlssvm), see A.3.16;
2. Search for suitable tuning parameters (tunelssvm), see A.3.36;
3. Train the model given the previously determined tuning parameters (trainlssvm), see A.3.35;
4a. Simulate the model on e.g. test data (simlssvm), see A.3.34;
4b. Visualize the results when possible (plotlssvm), see A.3.25.

(Figure 3.1: List of commands for obtaining an LS-SVM model; the functional interface takes the data (X,Y) through tunelssvm, trainlssvm and simlssvm/plotlssvm, and the object oriented interface follows the same path.)

3.2 Classification

At first, the possibilities of the toolbox for classification tasks are illustrated.
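To make the roadmap concrete before the detailed examples, a minimal classification sketch (all values and the score function are illustrative; see the referenced sections for the exact options):

>> [gam, sig2] = tunelssvm({X,Y,'c',[],[],'RBF_kernel'},'simplex','crossvalidatelssvm',{10,'misclass'});
>> [alpha, b] = trainlssvm({X,Y,'c',gam,sig2,'RBF_kernel'});
>> Yt = simlssvm({X,Y,'c',gam,sig2,'RBF_kernel'},{alpha,b},Xt);
>> plotlssvm({X,Y,'c',gam,sig2,'RBF_kernel'},{alpha,b});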
nb: Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

• Inputs using the object oriented interface:

>> model = bay_optimize(model, level)
>> model = bay_optimize(model, level, etype)
>> model = bay_optimize(model, level, etype, nb)

With
model: Object oriented representation of the LS-SVM model
level: 1, 2, 3
etype: 'eig', 'svd', 'eigs', 'eign'
nb: Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

See also: bay_lssvm, bay_lssvmARD, bay_modoutClass, bay_errorbar

A.3.8 bay_rr

Purpose

Bayesian inference of the cost on the three levels of linear ridge regression.

Basic syntax

>> cost = bay_rr(X, Y, gam, level)

Description

This function implements the cost functions related to the Bayesian framework of linear ridge regression [44]. Optimizing these criteria results in optimal model parameters W, b and tuning parameters. The criterion can also be used for model comparison. The obtained model parameters w and b are optimal on the first level for J = 0.5*w'*w + gam*0.5*sum((Y - X*w - b).^2).

Full syntax

• Outputs on the first level: Cost proportional to the posterior of the model parameters.

>> [costL1, Ed, Ew] = bay_rr(X, Y, gam, 1)

With
costL1: Cost proportional to the posterior
Ed: Cost of the training error term
Ew: Cost of the regularization parameter
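As a short sketch of how bay_rr is typically combined with ridgeregress (mirroring the fixed size LS-SVM examples of Chapter 3; the starting value gam = 1 and the feature matrix produced by AFEm are assumptions of the example):

>> [costL3, gam_opt] = bay_rr(features, Y, 1, 3);  % third level: infer the regularization parameter
>> [W, b] = ridgeregress(features, Y, gam_opt);    % ridge regression in the primal space
>> Yh = features*W + b;                            % predictions on the training features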
"...search by coupled local minimizers", International Journal of Bifurcation and Chaos, 11(8), 2133-2144.

Sun J., Loader C.R. (1994), "Simultaneous confidence bands for linear regression and smoothing", Annals of Statistics, 22(3), 1328-1345.

Suykens J.A.K., Van Gestel T., Vandewalle J., De Moor B. (2002), "A support vector machine formulation to PCA analysis and its kernel version", IEEE Transactions on Neural Networks, 14(2), 447-450.

Suykens J.A.K., Van Gestel T., De Brabanter J., De Moor B., Vandewalle J. (2002), Least Squares Support Vector Machines, World Scientific, Singapore.

Suykens J.A.K. (2008), "Data visualization and dimensionality reduction using kernel maps with a reference point", IEEE Transactions on Neural Networks, 19(9), 1501-1517.

Van Belle V., Pelckmans K., Suykens J.A.K., Van Huffel S. (2010), "Additive survival least squares support vector machines", Statistics in Medicine, 29(2), 296-308.

Van Gestel T., Suykens J.A.K., Baestaens D., Lambrechts A., Lanckriet G., Vandaele B., De Moor B., Vandewalle J. (2001), "Financial time series prediction using least squares support vector machines within the evidence framework", IEEE Transactions on Neural Networks (special issue on Neural Networks in Financial Engineering), 12(4), 809-821.

Van Gestel T., Suykens J.A.K., De Moor B., Vandewalle J. (2001), "Automatic relevance determination for least squares support vector machine classifiers", Proc. of the European Symposium on Artificial Neural Networks (ESANN 2001), Bruges, Belgium, 13-18.
kernel based PCA (kpca), as described by [80], for which a primal-dual interpretation with least squares support vector machine formulation has been given in [87]; this has also been further extended to kernel canonical correlation analysis [38] and kernel PLS.

2.4 Solving large scale problems with fixed size LS-SVM

Function calls: demofixedsize, AFEm, kentropy
Demos: Subsection 3.3.9, demo_fixedsize, demo_fixedclass

Classical kernel based algorithms like e.g. LS-SVM typically have memory and computational requirements of O(N^2). Work on large scale methods proposes solutions to circumvent this bottleneck [38, 80]. For large datasets it would be advantageous to solve the least squares problem in the primal weight space, because then the size of the vector of unknowns is proportional to the feature vector dimension and not to the number of datapoints. However, the feature space mapping induced by the kernel is needed in order to obtain non-linearity. For this purpose, a method of fixed size LS-SVM is proposed [38]. Firstly, the Nyström method can be used to estimate the feature space mapping. The link between Nyström approximation, kernel PCA and density estimation has been discussed in [12]. In fixed size LS-SVM these links are employed, together with the explicit primal-dual LS-SVM interpretations. The support vectors are selected according to a quadratic Renyi entropy criterion (kentropy). In a last step, a regression is done in the primal space, which makes the method suitable for solving large scale nonlinear function estimation and classification problems.
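A minimal sketch of the two building blocks named here (the RBF kernel, the bandwidth value and a pre-selected subset Xs of the data are illustrative assumptions):

>> sig2 = 0.3;
>> crit = kentropy(Xs, 'RBF_kernel', sig2);     % quadratic Renyi entropy of the subset Xs
>> features = AFEm(Xs, 'RBF_kernel', sig2, X);  % Nystrom-based feature extraction for all data X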
Functions for single and multiple output regression and classification are available. Training and simulation can be done for each output separately by passing different kernel functions, kernel and/or regularization parameters as a column vector. It is straightforward to implement other kernel functions in the toolbox.

The performance of a model depends on the scaling of the input and output data. An appropriate algorithm detects and appropriately rescales continuous, categorical and binary variables (prelssvm, postlssvm).

An important tool accompanying the LS-SVM for function estimation is the construction of interval estimates such as confidence intervals. In the area of kernel based regression, a popular tool to construct interval estimates is the bootstrap. The functions cilssvm and predlssvm result in confidence and prediction intervals, respectively, for LS-SVM [9]. This method is not based on bootstrap and thus obtains interval estimates in a fast way.

1 See http://www.kernel-machines.org/software.html for other software in kernel based learning techniques.

2.1.1 Classification extensions

Function calls: codelssvm, code, deltablssvm, roc, latentlssvm
Demos: Subsection 3.2, democlass

A number of additional function files are available for the classification task. The latent variable of simulating a model for classification (latentlssvm) is the continuous result obtained by simulation.
cross-validation. In that case, other possibilities for the weights are 'whampel', 'wlogistic' and 'wmyriad'.

In case of function approximation for a linear kernel:

>> gam = tunelssvm({X,Y,'f',[],[],'lin_kernel'},'simplex','leaveoneoutlssvm',{'mse'});
>> gam = tunelssvm({X,Y,'f',[],[],'lin_kernel'},'linesearch','leaveoneoutlssvm',{'mse'});

In the case of the RBF kernel:

>> [gam, sig2] = tunelssvm({X,Y,'f',[],[],'RBF_kernel'},'simplex','leaveoneoutlssvm',{'mse'});
>> [gam, sig2] = tunelssvm({X,Y,'f',[],[],'RBF_kernel'},'gridsearch','leaveoneoutlssvm',{'mse'});

In case of the polynomial kernel the degree is automatically tuned; here robust 10-fold cross-validation is combined with logistic weights:

>> [gam, sig2] = tunelssvm({X,Y,'f',[],[],'poly_kernel'},'simplex','rcrossvalidatelssvm',{10,'mae'},'wlogistic');

In the case of classification, notice the use of the function misclass:

>> gam = tunelssvm({X,Y,'c',[],[],'lin_kernel'},'simplex','leaveoneoutlssvm',{'misclass'});
>> gam = tunelssvm({X,Y,'c',[],[],'lin_kernel'},'linesearch','leaveoneoutlssvm',{'misclass'});

In the case of the RBF kernel, the 10-fold cross-validation cost function is the number of misclassifications (misclass).
Full syntax

>> [Xopt, Yopt, iterations, fig] = linesearch(fun, startvalues, funargs, option1, value1, ...)

Outputs
Xopt: Optimal parameter set
Yopt: Criterion evaluated at Xopt
iterations: Used number of iterations
fig: Handle to the figure of the optimization

Inputs
fun: Function implementing the cost criterion
startvalues: 2*d matrix with the limit values of the widest grid
funargs: Cell with optional extra function arguments of fun
option: The name of the option one wants to change
value: The new value of the option one wants to change

The different options:
Nofigure: 'figure' or 'nofigure'
MaxFunEvals: Maximum number of function evaluations (by default 20)
GridReduction: Grid reduction parameter (e.g. '1.5': small reduction; '10': heavy reduction; by default '2')
TolFun: Minimal toleration of improvement on function value (by default 0.01)
TolX: Minimal toleration of improvement on X value (by default 0.01)
Grain: Number of evaluations per iteration (by default 10)

• SIMPLEX: multidimensional unconstrained non-linear optimization. simplex finds a local minimum of a function via a function handle fun, starting from an initial point X. The local minimum is located via the Nelder-Mead simplex algorithm [23], which does not require any gradient information. opt contains the user specified options via a structure. The different options are set via a structure with members denoted by opt.
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, Xt)
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, Xt, etype)
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, Xt, etype, nb)
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, 'figure')
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, 'figure', etype)
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,kernel,preprocess}, 'figure', etype, nb)

Outputs
sig_e: Nt x 1 vector with the error bars of the test data

Inputs
X: N x d matrix with the inputs of the training data
Y: N x 1 vector with the outputs of the training data
type: 'function estimation' ('f')
gam: Regularization parameter
sig2: Kernel parameter
kernel: Kernel type (by default 'RBF_kernel')
preprocess: 'preprocess' or 'original'
Xt: Nt x d matrix with the inputs of the test data
etype: 'svd', 'eig', 'eigs' or 'eign'
nb: Number of eigenvalues/eigenvectors used in the eigenvalue decomposition approximation

• Using the object oriented interface:

>> [sig_e, bay, model] = bay_errorbar(model, Xt)
>> [sig_e, bay, model] = bay_errorbar(model, Xt, etype)
>> [sig_e, bay, model] = bay_errorbar(model, Xt, etype, nb)
>> [sig_e, bay, model] = bay_errorbar(model, 'figure')
>> [sig_e, bay, model] = bay_errorbar(model, 'figure', etype)
>> [sig_e, bay, model] = bay_errorbar(model, 'figure', etype, nb)
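A short usage sketch (the parameter values are assumed tuned beforehand; the 'figure' variant plots the error bars directly):

>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,'RBF_kernel'}, Xt);      % error bars at Xt
>> sig_e = bay_errorbar({X,Y,'function',gam,sig2,'RBF_kernel'}, 'figure'); % plot the error bars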
Construct an LS-SVM model with one command line and visualize results if possible.

Basic syntax

>> yp = lssvm(X, Y, type)
>> yp = lssvm(X, Y, type, kernel)

Description

type can be 'classifier' or 'function estimation' (these strings can be abbreviated into 'c' or 'f', respectively). X and Y are matrices holding the training input and training output. The i-th data point is represented by the i-th row, X(i,:) and Y(i,:). The tuning parameters are automatically tuned via leave-one-out cross-validation or 10-fold cross-validation, depending on the size of the data set. Leave-one-out cross-validation is used when the size is less than or equal to 300 points. The loss functions for cross-validation are 'mse' for regression and 'misclass' for classification. If possible, the results will be visualized using plotlssvm. By default, the Gaussian RBF kernel is used. Other kernels can be used, for example:

>> Yp = lssvm(X, Y, type, 'lin_kernel')
>> Yp = lssvm(X, Y, type, 'poly_kernel')

When using the polynomial kernel there is no need to specify the degree of the polynomial: the software will automatically tune it to obtain best performance on the cross-validation or leave-one-out score functions.

>> Yp = lssvm(X, Y, type, 'RBF_kernel')
>> Yp = lssvm(X, Y, type, 'lin_kernel')
>> Yp = lssvm(X, Y, type, 'poly_kernel')

Full syntax

>> [Yp, alpha, b, gam, sig2, model] = lssvm(X, Y, type)
LS-SVMlab's interface for Matlab consists of a basic version for beginners as well as a more advanced version with programs for multi-class encoding techniques and a Bayesian framework. Future versions will gradually incorporate new results and additional functionalities. A number of functions are restricted to LS-SVMs (these include the extension lssvm in the function name); the others are generally usable. A number of demos illustrate how to use the different features of the toolbox. The Matlab function interfaces are organized in two principal ways: the functions can be called either in a functional way or using an object oriented structure (referred to as the model), as e.g. in Netlab [22], depending on the user's choice.

2.1 Classification and regression

Function calls: trainlssvm, simlssvm, plotlssvm, prelssvm, postlssvm, cilssvm, predlssvm
Demos: Subsections 3.2, 3.3, demofun, democlass, democonfint

The Matlab toolbox is built around a fast LS-SVM training and simulation algorithm. The corresponding function calls can be used for classification as well as for function estimation. The function plotlssvm displays the simulation results of the model in the region of the training points.

The linear system is solved via the flexible and straightforward code implemented in Matlab (lssvmMATLAB.m), which is based on the Matlab matrix division (backslash command, \).
105. tency anymore in this way The different options are given in following table e General options representing the kind of model type classifier function estimation status Status of this model trained or changed alpha Support values of the trained LS SVM model b Bias term of the trained LS SVM model duration Number of seconds the training lasts latent Returning latent variables no yes x_delays Number of delays of eXogeneous variables by default O y_delays Number of delays of responses by default 0 steps Number of steps to predict by default 1 gam Regularisation parameter kernel_type Kernel function kernel_pars Extra parameters of the kernel function weights Weighting function for robust regression e Fields used to specify the used training data x_dim Dimension of input space y_dim Dimension of responses nb_data Number of training data xtrain preprocessed inputs of training data ytrain preprocessed coded outputs of training data selector Indexes of training data effectively used during training costCV Cost of the cross validation score function when model is tuned A 3 ALPHABETICAL LIST OF FUNCTION CALLS 79 e Fields with the information for pre and post processing only given if appropriate preprocess preprocess or original schemed Status of the preprocessing coded original or schemed pre_xscheme Scheme used for preprocess
model: Object oriented representation of the LS-SVM model
b_new: m x 1 vector with new bias term(s) for the model

See also: roc, trainlssvm, simlssvm, changelssvm

A.3.13 denoise_kpca

Purpose

Reconstruct the data mapped on the most important principal components.

Basic syntax

>> Xd = denoise_kpca(X, kernel, kernel_par);

Description

Denoising can be done by moving the point in input space so that its corresponding map to the feature space is optimized. This means that the data point in feature space is as close as possible to its corresponding reconstructed point by using the principal components. If the principal components are to be calculated on the same data X as one wants to denoise, use the command:

>> Xd = denoise_kpca(X, kernel, kernel_par);
>> [Xd, lam, U] = denoise_kpca(X, kernel, kernel_par, etype, nb);

When one wants to denoise data Xt other than the data used to obtain the principal components:

>> Xd = denoise_kpca(X, kernel, kernel_par, Xt);
>> [Xd, lam, U] = denoise_kpca(X, kernel, kernel_par, Xt, etype, nb);

Full syntax

>> [Xd, lam, U] = denoise_kpca(X, kernel, kernel_par, Xt);
>> [Xd, lam, U] = denoise_kpca(X, kernel, kernel_par, Xt, etype);
>> [Xd, lam, U] = denoise_kpca(X, kernel, kernel_par, Xt, etype, nb);

Outputs
Xd: N x d (Nt x d) matrix with the denoised data X (Xt)
lam: nb x 1 vector with eigenvalues of principal components
U: N x nb (Nt x d) matrix with the principal eigenvectors
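A short usage sketch (the RBF kernel, the bandwidth sig2 = 0.4 and nb = 10 components are illustrative choices):

>> sig2 = 0.4;
>> [Xd, lam, U] = denoise_kpca(X, 'RBF_kernel', sig2, Xt, 'eig', 10);  % denoise Xt using components of X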
latentlssvm: Calculate the latent variables of the LS-SVM classifier
plotlssvm: Plot the LS-SVM results in the environment of the training data
simlssvm: Evaluate the LS-SVM at the given points
trainlssvm: Find the support values and the bias term of a Least Squares Support Vector Machine
lssvm: One line LS-SVM
cilssvm: Pointwise or simultaneous confidence intervals
predlssvm: Pointwise or simultaneous prediction intervals

A.2.2 Object oriented interface

This toolbox supports a classical functional interface as well as an object oriented interface. The latter has a few dedicated functions. This interface is recommended for the more experienced user.

Function call: Short explanation (Reference)
changelssvm: Change properties of an LS-SVM object (A.3.16)
demomodel: Demo introducing the use of the compact calls based on the model structure
initlssvm: Initiate the LS-SVM object before training (A.3.16)

A.2.3 Training and simulating functions

Function call: Short explanation (Reference)
lssvmMATLAB.m: MATLAB implementation of training
prelssvm: Internally called preprocessor (A.3.29)
postlssvm: Internally called postprocessor (A.3.29)

A.2.4 Kernel functions

Function call: Short explanation (Reference)
lin_kernel: Linear kernel for MATLAB implementation
poly_kernel: Polynomial kernel for MATLAB implementation (A.3.22)
RBF_kernel: Radial Basis Function kernel for MATLAB implementation
MLP_kernel: Multilayer Perceptron kernel for MATLAB implementation
108. treat the different outputs separately One can also let the toolbox do this by passing the right arguments This case illustrates how to handle multiple outputs gt gt load data in X Xt and Y gt gt where size Y is N x 3 gt gt gt gt gam 1 gt gt sig2 1 gt gt alpha b trainlssvm X Y classification gam sig2 gt gt Yhs simlssvm X Y classification gam sig2 alpha b Xt Using different kernel parameters per output dimension gt gt gam 1 gt gt sigs 1 2 1 5 gt gt alpha b trainlssvm X Y classification gam sigs gt gt Yhs simlssvm X Y classification gam sigs alpha b Xt Tuning can be done per output dimension gt gt tune the different parameters gt gt gam sigs tunelssvm X Y classification RBF_kernel simplex gt crossvalidatelssvm 10 mse 36 CHAPTER 3 LS SVMLAB TOOLBOX EXAMPLES 3 3 8 A time series example Santa Fe laser data prediction Using the static regression technique a nonlinear feedforward prediction model can be built The NARX model takes the past measurements as input to the model gt gt load time series in X and Xt gt gt lag 50 gt gt Xu windowize X 1 lag 1 gt gt Xtra Xu 1 end lag 1 lag training set gt gt Ytra Xu 1 end lag end training set gt gt Xs X end lagti1 end 1 starting point for iterative prediction Cross val
109. unction kernel for MATLAB im plementation MLP_kernel Multilayer Perceptron kernel for MATLAB im plementation 46 APPENDIX A MATLAB FUNCTIONS A 2 5 Tuning sparseness and robustness Function Call Short Explanation Reference crossvalidate Estimate the model performance with L fold crossvalidation gcrossvalidate Estimate the model performance with general ized crossvalidation rcrossvalidate Estimate the model performance with robust A 3 3 L fold crossvalidation gridsearch A two dimensional minimization procedure based on exhaustive search in a limited range leaveoneout Estimate the model performance with leave one out crossvalidation mae medae L cost measures of the residuals Py e gt P N i iw ol El la la e linf misclass L and Lo cost measures of the residuals A 3 23 mse L cost measures of the residuals tunelssvm Tune the tuning parameters of the model with respect to the given performance measure robustlssvm Robust training in the case of non Gaussian noise or outliers A 2 INDEX OF FUNCTION CALLS A 2 6 Classification extensions Function Call Short Explanation Reference code Encode and decode a multi class classification task to multiple binary classifiers code_ECOC Error correcting output coding code_MOC Minimum Output Coding A 3 10 code_OneVsAll One versus All encoding code_OneVsOne One versus One encoding A 3 10 codedist_hamming Hamming distance measure between two en A 3 10 coded cl
unsupervised learning. All functions are tested with Matlab R2008a, R2008b, R2009a, R2009b and R2010a. References to commands in the toolbox are written in typewriter font.

A main reference and overview on least squares support vector machines is:

J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002 (ISBN 981-238-151-1).

The LS-SVMlab homepage is: http://www.esat.kuleuven.be/sista/lssvmlab

The LS-SVMlab toolbox is made available under the GNU general license policy:

Copyright (C) 2010 KULeuven-ESAT-SCD

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the website of LS-SVMlab or the GNU General Public License for a copy of the GNU General Public License specifications.

Chapter 2

A birds eye view on LS-SVMlab

The toolbox is mainly intended for use with the commercial Matlab package. The Matlab toolbox is compiled and tested for different computer architectures, including Linux and Windows. Most functions can handle datasets up to 20,000 data points or more.
>> [gam, sig2] = tunelssvm({X,Y,'c',[],[],'RBF_kernel'},'simplex','crossvalidatelssvm',{10,'misclass'});
>> [gam, sig2] = tunelssvm({X,Y,'c',[],[],'RBF_kernel'},'gridsearch','crossvalidatelssvm',{10,'misclass'});

The most simple algorithm to determine the minimum of a cost function with possibly multiple optima is to evaluate a grid over the parameter space and to pick the minimum. This procedure iteratively zooms to the candidate optimum. The StartingValues determine the limits of the grid over parameter space:

>> Xopt = gridsearch(fun, StartingValues);

This optimization function can be customized by passing extra options and the corresponding value. These options cannot be changed in the tunelssvm command; the default values of gridsearch, linesearch or simplex are used when invoking tunelssvm.

>> [Xopt, Yopt, Evaluations, fig] = gridsearch(fun, startvalues, funargs, option1, value1, ...);

The possible options and their default values are:

'nofigure': 'figure'
'maxFunEvals': 190
'TolFun': .0001
'TolX': .0001
'grain': 10
'zoomfactor': 5

An example is given:

>> fun = inline('1-exp(-norm([X(1) X(2)]))','X');
>> gridsearch(fun, [-4 -3; 2 3])

The corresponding grid which is evaluated is shown in Figure A.1.

>> gridsearch(fun, [-3 -3; 3 3], {}, 'nofigure','nofigure', 'MaxFunEvals',1000)
>> plotlssvm({X,Y,type,gam,sig2,'RBF_kernel'},{alpha,b});

The parameters and the variables relevant for the LS-SVM are passed as one cell. This cell allows for consistent default handling of LS-SVM parameters and syntactical grouping of related arguments. This definition should be used consistently throughout the use of that LS-SVM model. The object oriented interface to LS-SVMlab leads to shorter function calls (see demomodel).

By default, the data are preprocessed by application of the function prelssvm to the raw data and the function postlssvm on the predictions of the model. This option can be explicitly switched off in the call:

>> [alpha, b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel','original'});

or can be switched on by default:

>> [alpha, b] = trainlssvm({X,Y,type,gam,sig2,'RBF_kernel','preprocess'});

Remember to consistently use the same option in all successive calls. To evaluate new points for this model, the function simlssvm is used. At first, test data is generated:

>> Xt = rand(10,1).*sign(randn(10,1));

Then the obtained model is simulated on the test data:

>> Yt = simlssvm({X,Y,type,gam,sig2,'RBF_kernel','preprocess'},{alpha,b},Xt)

(ans: a 10 x 1 vector with the simulated outputs at the test points.)

The LS-SVM result can be displayed if the dimension of the input data is one or two:

>> plotlssvm({X,Y,type,gam,sig2,'RBF_kernel','preprocess'},{alpha,b});
113. vmARD model method etype nb Outputs dimensions ordered costs sig2s model Inputs model method etype nb See also rx1 vector of the relevant inputs dx1 vector with inputs in decreasing order of relevance Costs associated with third level of inference in every selection step Optimal kernel parameters in each selection step Object oriented representation of the LS SVM model trained only on the relevant inputs Object oriented representation of the LS SVM model gt discrete or continuous svd eig eigs eign Number of eigenvalues eigenvectors used in the eigenvalue de composition approximation bay_lssvm bay_optimize bay_modoutClass bay_errorbar A 3 ALPHABETICAL LIST OF FUNCTION CALLS 61 A 3 6 bay_modoutClass Purpose Estimate the posterior class probabilities of a binary classifier using Bayesian inference Basic syntax gt gt Ppos Pneg gt gt Ppos Pneg bay_modoutClass X Y classifier gam sig2 Xt bay_modoutClass model Xt Description Calculate the probability that a point will belong to the positive or negative classes taking into account the uncertainty of the parameters Optionally one can express prior knowledge as a probability between 0 and 1 where prior equal to 2 3 means that the prior positive class probability is 2 3 more likely to occur than the negative class For binary classification tasks with a two dim
114. x method First CSA finds good starting values and these are passed to the simplex method in order to fine tune the result gt gt load dataset gt gt type classification gt gt L_fold 10 L fold crossvalidation gt gt gam sig2 tunelssvm X Y type RBF_kernel simplex gt crossvalidatelssvm L_fold misclass gt gt alpha b trainlssvm X Y type gam sig2 RBF_kernel gt gt plotlssvm X Y type gam sig2 RBF_kernel alpha b It is still possible to use a gridsearch in the second run i e as a replacement for the simplex method gt gt gam sig2 tunelssvm X Y type RBF_kernel gridsearch gt crossvalidatelssvm L_fold misclass The Receiver Operating Characteristic ROC curve gives information about the quality of the classifier gt gt alpha b trainlssvm X Y type gam sig2 RBF_kernel 20 CHAPTER 3 LS SVMLAB TOOLBOX EXAMPLES gt gt latent variables are needed to make the ROC curve gt gt Y_latent latentlssvm X Y type gam sig2 RBF_kernel alpha b X gt gt area se thresholds oneMinusSpec Sens roc Y_latent Y gt gt thresholds oneMinusSpec Sens ans 2 1915 1 0000 1 0000 1 1915 0 9920 1 0000 1 1268 0 9840 1 0000 1 0823 0 9760 1 0000 0 2699 0 1840 0 9360 0 2554 0 1760 0 9360 0 2277 0 1760 0 9280 0 1811 0 1680 0 9280 1 1184 0 0 0080 1 1220 0
115. y application of the function prelssvm to the raw data and the function postlssvm on the predictions of the model This option can explicitly be switched off in the call gt gt alpha b trainlssvm X Y type gam sig2 RBF_kernel original or be switched on by default gt gt alpha b trainlssvm X Y type gam sig2 RBF_kernel preprocess Remember to consistently use the same option in all successive calls To evaluate new points for this model the function simlssvm is used gt gt Xt 2 rand 10 2 1 gt gt Ytest simlssvm X Y type gam sig2 RBF_kernel alpha b Xt 3 2 CLASSIFICATION 19 LS SVMING 20 4 with 2 different classes B Classifier class 1 o class 2 Figure 3 2 Figure generated by plotlssvm in the simple classification task The LS SVM result can be displayed if the dimension of the input data is two gt gt plotlssvm X Y type gam sig2 RBF_kernel alpha b All plotting is done with this simple command It looks for the best way of displaying the result Figure B 2 3 2 2 Example The well known Ripley dataset problem consists of two classes where the data for each class have been generated by a mixture of two normal distributions Figure 3 3h First let us build an LS SVM on the dataset and determine suitable tuning parameters These tuning parameters are found by using a combination of Coupled Simulated Annealing CSA and a standard simple