
SCOUT User's Guide


Contents

1. Choices of Contour Ellipses. Pressing the <E> (or <e>) key draws contour ellipses on the various scatter plots available in Scout, including scatter plots of the raw data, of the principal components (PCs), and of the discriminant scores. Pressing the <E> (or <e>) key again erases these contours. The simultaneous contour is obtained using probability statement (7), and the individual contour is obtained using statement (9), both given below in Section 6.0. The five contour options are:
Individual: draws the desired classical contour ellipse, or one of the three robust contour ellipses, given by statement (9) on a scatter plot when the <E> key is pressed.
Simultaneous: draws the desired classical simultaneous contour ellipse, or one of the three robust ones, given by statement (7) when the <E> key is pressed.
Indiv & Simult: draws both the individual and the simultaneous contour ellipses (classical, or one of the three robust versions), given by statements (9) and (7), on a scatter plot when the <E> key is pressed.
Indiv Class: draws the chosen robust (HUBER, PROP, or MVT) individual contour ellipse together with the corresponding classical contour ellipse, given by statement (9), when the <E> key is pressed.
Simult Class: draws the chosen robust (HUBER, PROP, or MVT) and the classical simultaneous contour ellipses, given by statement (7), when the <E> key is pressed.
[Scout User's Guide, page 14-11; Chapter 14: Statistical Procedures]
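Whether a contour is individual or simultaneous, its geometry is the same: it is the set of points whose squared generalized (Mahalanobis) distance from the center equals some threshold. The Python sketch below (standard library only) traces such an ellipse for two variables. It assumes a known center, a positive-definite 2 x 2 covariance matrix, and a squared-distance threshold `c`. It does not reproduce statements (7) or (9), so `c` is simply an input here. The function names `contour_ellipse` and `squared_distance` are hypothetical.

```python
import math


def contour_ellipse(mean, cov, c, n_points=100):
    """Points z with (z - mean)' S^-1 (z - mean) = c for a 2 x 2 covariance S.

    mean: (m1, m2); cov: [[s11, s12], [s12, s22]] (assumed positive definite);
    c: squared-distance threshold (an input here, not derived from Scout's statements).
    """
    s11, s12, s22 = cov[0][0], cov[0][1], cov[1][1]
    # Cholesky factor L = [[a, 0], [b, d]] with L L' = S
    a = math.sqrt(s11)
    b = s12 / a
    d = math.sqrt(s22 - b * b)
    r = math.sqrt(c)
    points = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        u1, u2 = math.cos(t), math.sin(t)
        # z = mean + sqrt(c) * L u maps the unit circle onto the contour
        points.append((mean[0] + r * a * u1, mean[1] + r * (b * u1 + d * u2)))
    return points


def squared_distance(z, mean, cov):
    """Squared Mahalanobis distance of a 2-D point, using the closed-form 2 x 2 inverse."""
    s11, s12, s22 = cov[0][0], cov[0][1], cov[1][1]
    det = s11 * s22 - s12 * s12
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    return (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det


pts = contour_ellipse((1.0, 2.0), [[4.0, 1.0], [1.0, 2.0]], 6.0, n_points=50)
```

Mapping the unit circle through the Cholesky factor of the covariance matrix puts every returned point exactly on the chosen contour, which `squared_distance` can confirm. Drawing two contours (an inner and an outer threshold) on the same plot mirrors the Indiv & Simult option.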
2. Note: Typically, small values of α, such as 0.001 or 0.005, correspond to classical estimates. It is recommended to try a few different values of α on the same data set. Larger values of α (0.15, 0.2, etc.) may be needed to unmask multiple outliers, especially in small data sets of large dimensionality.
[Scout Tutorial, page 11-24; Chapter 11: Tutorial III]
[Figure 11.28: Index plot for STACKLSS.DAT using Prop influence. Plot title: "Index Plot for Brownlee's Stack Loss"; x-axis: Observation Numbers; horizontal lines: Maximum (Largest MD) and Warning (Individual MD).]
11.6 Generalized Distance
Select IRIS.DAT from the Data subdirectory of the Scout directory. This is a fairly well-behaved four-dimensional data set of size 50. Return to Robust Method and choose Robust Analysis, and within the Select Graph Type menu select Q-Q Plot Generalized Dist. Set Statistics Options as shown in Figure 11.26, with the exception of a right-tail cutoff of 0.05, using Huber Influence to detect outliers. Accept the new settings and then generate the graph (Figure 11.29). Now replace Huber Influence with Prop Influence and regenerate the graph (Figure 11.30); note the differences.
[Figure residue: a plot titled "Robust Analysis" with Maximum (Largest MD) and Warning (Individual MD) lines; caption not recovered.]
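A Q-Q plot of generalized distances orders the squared Mahalanobis distances of the observations. The minimal sketch below shows the classical computation in plain Python with a small Gauss-Jordan inverse. It uses the ordinary mean and the (n - 1)-divisor covariance matrix, not the Huber- or Prop-weighted estimates that Scout's robust options apply. All function names are illustrative.

```python
def mean_vector(data):
    """Column means of a list of equal-length rows."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def covariance_matrix(data, mean):
    """Sample covariance matrix with the (n - 1) divisor."""
    n, p = len(data), len(mean)
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def generalized_distances(data):
    """Squared Mahalanobis distance of every row from the classical mean."""
    m = mean_vector(data)
    s_inv = invert(covariance_matrix(data, m))
    p = len(m)
    result = []
    for row in data:
        d = [row[j] - m[j] for j in range(p)]
        result.append(sum(d[i] * s_inv[i][j] * d[j] for i in range(p) for j in range(p)))
    return result


data = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 7], [2, 2]]
print(sorted(generalized_distances(data)))  # ordered distances for the Q-Q plot
```

Sorting the returned values gives the vertical coordinates of the Q-Q plot. A useful sanity check is that, with the (n - 1) divisor, the squared distances always sum to (n - 1)p.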
3. …and minus keys, select all variables but Count. At this point your display should match Figure 12.1.
[Screen residue: the PCA Select Variables screen, with the prompt "Press <+> to include, <-> to exclude, and <ENTER> to exit" and a status line reporting 50 valid observations.]
4. …<e> key.
Choices for the X-Y Coordinate Scale Factor: This option controls the scale factor on both axes. The default value is 10. It is especially useful when drawing contour plots in which parts of the contours are missing: choosing a larger number shrinks the graph so that the entire contours can be seen on the same graph.
14.4 Robust Procedures in Scout
Outliers in Univariate Data Sets
Let $x_1, x_2, \ldots, x_n$ represent a univariate data set of size $n$ obtained from a normal population with mean $\mu$ and standard deviation $\sigma$. The maximum likelihood estimate of the mean is $\bar{x} = \sum_{i=1}^{n} x_i / n$, and the sample standard deviation is $s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$. The Grubbs test statistic, which is equivalent to the Max MD test for univariate data sets, uses these zero-breakdown-point estimates and therefore suffers from masking effects. Dixon (1953) suggested the use of multiple hypothesis testing to identify upper and lower outliers. Several classical procedures for finding multiple univariate outliers (e.g., Rosner's (1975) and Dixon-type test statistics) exist in the literature, as given in Barnett and Lewis (1994). In practice, however, the number of outliers, $k$, is unknown, and it becomes quite tedious to test the multiple hypotheses $H_k$: $k$ outliers are present, for $k = 1, 2, \ldots$. Also, a separate set of critical values is required for each test. Simple robust statistics such as the sample median, $M$, and […] are sometimes used to estimate …
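To illustrate why the Grubbs statistic is vulnerable to masking, the sketch below computes it from the classical estimates alongside a median-based scale. The robust scale estimator named in the source is illegible. The normalized median absolute deviation (MAD / 0.6745) used here is a common choice and is an assumption, not necessarily Scout's estimator. Function names are hypothetical.

```python
import math
import statistics


def grubbs_statistic(x):
    """G = max |x_i - mean| / s, using the classical mean and (n - 1) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return max(abs(v - mean) for v in x) / s


def median_and_mad_scale(x):
    """Sample median and the normalized MAD (MAD / 0.6745), a common robust scale."""
    m = statistics.median(x)
    mad = statistics.median([abs(v - m) for v in x])
    return m, mad / 0.6745


sample = [2.0, 3.0, 4.0, 5.0, 100.0]
print(round(grubbs_statistic(sample), 3))  # 1.788
print(median_and_mad_scale(sample))        # median 4.0 and a scale of about 1.48
```

For this sample the single large value inflates s so much that G (about 1.788) sits right at its ceiling of (n - 1)/sqrt(n), roughly 1.789 for n = 5. The median and MAD-based scale are hardly affected.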
5. [Figure 3.2: Transformation functions displayed in the upper right menu, statistics for all variables in the lower window, and the histogram for sp length in the upper left window.]
3.4.1 Normality Tests
Upon entering the Transform module, you are given a choice between two normality tests: the Kolmogorov-Smirnov test and the Anderson-Darling test. The selected test is used throughout the Transform module.
3.4.0 Statistics Window
A window containing statistical information about each variable appears in the lower portion of the screen. The information displayed includes the number of observations, mean, standard deviation, skewness, and the test statistic and critical value for the selected normality test. If an asterisk appears between the test statistic and the critical value, that variable did not pass the normality test. You may scroll through the information in this window by using any of the following keys: <UP ARROW>, <DOWN ARROW>, <PAGE UP>, <PAGE DOWN>, <HOME>, and <END>. …
[Scout User's Guide, page 3-7; Chapter 3: Managing Data in Scout]
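The quantities shown in the Statistics Window can be sketched as follows. This sketch assumes the moment-based skewness coefficient. It standardizes each variable with its sample mean and standard deviation before computing the Kolmogorov-Smirnov distance and the Anderson-Darling A-squared statistic against the standard normal. Scout's exact definitions and critical values are not given in this section, so the hypothetical `normality_summary` function returns statistics only and makes no pass/fail decision.

```python
import math
from statistics import NormalDist


def normality_summary(x):
    """n, mean, sd, skewness, and KS / Anderson-Darling statistics for one variable."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    skewness = m3 / m2 ** 1.5  # moment coefficient g1 (assumed definition)
    cdf = [NormalDist().cdf((v - mean) / sd) for v in sorted(x)]
    # Kolmogorov-Smirnov distance between the empirical and normal CDFs
    ks = max(max((i + 1) / n - f, f - i / n) for i, f in enumerate(cdf))
    # Anderson-Darling A^2 (0-based form of the usual sum over i = 1..n)
    ad = -n - sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
                  for i in range(n)) / n
    return {"n": n, "mean": mean, "sd": sd, "skewness": skewness, "ks": ks, "ad": ad}


print(normality_summary([4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0]))
```

Because the mean and standard deviation are estimated from the same data, standard KS and AD critical-value tables for fully specified distributions do not apply directly. This is one reason the sketch stops at the statistics.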
6. [Figure 11.24: Statistics and limits for the prediction interval.]
[Figure 11.25: Robust prediction interval for 4-METHYL.DAT.]
7. Display Graphs For: Q-Q Plot Indiv Raw Data
Statistics Options: Classical
Zero Lower Limit
Limit Style: Two Sided
X Axis Variables
Title: Robust Analysis
Numbering: Observations
Contour Ellipse: Indiv & Simul
Erase Output File
View Weights & Generalized Distances: IRIS.WTS
Generate Graph With Current Options
Each of these headings has various choices, which can be selected by repeated use of the <ENTER> key. After a selection is made, an arrow key can be used to move the cursor to the next heading. The process is repeated until the desired choices for all of the headings have been selected. For Robust Analysis, the various choices for each of the headings are listed in a fourth window. The Display Graphs For heading offers the following available graphs: Q-Q Plot Indiv Raw Data, Q-Q Plot Indiv Standardized, Q-Q Plot Simul Raw Data, Q-Q Plot Simul Standardized, Scatter Plot Raw Data, Q-Q Plot PCA, Scatter…
8. [Screen residue: the Help menu listing User's Guide topics, including Introduction, Installation, File Management, Data Management, Outlier Testing, Robust Analysis, Principal Components, Graphics, System, and Quitting.]
9. Press the <ENTER> key to generate the Q-Q plot using the individual setting, identify the bottom two and top two data points, and your display will match Figure 11.5. The difference between Figures 11.4 and 11.5 is how the control limits (the horizontal lines) are computed. The horizontal lines in Figure 11.4 are obtained using the first-order Bonferroni inequality, as given by equation (12) in Chapter 14, whereas the limits in Figure 11.5 are obtained using the probability statement given by equation (13) of Chapter 14.
[Figure 11.5: Q-Q plot for individual raw data for the sp width variable. x-axis: Theoretical Quantiles (Normal Distribution); horizontal lines: Maximum UL, Warning UL, Warning LL, and Maximum LL.]
11.2 Q-Q Plots of Principal Component Analysis
Q-Q plots of the principal component analysis (PCA) of the IRIS.DAT data set are produced in this section. Accordingly, select IRIS.DAT as the data file. The first step is to confirm that your options match those in Figure 11.6. Under Robust Method, select Robust Analysis and then select Statistics Options. If your options do not match those in Figure 11.6, press the <ENTER> key (repeatedly if necessary) to change the options to one of the other preset choices. When n…
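The coordinates of an individual Q-Q plot and a first-order Bonferroni-type limit can be sketched as shown below. The plotting positions (i - 0.5)/n and the use of standard normal quantiles for the limit are assumptions made for illustration. Equations (12) and (13) of Chapter 14 are not reproduced here, and `qq_points` and `bonferroni_limit` are hypothetical names.

```python
from statistics import NormalDist


def qq_points(x):
    """(theoretical quantile, ordered standardized value) pairs for a normal Q-Q plot.

    Uses plotting positions (i - 0.5)/n with 1-based i (an assumption).
    """
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / (n - 1)) ** 0.5
    ordered = sorted((v - mean) / sd for v in x)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i + 0.5) / n), z) for i, z in enumerate(ordered)]


def bonferroni_limit(n, alpha=0.05):
    """Two-sided limit when each of n values is tested at level alpha / n
    (normal approximation, not Scout's equation 12)."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n))


points = qq_points([3.0, 3.4, 2.9, 3.1, 3.6, 2.8, 3.3, 4.4])
limit = bonferroni_limit(len(points))
flagged = [z for _, z in points if abs(z) > limit]
```

Testing each of n observations at level alpha/n is what makes the Bonferroni limit wider than a single-observation limit. For n = 50 and alpha = 0.05, it is about 3.29 rather than 1.96.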
10. [Figure 11.26: Statistical options for an index plot using Huber influence.]
This data set consists of 21 observations with four variables. Several outliers are present in this data set. In order to unmask these outliers, a higher value of the right-tail cutoff must be used (α ≥ 0.15). The Huber procedure cannot unmask these multiple outliers even with an α of 0.5.
[Figure 11.27: Index plot for STACKLSS.DAT using Huber influence. Plot title: "Index Plot for Brownlee's Stack Loss"; x-axis: Observation Numbers; horizontal lines: Maximum (Largest MD) and Warning (Individual MD).]
The second index plot is generated by replacing Huber Influence with Prop Influence in Statistics Options. Using Prop Influence increases our ability to unmask multiple outliers. Accept the new settings and then generate the graph (Figure 11.28). All of the outliers (observations 1, 2, 3, 4, and 21) present in this data set are well separated from the rest of the data.
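To see why the right-tail cutoff matters, the sketch below applies a common Huber-type weight function to (unsquared) generalized distances. Observations within a cutoff d0 keep full weight, and those beyond it get weight d0/d. This section does not state how Scout converts the right-tail cutoff α into d0. The example assumes, consistent with the note on α, that a larger α corresponds to a smaller d0 and therefore to heavier downweighting. `huber_weights` and `weighted_mean` are hypothetical names, and the distances are illustrative values rather than computed ones.

```python
def huber_weights(distances, d0):
    """Huber-type weights: 1 within the cutoff d0, d0 / d beyond it."""
    return [1.0 if d <= d0 else d0 / d for d in distances]


def weighted_mean(values, weights):
    """Weighted mean, used to show how downweighting shifts a location estimate."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


values = [10.0, 11.0, 12.0, 30.0]
distances = [1.0, 2.0, 3.0, 6.0]  # illustrative distances, not computed here
for d0 in (4.0, 2.5):             # smaller d0 ~ larger alpha (assumption)
    w = huber_weights(distances, d0)
    print(d0, [round(v, 3) for v in w], round(weighted_mean(values, w), 3))
```

Lowering d0 pulls the weighted mean further from the unweighted mean of 15.75. Because the outlier still keeps a positive weight, a Huber-type scheme can fail to unmask several outliers at once, as the STACKLSS.DAT example shows.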
11. …minus (<->) key on all other checked variables. After IRIS.DAT has been selected and properly modified, while remaining in the Robust Method menu, choose Robust Analysis and press <ENTER>; the screen should match Figure 11.1.
[Screen residue: the Robust Analysis menu, with Display Graphs For: Q-Q Plot (Indiv Raw Data), Statistics Options: Classical, Limit Style: Two Sided, Title: Robust Analysis, Numbering: Observations, and Contour Ellipse: Indiv & Simul.]
12. [Screen residue: the Pattern Recognition menu, with Statistics Options: Classical, Numbering: Populations, Contour Ellipse: Individual, Type of Graph: PCA Scores, and Graph Title: Pattern Recognition, followed by Save Discriminant Scores, View Eigen Values and Vectors, View Confusion Matrix, View Covariance Matrix and Means, and Begin Computations with Current Options.]
13. [Scout User's Guide, page 5-10; Chapter 5: Robust Statistical Methods]
Type of Graphs: Discriminant Scores
Graph Title: Pattern Recognition
Save Discriminant Scores: No
View Eigenvalues and Vectors: Yes
View Confusion Matrix: Yes
View Covariance Matrix and Means: Yes
Each of these headings has various choices, which can be selected by repeated use of the <ENTER> key. After a selection is made, an arrow key can be used to move the cursor to the next heading. The process can be repeated until the desired choices for all of the headings have been selected. Statistics Options presents the same menu as described in Section 5.3. Set these options as desired, then return to the third window as shown above. The remaining headings and corresponding choices in the third window are as follows:
Heading: Choices
Numbering: Observations, Populations
Contour Ellipse: Individual, Simultaneous, Indiv & Simul, Indiv Class, Simul Class
Type of Graphs: Discriminant Score, PCA Score, X-Y
Graph Title: can be typed in after pressing the <ENTER> key
Save Discriminant Scores: Yes, No
View Eigenvalues and Eigenvectors: Yes, No
View Confusion Matrix: Yes, No
View Covariance Matrix and Means: Yes, No
The graph titles can be typed in after using…
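The View Confusion Matrix option cross-tabulates the known population of each observation against the population to which it is assigned. The sketch below builds such a table from two label lists. It assumes that rows are the actual populations and columns the assigned ones, which may differ from Scout's layout. The discriminant rule that produces the assignments is not shown, and the function names are hypothetical.

```python
def confusion_matrix(actual, assigned, populations):
    """Rows: actual population; columns: assigned population (assumed layout)."""
    position = {pop: k for k, pop in enumerate(populations)}
    table = [[0] * len(populations) for _ in populations]
    for a, g in zip(actual, assigned):
        table[position[a]][position[g]] += 1
    return table


def misclassification_rate(table):
    """Share of observations that fall off the diagonal."""
    total = sum(sum(row) for row in table)
    correct = sum(table[k][k] for k in range(len(table)))
    return (total - correct) / total


actual = [1, 1, 2, 2, 3, 3]
assigned = [1, 2, 2, 2, 3, 1]
table = confusion_matrix(actual, assigned, [1, 2, 3])
print(table)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(misclassification_rate(table))
```

Correct assignments accumulate on the diagonal, so off-diagonal counts show which populations the rule confuses with each other.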
14. …848
        x1      x2      x3      x4      Octn    Mean
x2      0.78    1.39    0.91    0.05    0.25    1.311
x3      1.37    0.91    13.33   0.3     0.64    56.716
x4      0.17    0.05    0.3     0.03    0.06    1.583
Octn    4.02    0.25    0.64    0.06    0.8     91.549
Robust Statistics After Deletion of 8 Outliers
Covariance Matrix and Mean Vector
        x1      x2      x3      x4      Octn    Mean
x1      44.35   0.83    7.27    0.24    3.95    62.657
x2      0.83    1.24    0.91    0.06    0.25    1.294
x3      7.27    0.91    12.88   0.35    0.63    56.833
x4      0.24    0.06    0.35    0.03    0.06    1.590
Octn    3.95    0.25    0.63    0.06    0.79    91.568
14.9 Interval Estimation
Computations of several classical and robust interval estimates that are useful in many applications are incorporated in the robust module of Scout. A good description of these procedures is given in Hahn and Meeker (1991). The following four interval estimates are available in Scout and can be obtained using one of the robust (HUBER, PROP, or MVT) or classical procedures:
(1) Confidence interval for the population mean, $\mu$
(2) Prediction interval for a single future observation, $x$
(3) Simultaneous confidence interval for all of the sample observations, $x_1, x_2, \ldots, x_n$
(4) Confidence interval for a single observation, $x_i$, in a sample
These intervals differ significantly from each other, and care must be exercised to use them appropriately. For example, at a polluted site one of the objectives is to obtain a threshold value estimating the background level of contamination prior to any activity that polluted the site. H…
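For the first two intervals in the list, the standard classical formulas are the mean plus or minus t times s divided by the square root of n (population mean), and the mean plus or minus t times s times the square root of (1 + 1/n) (single future observation). The sketch below implements only those two formulas. The Student t quantile is passed in by the caller because Python's standard library has no t distribution. Robust variants, which would substitute HUBER, PROP, or MVT estimates, are not shown. Function names are hypothetical.

```python
import math


def mean_and_sd(x):
    """Sample size, mean, and (n - 1) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return n, mean, sd


def confidence_interval_mean(x, t_quantile):
    """Classical two-sided interval for the population mean: mean +/- t * s / sqrt(n)."""
    n, mean, sd = mean_and_sd(x)
    half = t_quantile * sd / math.sqrt(n)
    return mean - half, mean + half


def prediction_interval(x, t_quantile):
    """Classical two-sided interval for one future observation: mean +/- t * s * sqrt(1 + 1/n)."""
    n, mean, sd = mean_and_sd(x)
    half = t_quantile * sd * math.sqrt(1.0 + 1.0 / n)
    return mean - half, mean + half


data = [10.0, 12.0, 11.0, 13.0, 9.0]
t_975_4df = 2.776  # Student t, 0.975 quantile, 4 degrees of freedom (from a table)
print(confidence_interval_mean(data, t_975_4df))
print(prediction_interval(data, t_975_4df))
```

The extra "1 +" term under the square root accounts for the variability of the new observation itself. This is why a prediction interval, and not a confidence interval for the mean, is the natural starting point for a background threshold.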
15. [Figure 11.31: The kurtosis value for IRIS.DAT.]
Note: The classical kurtosis, as given in Chapter 10, is 25.49, which is distorted by outliers.
11.8 Summary
ASSESSING NORMALITY AND THE IDENTIFICATION OF OUTLIERS
11.1 Q-Q plots: While covering the production of these plots, we also covered (1) a graphics option (<SHIFT>), (2) options for graphics output (<P> and <F>), and (3) the use of <+> and <-> to select and deselect variables.
11.2 Q-Q plots of PCA: While describing the production of these plots, we also covered (1) using the <ENTER> key in a menu to change preset choices, and highlighting and typing in values for numerical fields, and (2) the use of <PAGE DOWN> or <PAGE UP> to display other graphics when multiple plots are present.
DATA REDUCTION TECHNIQUES AND EXAMINING DATA FOR PATTERNS
11.3 PCA scatterplots: In addition to describing the production of this output, we also described (1) the use of <N> to identify data points and the use of <E> to draw ellipses, (2) supplying titles for graphical output, (3) the use of the X-Y Coordinates Scale Factor to rescale graphs so that all output fits on the screen, (4) viewing the eigenvalues and eigenvectors as part of the analysis output, (5) examining discriminant analysis along with the confusion matrix, and (6) …
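Mardia's (1970) multivariate kurtosis is the average of the fourth powers of the generalized distances, computed with the maximum likelihood (divide-by-n) covariance matrix. Under multivariate normality it is close to p(p + 2) for large n, which is 24 for four variables. The sketch below computes this classical statistic only. Whether the Chapter 10 value of 25.49 uses exactly this definition is not stated here, and the function names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def mardia_kurtosis(data):
    """b_{2,p} = (1/n) * sum of (d_i^2)^2, with d_i^2 from the divide-by-n covariance."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in data) / n
            for j in range(p)] for i in range(p)]
    inv = invert(cov)
    total = 0.0
    for r in data:
        d = [r[j] - mean[j] for j in range(p)]
        d2 = sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p))
        total += d2 * d2
    return total / n


print(mardia_kurtosis([[1.0], [2.0], [3.0], [4.0], [5.0]]))  # one variable: m4 / m2^2
```

With a single variable, the statistic reduces to the ordinary moment kurtosis m4 / m2^2. Because it averages fourth powers of the distances, a few outlying observations can inflate it considerably.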
16. [Figure 12.5: An explanation window for the Transform Data function indicating completion of the transformation.]
Press <ESC> three times to return to the main menu, select PCA, and press <ENTER>. Move the cursor to highlight Display Matrices and press <ENTER> to generate the variance-covariance matrix for the transformed variables (i.e., the principal components), as shown in Figure 12.6.
[Scout Tutorial, page 12-5; Chapter 12: Tutorial]
[Screen residue: the PCA menu, with choices Select Variables, Display Matrices, Eigenvalues, View Components, and Transform Data.]
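The variance-covariance matrix of the principal components is diagonal because each component is the projection of the centered data onto an eigenvector of the covariance matrix. The components are therefore uncorrelated, and their variances are the eigenvalues. The two-variable sketch below shows this with a closed-form eigen-decomposition in plain Python. The names are hypothetical, and no assumption is made about whether Scout's own PCA, which works on all selected variables, uses the covariance or the correlation matrix.

```python
import math


def covariance_2d(xs, ys):
    """Means, variances, and covariance of two variables ((n - 1) divisor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, sxx, sxy, syy


def principal_components_2d(xs, ys):
    """Eigenvalues of the 2 x 2 covariance matrix and the component scores."""
    mx, my, sxx, sxy, syy = covariance_2d(xs, ys)
    center = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (center + radius, center - radius)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of the first eigenvector
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    scores = [((x - mx) * v1[0] + (y - my) * v1[1],
               (x - mx) * v2[0] + (y - my) * v2[1]) for x, y in zip(xs, ys)]
    return eigenvalues, scores


xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
(lam1, lam2), scores = principal_components_2d(xs, ys)
_, _, s11, s12, s22 = covariance_2d([a for a, _ in scores], [b for _, b in scores])
print(round(s11, 6), round(s12, 6), round(s22, 6), round(lam1, 6), round(lam2, 6))
```

The printed covariance of the scores has a (numerically) zero off-diagonal entry, and its diagonal matches the two eigenvalues, which is the pattern to expect in the Display Matrices output for transformed variables.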
17. [Figure 2.1: Scout's main menu with the File heading selected, displaying six headings and choices for file management, and an explanation window for the first heading.] The explanation window reads, in part: "Reads a data set from an ASCII file on any disk. … CAUTION: Data in memory will be lost."
2.20 Reading Spreadsheet Files
Scout cannot read spreadsheet data directly. However, a spreadsheet file can easily be converted into a Scout data set. In order to convert a spreadsheet data file to a Scout data file, the specific file for…
18. [Figure 13.1: The Graphics menu with the explanation window for Graph Parameters displayed. The window reads, in part: "Allows the user to modify the color and shape of individual observations. Observations can be removed … from the graphs …"]
The Graphics module always considers all the variables in a data set. Move the cursor to highlight 2 Dimensional and press <ENTER>. The screen will be similar to Figure 13.2. All variables in the data set are displayed across each axis in this matrix. The upper-left to lower-right diagonal…
19. [Screen residue: the Classical Method menu, listing Generalized Distance, Multivariate Kurtosis, and Remove Outlier Flags. The explanation window for Generalized Distance reads: "The test statistic is the largest generalized distance. The test is iterated until no further records are rejected. Use when it is unlikely that many outliers are present."]
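In the Generalized Distance test, the statistic is the largest generalized distance, and the test is repeated until no further records are rejected. This can be sketched as a loop: compute the squared generalized distances, compare the largest with a critical value, reject that record if the value is exceeded, and repeat on the remaining records. The critical value depends on n and p and is passed in as a function. Scout's actual critical values are not reproduced, and `largest_distance_test` is a hypothetical name.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def squared_distances(rows):
    """Classical squared Mahalanobis distances ((n - 1)-divisor covariance)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    inv = invert(cov)
    out = []
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        out.append(sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p)))
    return out


def largest_distance_test(data, critical_value):
    """Reject the record with the largest squared distance while it exceeds
    critical_value(n, p); return rejected row indices in rejection order."""
    active = list(range(len(data)))
    rejected = []
    p = len(data[0])
    while len(active) > p + 1:
        d2 = squared_distances([data[i] for i in active])
        k = max(range(len(d2)), key=d2.__getitem__)
        if d2[k] <= critical_value(len(active), p):
            break
        rejected.append(active.pop(k))
    return rejected


data = [[1, 2], [2, 1], [2, 3], [3, 2], [1, 1], [3, 3], [2, 2], [50, 50]]
print(largest_distance_test(data, lambda n, p: 5.0))  # [7]
```

Because each pass removes only one record and recomputes the classical estimates, the procedure works well for a single gross outlier. It can be masked when several outliers pull the mean and covariance together, which matches the window's advice to use it only when many outliers are unlikely.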
20. Garner, Fitzgerald, Kirk, and Nocerino, J. (1993). Simultaneous Acceptance Regions and an Alternative Statistical Scoring Algorithm to Assess the Performance of the Laboratories Participating in the CLP Program of the USEPA. An internal report.
Wilks, S. S. (1963). Multivariate statistical outliers. Sankhya, 25, 407-426.
21. You can save this output by pressing <F> and supplying the name of a file to hold the graph, or by pressing <P> to print the graph.
11.5 Index Plots
Select STACKLSS.DAT from the Data subdirectory of the Scout directory. Return to Robust Method and choose Robust Analysis, and within the Select Graph Type menu select Index Plots. Set Statistics Options as shown in Figure 11.26, using Huber Influence to detect outliers. Accept the new settings and then generate the graph (Figure 11.27).
[Screen residue: the Statistics Options menu, with Compute Statistics Using: Huber Influence, Initial Estimate: Robust, X-Y Coordinates Scale Factor, and Right Tail Cutoff.]
22. …Robust Regression and Outlier Detection. John Wiley, New York.
Rousseeuw, P. J., and van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85, 633-639.
Schwager, S. J., and Margolin, B. H. (1982). Detection of multivariate normal outliers. The Annals of Statistics, 10, 943-954.
Scout: A Data Analysis Program. Technology Support Project, U.S. EPA, EMSL-LV, Las Vegas, NV 89193-3478.
Stapanian, M. A., Garner, F. C., Fitzgerald, K. E., Flatman, G. T., and Englund, E. J. (1991). Properties of two tests for outliers in multivariate data. Communications in Statistics: Simulation and Computation, 20, 667-687.
Singh, A., and Nocerino, J. M. (1993). Robust QA/QC for environmental applications. Proceedings of the Ninth International Conference on Systems Engineering, Las Vegas, Nevada, 370-374.
Singh, A. (1993). Omnibus robust procedures for assessment of multivariate normality and detection of multivariate outliers. In Multivariate Environmental Statistics, Patil, G. P., and Rao, C. R. (Eds.). Elsevier Science Publishers, Amsterdam, 445-488.
Singh, A., and Nocerino, J. M. (1995). Robust procedures for the identification of multiple outliers. In Handbook of Environmental Chemistry, Vol. 2 G. Springer-Verlag (in press).
Singh, A., Singh, A. K., and Flatman, G. T. (1994). Estimation of background levels of contaminants. Mathematical Geology, 26, 361-388.
Singh, A., F. C. …
23. …S., Britton, P. W., and Lewis, D. F. (1988). On the prediction of a single future observation from a possibly noisy sample. The Statistician, 37, 165-172.
Huber, P. J. (1981). Robust Statistics. John Wiley, New York.
Iglewicz, B. (1983). Robust scale estimators and confidence intervals for location. In Understanding Robust and Exploratory Data Analysis, Hoaglin, D. C., Mosteller, F., and Tukey, J. W. (Eds.). John Wiley, New York.
Johnson, R. A., and Wichern, D. W. (1988). Applied Multivariate Statistical Analysis, Second Edition. Prentice Hall, New Jersey.
Jennings, L. W., and Young, D. M. (1988). Extended critical values of the multivariate extreme deviate test for detecting a single spurious observation. Communications in Statistics: Simulation and Computation, 17, 1359-1373.
Kafadar, K. (1982). A biweight approach to the one-sample problem. Journal of the American Statistical Association, 77, 416-424.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519-530.
Mardia, K. V. (1974). Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya, 36, 115-128.
Rosner, B. (1975). On the detection of many outliers. Technometrics, 17, 221-227.
Rousseeuw, P. J., and Leroy, A. M. (1987). …
24. …Scout files together. The new data file is always written as an ASCII file. The append routine assumes that the variables are the same in each of the input files; if the two input files do not contain the same number of variables, the routine will not allow them to be appended. The variable names from the first input file are used as the variable names in the new file. All of the observations from each of the input files are written to the new file, even if duplicate record labels occur.
Chapter 3: Managing Data in Scout
3.1 Data Management
Scout enables the user to edit, insert, or delete observations and variables currently in memory, to change the title of the data set, and to change the name, units, or other attributes of the variables. Select Data from the main menu and Edit Data from the pull-down menu, as shown in Figure 3.1 below.
[Screen residue: the Data pull-down menu, listing Edit Data, Statistics, Transform, and Print Data.]
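The append rules described above can be expressed as a short function. The in-memory representation used here, a dict with a list of variable names and a list of labeled records, is an assumption for illustration and does not reflect Scout's ASCII file format. `append_datasets` is a hypothetical name.

```python
def append_datasets(first, second):
    """Append two data sets held as {"variables": [names], "records": [(label, values)]}.

    Mirrors the stated rules: equal variable counts are required, names come from
    the first file, and every record is kept even if labels repeat.
    """
    if len(first["variables"]) != len(second["variables"]):
        raise ValueError("input files must contain the same number of variables")
    return {
        "variables": list(first["variables"]),                        # names from the first file
        "records": list(first["records"]) + list(second["records"]),  # duplicate labels kept
    }


a = {"variables": ["x1", "x2"], "records": [("r1", [1.0, 2.0]), ("r2", [3.0, 4.0])]}
b = {"variables": ["y1", "y2"], "records": [("r2", [5.0, 6.0])]}
merged = append_datasets(a, b)
print(merged["variables"], len(merged["records"]))  # ['x1', 'x2'] 3
```

Note that only the number of variables is checked. As in the description above, the second file's variable names are simply discarded, so it is up to the user to make sure the columns actually correspond.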
25. …a level 1 heading. Additional levels of menus and headings will be found in Scout; their descriptions will be consistent with the definitions given above. In this tutorial you will learn (a) how to read data files, (b) how to use the Statistics choice under the Data heading, (c) how to save the Statistics output obtained by using a Statistics option, and (d) how to work with the various functions under the Transform heading.
9.2 Read Data Files
In the Scout directory, at the prompt C:\Scout>, type SCOUT and press the <ENTER> key three times. This will guide you to the screen shown in Figure 9.1. Any of the headings can be selected by using the <RIGHT> or <LEFT> arrow keys. Highlight (select) the File heading and press the <ENTER> key; the level 2 menu will appear. The heading Read ASCII File will be highlighted. Press the <ENTER> key again, and a directory will appear listing the names of files and other directories. To select a different drive, just press the appropriate key (A, B, C, etc.). The files and directories displayed will depend on the directory contents of each individual user. The file IRIS.DAT should be in the Scout directory. Highlight this file and press <ENTER>: the list of files and directories will vanish, a small explanation window will appear stating "Reading data, please wait" (it may vanish before you can read it), and then the Figure 9.1 screen…
26. are obtained using either the MLE or one of the robust approaches. The univariate simultaneous limits given by equation (10) can be plotted on the single-variable normal probability plots. Observations falling outside these limits are univariate outliers.

14.7 Contour Plots

Contour plots of the Mds based on classical or robust estimators of location and scale can be used to further enhance the identification of outliers. The contour ellipsoids of the Mds are displayed at the same two levels as the warning-point Md and maximum-point Md lines on the Q-Q plot of the Mds described above. For given values of $\alpha$ and $n$, the critical values $\mathrm{Md}_{\alpha}$ and $\mathrm{Md}_{\alpha/n}$ differ significantly. The associated confidence ellipsoids are given by the following statements:

$P(\mathrm{Md}_i \le \mathrm{Md}_{\alpha}) = 1 - \alpha, \quad i = 1, 2, \ldots, n$

$P(\mathrm{Md}_i \le \mathrm{Md}_{\alpha/n};\ i = 1, 2, \ldots, n) \ge 1 - \alpha$

Outlying observations stand out more clearly on plots obtained using the robustified Mds. Observations falling outside the outer contour are outliers. Observations lying between the inner and outer contours need further examination. Points falling inside the inner contour represent the main stream of data.

14.8 Robust Principal Component Analysis

Principal component analysis (Anderson 1984; Johnson and Wichern 1988) is one of the well-recognized data reduction techniques. It is well known that while
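The inner/outer contour rule above can be sketched for a two-variable scatter plot. This is a simplified illustration, not Scout's procedure. It uses classical (not robust) estimates of the mean and covariance. It uses the chi-square approximation with 2 degrees of freedom, whose upper-α quantile is −2 ln α, rather than Scout's default scaled beta. As an assumption, it places the outer (simultaneous) level at α/n. Md here denotes the squared Mahalanobis distance.

```python
import math


def contour_classes(points, alpha=0.05):
    """Classify 2-D points against individual (inner) and simultaneous (outer) contours.

    Sketch only: classical mean/covariance, chi-square(2) approximation,
    outer level at alpha / n (assumed Bonferroni-type level).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    inner = -2.0 * math.log(alpha)       # individual contour (warning level)
    outer = -2.0 * math.log(alpha / n)   # simultaneous contour (maximum level)
    labels = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
        md = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if md > outer:
            labels.append("outlier")
        elif md > inner:
            labels.append("examine")
        else:
            labels.append("main stream")
    return labels


pts = [(x, y) for x in range(6) for y in range(6)] + [(30.0, 30.0)]
labels = contour_classes(pts)
print(labels[-1], labels.count("main stream"))
```

With classical estimates, a single far-out point inflates the covariance it is judged against. That is the masking effect that robust estimates are meant to reduce. With several clustered outliers, this classical sketch could miss them.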
27. be based solely on their magnitudes. Logically, one cannot truly distinguish non-normality from contamination. Discordant values should be subjected to increased scrutiny. They should be removed only when this inspection reveals unique or unusual problems in the measurement or recording of these values. Scout is designed to enhance the user's ability to quickly identify such problems.

4.2 Select Variables

When searching for outliers, the user should decide which variables are to be included in the analysis; the Select Variables heading allows the user to do this. If the user skips this step, Scout defaults to testing all of the variables. In the variable selection screen, a check mark next to a variable name indicates that the variable will be tested. To place or remove check marks, use the <UP ARROW> and <DOWN ARROW> keys to move the selector to a variable name. Then press <-> to remove the check mark or <+> to place one. The <+> and <-> keys also move the selector to the next variable name, so a series of variables can be set quickly by holding down one of these keys. Pressing <ENTER> or <ESC> accepts the variable selection as indicated.

4.3 The Classical Outlier Tests

The two outlier tests available in the Classical Method menu are Mardia's multi
28. be performed for each of the 7 regions in the Pb add file.

5.9 Causal Variables

When Causal Variables is selected, the second window displays the message: "Searches for the variables that might have caused a given observation to be an outlier. A variable is a cause if, when removed, the observation is no longer an outlier." When <ENTER> is pressed, the third window appears, allowing the various headings to be set. The available headings for this choice are as follows:

    Headings                 Example Choices
    Statistics Options       Classical
    Confidence Interval      Simultaneous
    Zero Lower Limit         No

Each of these headings has various choices. Any of the choices for Confidence Interval and Zero Lower Limit can be selected by repeated use of the <ENTER> key. After a selection is made, an arrow key can be used to move the cursor to the next heading. The Zero Lower Limit option can be used when the lower limit becomes negative and the data cannot take negative values. Statistics Options presents the same menu as described in Section 5.3. Set these headings as desired and return to the third window. The remaining headings and corresponding choices in the third window are as follows:

    Headings                 Choices
    Confidence Interval
29. benefit from reviewing the tutorial sections before reading the user's guide. Various examples presented in the tutorial section are produced using some well-known data sets. The main menu in Scout contains seven headings: File, Data, Classical Method, Robust Method, PCA, Graphics, and System. Each of these headings has various options. To view them, move the cursor in the main menu to the appropriate heading and press <ENTER>. A short description of each heading or choice is displayed automatically in the window of the main menu. To activate the description window for any heading or choice, move the cursor (with the <ARROW> keys) to the corresponding area. The user's guide and tutorial sections of the manual are organized systematically from the File heading to the System heading.

1.3 Installing Scout

Place the Scout diskette in drive A or B and install to hard disk C:
1. Type "C:" (without quotes) and press <ENTER>. This changes the current disk drive to drive C.
2. Type MD SCOUT and press <ENTER>. This creates a directory called SCOUT where the program will reside.
3. Place the Scout disk in drive A or drive B and close the drive door.
4. Type COPY A:*.* C:\SCOUT and press <ENTER>. This copies all the files from the program disk in drive A into the SCOUT
30. computers, the exact critical values based on a scaled beta distribution can be obtained quite easily. In Scout, the critical values of the distances (Mds) and the theoretical quantiles used along the horizontal axis of the Q-Q plot of the Mds can be obtained using one of the following two options:
• The chi-square approximation
• The scaled beta distribution
The default option is the scaled beta distribution.

The Right-Tail Probability and the Confidence Coefficient

Scout allows the user to select a value for the right-tail area α (> 0.01) for the distribution of the individual Mds (default 0.05). For all of the control limits (warning and maximum limits) in Q-Q plots, index plots, and interval estimates, the user can also pick a confidence coefficient of his or her choice, for example 80%, 90%, 95%, or 99%. The default confidence coefficient is 0.95.

Two Choices for the Scale Estimator

For multivariate data sets, the user can obtain the relevant statistics, such as the Mds and the PCs, using either the variance-covariance matrix or the correlation matrix. The correlation matrix is chosen by default.

Tuning Constant and Trimming Fraction

The PROP procedure requires a tuning constant. Scout provides an option for selecting the tuning constant for interested users; the default value is 1.0. Also, the trimming fraction, representing the percent of observat
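For two variables (p = 2), both critical-value options above have closed forms, so they are easy to compare in a short sketch. One assumption is not stated in this guide. It is the standard result that, when a squared Md is computed from the sample mean and covariance, n·Md/(n − 1)² follows a Beta(p/2, (n − p − 1)/2) distribution. The chi-square approximation replaces this with a chi-square distribution on p degrees of freedom. For p = 2, Beta(1, b) has upper-tail quantile 1 − α^(1/b), and the chi-square(2) upper quantile is −2 ln α. The function names are illustrative.

```python
import math


def beta_critical_p2(n: int, alpha: float) -> float:
    """Scaled-beta critical value of a squared Md for p = 2 (upper tail alpha)."""
    b = (n - 2 - 1) / 2.0                      # second beta parameter, (n - p - 1) / 2
    scale = (n - 1) ** 2 / n                   # scaling factor (n - 1)^2 / n
    return scale * (1.0 - alpha ** (1.0 / b))  # upper quantile of Beta(1, b), rescaled


def chi2_critical_p2(alpha: float) -> float:
    """Chi-square (2 degrees of freedom) approximation to the same critical value."""
    return -2.0 * math.log(alpha)


for n in (10, 50, 1000):
    print(n, round(beta_critical_p2(n, 0.05), 3), round(chi2_critical_p2(0.05), 3))
```

For the sample sizes printed here, the scaled beta value is smaller than the chi-square value and approaches it as n grows. This suggests why the exact distribution matters most for small data sets.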
31. contains three headings, as shown in Figure 13.1. Graph Parameters is used to select the color and shape of the data points used in a graph. After you select a data set (and, optionally, the desired colors and shapes of data points), a 2-dimensional or 3-dimensional graph can be displayed. The 3-dimensional capability of Scout affords opportunities to view the data from many perspectives. For this tutorial, select the FULLIRIS.DAT data set from Scout's Data directory.

[Figure 13.1 screen: Graphics menu with the headings Graph Parameters, 2 Dimensional, and 3 Dimensional]
32. examples, some of which are discussed in the tutorial chapters of this user's guide. Readers are encouraged to try the procedures described here on data sets from their own applications. Some desirable properties of an outlier identification procedure are:
• The procedure should be resistant to swamping and masking effects and have a high breakdown point.
• The procedure should be graphical and intuitively appealing to the user. There is no substitute for a good and revealing graphical display of the data set.
• The resulting robust and resistant estimates of location and scale, and the Mds with or without the outliers, should be in close agreement with the corresponding MLE estimates and the Mds obtained after removal of the outlying observations.
• The procedure should be able to order the Mds accurately, leading to the correct identification of outliers.

14.2 General Description of Statistical Procedures in the Scout Software Package

All of the major menus available in Scout have been discussed in earlier chapters. Some statistical procedures used in Scout are listed as follows:
1. Histogram and Data Transformation. Several transformations are available, including standardization, linear and logarithmic transformations, power transformations (e.g., square root), and Box-Cox type transformations. These have been discussed in earlier chapters.
2. Normality Tests and
33. identification of outliers and the estimation of population parameters of location and scale typically use an influence function. The robust module of Scout computes various statistics using four methods: the classical MLE approach, the robust multivariate trimming approach (Devlin et al. 1981), the Huber influence function (Huber 1981), and the proposed PROP influence function (Singh 1993). Numerous graphical procedures are incorporated in Scout, including:
• normal Q-Q plots of raw data;
• scatter plots;
• Q-Q plots and scatter plots of principal components;
• Q-Q plots and index plots of the Mahalanobis distances;
• scatter plots of discriminant scores;
• contour plots;
• plots of prediction intervals;
• simultaneous confidence intervals.
The control-chart-type quantile-quantile (Q-Q) graphical display of multivariate data combines the effect of a formal test procedure and an informal graphical display into one powerful multiple outlier identification procedure.

5.2 Choices of Robust Analyses

Several univariate and multivariate robust procedures are available in Scout; they are worked out in detail in the tutorials (Section II). There are nine options in the Robust Method menu: Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, D Trend, Add Means, Causal Variables, and Print Destinations. There are various screens associated
34. in an environmental monitoring application, a classification procedure based on the distorted estimates may well classify a contaminated sample as coming from the clean population, and a clean sample as coming from the contaminated part of the site. This may lead to incorrect remediation decisions. MLE-based classical outlier identification procedures, and even the robust ones, are vulnerable to masking and swamping effects in the presence of multiple outliers. Masking means that the outliers are hidden: the presence of some outliers may mask the existence of others. Even the sequential use of outlier identification procedures cannot unmask these multiple outliers (e.g., see Example 1, Chapter 10). When the outliers arise in clusters, the OLS regression model is attracted toward them. The resulting deflated residuals mask the outliers. Swamping, on the other hand, means that some inlying observations are identified as outliers because of the presence of other outliers. In the presence of multiple outliers, or for a mixture sample from two or more populations, the generalized distances (including robustified Mds) become so distorted that cases with large Mds may not correspond to the outlying observations. This masking distorts the estimates of the population parameters (e.g., μ) and the correct ordering o
35. [Figure 11.15 screen: Pattern Recognition menu with options including Save Discriminant Scores, View Eigen Values and Vectors, View Confusion Matrix, View Covariance Matrix and Means, and Begin Computations with Current Options; data file C:\SCOUT\DATA\FULLIRIS.DAT]
Figure 11.15 The pattern recognition menu with Type of Graph set to Discriminant Scores.
The Eigen Values and Eigen Vectors associated with this analysis will appear first, as shown in Figure 11.16. After examining these values, press <ESC>, and the confusion (error) matrix will be displayed, as shown in Figure 11.17.
[Figure 11.16 screen: eigenvalues and eigenvectors, with the prompt to press <P> to print or <ESC> to exit]
Figure 11.16 The Eigen values and Eigen vectors associated with Fisher's discriminant analysis of FULLIRIS.DAT.
36. may choose it to be something else, i.e., New. Select your choice with the <ARROW> keys and <ENTER>, or press the key corresponding to the first letter of your choice. If your choice is not New, Scout will automatically insert the correct values for each variable in this observation, and the label will read Arithmetic, Geometric, or Median. If, however, your choice is New, Scout will enter a value of 1E31 for each variable and "Obs n" for the label, where n is the observation number. If you select New, you must enter the correct values and label manually. Move about the screen with the <ARROW> keys until you find the value or label you wish to change, type the correct value or label, and press <ENTER>.

SUGGESTIONS
1. It is recommended that means, medians, or any other summary statistics be inserted as either the first or the last observation.
2. Scout allows insertion of only one observation at a time. If you wish to insert many observations with additional data, it may save time to exit Scout and insert the new data with different software (e.g., a spreadsheet).

Inserting Variables

This option allows the user to insert variables (i.e., columns) into the data set. Move about the spreadsheet screen with the <ARROW> keys until you find the column in which you wish to insert a variable, then press the <INSERT> key. You will then be given a choi
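The values Scout fills in for an inserted summary observation can be sketched as follows. This is an illustration under assumptions, not Scout's implementation. The function name is hypothetical. Missing values (1E31) are assumed to be skipped when summarizing, which the text does not state. The geometric mean is only defined here for positive data.

```python
import statistics

MISSING = 1e31  # Scout's missing-value code


def summary_observation(rows, kind):
    """Values for an inserted summary observation (hypothetical helper).

    kind: "Arithmetic", "Geometric", "Median", or "New".
    Assumption: missing values (1E31) are skipped when summarizing.
    """
    p = len(rows[0])
    if kind == "New":
        return [MISSING] * p  # the user must then type the real values
    funcs = {
        "Arithmetic": statistics.fmean,
        "Geometric": statistics.geometric_mean,  # requires positive data
        "Median": statistics.median,
    }
    f = funcs[kind]
    result = []
    for j in range(p):
        column = [row[j] for row in rows if row[j] != MISSING]
        result.append(f(column))
    return result


rows = [[1.0, 2.0], [4.0, 8.0], [16.0, MISSING]]
for kind in ("Arithmetic", "Geometric", "Median", "New"):
    print(kind, summary_observation(rows, kind))
```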
37. of the data. The second line of the file must contain the number of variables, p. This number must be an integer greater than or equal to 1 and less than or equal to 22. The next p lines contain the variable names in the first 10 columns (1-10) and the associated units in the next ten columns (11-20). Data formats in FORTRAN notation can be included after the units in columns 21-30. Finally, a comment for each variable may be included in columns 31-80. After line p + 2, the remaining lines contain the data, so that each line represents one observation. Numbers must be separated by spaces; commas must not be used. Missing values are designated by 1E31. An observation identifier may be placed at the end of each line. This identifier, or label, can be up to ten characters long and must be in quotes. The following is an example of a file in Scout format:

    Geostatistical Environmental Data
    5
    Easting   feet      F7.1
    Northing  feet      F7.1
    Arsenic   ppm       G16.9
    Cadmium   ppm       F10.3
    Lead      ppm       F10.3
    288.0 311.0 .850 11.5 18.25 "Sample 1"
    285.6 288.0 .630 8.50 30.25 "Sample 2"
    273.6 269.0 1.02 7.00 20.00 "Sample 3"
    280.8 249.0 1.02 10.7 19.25 "Sample 4"
    273.6 231.0 1.01 11.2 151.5 "Sample 5"
    276.0 206.0 1.47 11.6 37.50 "Sample 6"
    285.6 182.0 .720 7.20 80.00 "Sample 7"
    288.0 164.0 .300 5.70 46.00 "Sample 8"
    292.8 137.0 .360 5.20 10.00 "Sample 9"
    278.4 119.0 .700 7.20 13.00 "Sample 10"

To save data in this format, select the option Write ASCII Data File S
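A reader for the file layout described above might look like the following Python sketch. It is not Scout's own reader. It does not apply the FORTRAN formats. It assumes whitespace-separated values with an optional quoted label at the end of each data line. In the usage example, the Arsenic value of Sample 2 is replaced by 1E31 purely to illustrate the missing-value code.

```python
import shlex

MISSING = 1e31  # missing-value code used in Scout files


def parse_scout_ascii(text: str) -> dict:
    """Parse a Scout-format ASCII data set (sketch; FORTRAN formats are not applied)."""
    lines = text.splitlines()
    title = lines[0].strip()
    p = int(lines[1].split()[0])
    if not 1 <= p <= 22:
        raise ValueError("number of variables must be between 1 and 22")
    variables = []
    for line in lines[2:2 + p]:
        variables.append({
            "name": line[0:10].strip(),      # columns 1-10
            "units": line[10:20].strip(),    # columns 11-20
            "format": line[20:30].strip(),   # columns 21-30 (optional)
            "comment": line[30:80].strip(),  # columns 31-80 (optional)
        })
    observations = []
    for line in lines[2 + p:]:
        if not line.strip():
            continue
        tokens = shlex.split(line)  # keeps a quoted label such as "Sample 1" together
        values = [float(t) for t in tokens[:p]]
        values = [None if v == MISSING else v for v in values]
        label = tokens[p] if len(tokens) > p else None
        observations.append((values, label))
    return {"title": title, "variables": variables, "observations": observations}


sample = """Geostatistical Environmental Data
5
Easting   feet      F7.1
Northing  feet      F7.1
Arsenic   ppm       G16.9
Cadmium   ppm       F10.3
Lead      ppm       F10.3
288.0 311.0 .850 11.5 18.25 "Sample 1"
285.6 288.0 1E31 8.50 30.25 "Sample 2"
"""
d = parse_scout_ascii(sample)
print([v["name"] for v in d["variables"]], d["observations"][1])
```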
38. of the variables used in the file, (3) the number of missing values for each variable, (4) the minimum and maximum values for each variable, (5) the mean of the values for that variable, (6) the standard deviation (sd), (7) the percent coefficient of variation, and (8) the variance. Should you wish to save this output, press the <P> key; the saved file can be incorporated into word processing software (for example, imported as ASCII DOS TEXT in WP6.0). This option brings up a window asking for a file name. Fill in an appropriate name (perhaps linking the statistics to the data file they came from). Be sure to specify the path if it differs from the default path indicated in the lower left corner. If no name is supplied, pressing <ENTER> will simply print the summary statistics to the local printer.

9.4 Transformation of Variables

The next option in the Data menu is the Transform heading, which can be used to transform variables. The two headings within this menu, shown in Figure 9.3, are (1) the Kolmogorov-Smirnov goodness-of-fit test and (2) the Anderson-Darling normality test. Various transformation functions can be reached by choosing one of these two tests. Choosing the Kolmogorov-Smirnov (Hogg and Craig 1978) goodness-of-fit test and pressing <ENTER> will give a table of variable statistics. Choosing the variable you are interested in and pressing the <ENTER> k
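The per-variable statistics listed above can be sketched with Python's standard `statistics` module. Two assumptions are not stated in the guide. The standard deviation and variance use the sample (n − 1) denominator. Missing values (1E31) are excluded before computing everything except the missing count.

```python
import statistics

MISSING = 1e31  # Scout's missing-value code


def summarize(values):
    """Summary statistics for one variable, skipping missing values."""
    present = [v for v in values if v != MISSING]
    mean = statistics.fmean(present)
    sd = statistics.stdev(present)  # sample (n - 1) standard deviation, assumed
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,  # percent coefficient of variation
        "variance": statistics.variance(present),
    }


s = summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, 1e31])
print(s)
```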
39. sure that the variables in the plots were included in the outlier test; otherwise, the plot may include additional outliers.

4.4 Causal Variables

After an outlier test has been executed, the user may wish to identify the variables, if any, that are responsible for each discordant observation. This is done by selecting the Causal Variables choice from the pull-down menu. Scout retests each discordant observation with one variable excluded at a time. Thus each discordant observation is tested p times, once with each of the p subsets of p - 1 variables. A variable is listed as causal only if its absence prevents identification of the outlier. This procedure is based on iterations of rigorous tests of hypotheses. Even so, the user should treat its results only as general guidance, not as definitive proof of the cause. It is recommended to start by investigating the suspected causal variable (or group) whose removal produces the largest decrease in the test statistic. As with any quality control technique, the results of these statistical procedures should be combined with experience and knowledge of the measurement system for proper interpretation of the data. The output is described as follows. The Outlier column gives the observation number and label of the discordant observation being tested. Test shows the outlier test statistic, while Crit gives the critical value used in the test. The test statistic and crit
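The leave-one-variable-out search described above can be sketched in Python. The search logic follows the text: retest with each variable removed, flag a variable when the observation is no longer discordant, and rank by the decrease in the statistic. The outlier test itself is a hypothetical stand-in (largest absolute z-score against a fixed cutoff of 2.5), not Scout's Mardia or Md test. Any function returning a (statistic, critical value) pair could be substituted.

```python
import statistics


def max_abs_z_test(data, obs, crit=2.5):
    """Hypothetical stand-in test: largest |z| of observation `obs` over all columns."""
    stat = 0.0
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        z = abs(col[obs] - statistics.fmean(col)) / statistics.stdev(col)
        stat = max(stat, z)
    return stat, crit


def causal_variables(data, obs, names, test=max_abs_z_test):
    """Leave-one-variable-out search for variables that make `obs` discordant."""
    full_stat, full_crit = test(data, obs)
    if full_stat <= full_crit:
        return []  # not discordant with all variables, so nothing to explain
    causes = []
    for j, name in enumerate(names):
        reduced = [[v for k, v in enumerate(row) if k != j] for row in data]
        stat, crit = test(reduced, obs)
        if stat <= crit:  # no longer an outlier once variable j is removed
            causes.append((name, full_stat - stat))
    # Largest decrease in the test statistic first, as the text recommends.
    causes.sort(key=lambda c: c[1], reverse=True)
    return causes


a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
b = [10, 0, 0, 0, 0, 0, 0, 0, 0, 0]
c = [5, 4, 3, 2, 1, 5, 4, 3, 2, 1]
data = [list(r) for r in zip(a, b, c)]
result = causal_variables(data, 0, ["a", "b", "c"])
print(result)
```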
40. the transform you have just selected, along with any constants. This window keeps a record of all the transforms you have chosen for each variable. If a transform does not produce the desired results, you may undo it by selecting the Undo option from the transformation menu.

3.4.4.2 Logarithm

This option transforms the data using the natural logarithm. All of the data must be greater than zero in order to use this transformation.

3.4.4.3 Power and Box-Cox

These two transformations are explained together, as they are very similar in usage. Both require a nonzero constant a. After entering a value for a, you have the option of adjusting it. The value you entered is displayed along with an increment value, delta. Pressing the < > key increments a by delta and immediately shows the results on the screen. Likewise, pressing the < > key decreases a by delta and shows the results. This lets you quickly try many values of a before deciding which one to select. You may also adjust delta for larger or smaller increments. Press <CTRL> and < > together to make delta smaller, and <CTRL> and < > together to make delta larger. Delta ranges from 0.001 to 1.0. When you find the desired value for a, press <ENTER> to accept it. If you cannot fi
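The three transformations above can be sketched in Python. The guide does not give formulas, so two forms are assumptions. The power transform is taken as x**a. The Box-Cox transform is taken in its usual form, (x**a - 1)/a, for nonzero a and positive data. The delta clamp mirrors the stated 0.001 to 1.0 range; the function names are illustrative.

```python
import math


def log_transform(values):
    """Natural-log transform; every value must be greater than zero."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithm requires all data to be greater than zero")
    return [math.log(v) for v in values]


def power_transform(values, a):
    """Power transform x**a (assumed form); use positive data for fractional a."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return [v ** a for v in values]


def box_cox(values, a):
    """Box-Cox transform (x**a - 1) / a (usual form) for nonzero a and x > 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires positive data")
    return [(v ** a - 1.0) / a for v in values]


def clamp_delta(delta):
    """Keep the step size for a within the stated range of 0.001 to 1.0."""
    return min(1.0, max(0.001, delta))


# Try successive values of a, stepping by delta as the interactive screen does.
a, delta = 0.5, clamp_delta(0.25)
for _ in range(3):
    print(round(a, 2), [round(x, 3) for x in box_cox([1.0, 4.0, 9.0], a)])
    a += delta
```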
41. the mean concentration at each location is constant within the region under consideration. This assumption is often violated by data collected from a polluted site. Therefore, to use OK (ordinary kriging) to characterize the site under study, data with a spatial trend need to be detrended so that the constant-mean assumption is satisfied. Scout offers the D Trend heading for removing any trend that might be present in a geostatistical data set obtained from a polluted site. It assumes that the data are in the same format as for the Pattern Recognition option, with the population IDs in the first column. First, the data have to be partitioned, using an appropriate multivariate technique, into strata with significantly different statistics (e.g., mean vectors). Using the geographic information of the sample observations, a site map can be prepared showing the actual sampling locations and the respective population IDs. When used, the D Trend heading subtracts the respective subpopulation mean from each observation in the corresponding subpopulation. The resulting data satisfy the constant-mean assumption. An example in the tutorial section illustrates its usage.

5.8 Add Means

This heading is used after OK has been performed on the detrended data and a file with extension .grd has been created. The means subtracted by the D Trend option need to be added back to the kriging estimates in the .grd file. This can be achieved u
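The D Trend and Add Means steps above can be sketched as a pair of Python functions. The layout is an assumption: each row is a list whose first element is the population ID, followed by the variable values. The .grd file format is not described here, so `add_means` works on a plain list of estimates, each paired with the population ID of its location (hypothetical inputs).

```python
from collections import defaultdict


def detrend(rows):
    """Subtract each subpopulation's mean vector from its observations.

    rows: lists whose first element is the population ID, followed by values.
    Returns (detrended_rows, means), where means maps ID -> mean vector.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row[1:])
    means = {
        pid: [sum(col) / len(col) for col in zip(*vals)]
        for pid, vals in groups.items()
    }
    detrended = [
        [row[0]] + [v - m for v, m in zip(row[1:], means[row[0]])]
        for row in rows
    ]
    return detrended, means


def add_means(estimates, ids, means, column=0):
    """Add the subtracted mean of one variable back to kriging estimates."""
    return [est + means[pid][column] for est, pid in zip(estimates, ids)]


rows = [[1, 10.0], [1, 12.0], [2, 100.0], [2, 104.0]]
detr, means = detrend(rows)
print(detr, means)
print(add_means([0.5, -1.0], [1, 2], means))
```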
42. the region bounded by (1100, 1220), (1100, 1700), (3000, 1220), and (3000, 1700). This will be performed for each of the 7 regions in the Pb add file.

14.11 Outliers in Discriminant and Classification Analysis

Discriminant and classification analyses are multivariate techniques. Discriminant analysis is concerned with separating distinct groups of observations; classification analysis is concerned with allocating new observations to previously defined groups (populations). The separatory procedure is rather exploratory. In practice, the investigator has some knowledge about the nature and the number of groups. The study might be about k known groups, for example k geographic regions, k treatments, k analytical methods, k species, or k laboratories. In these cases, the investigator knows the origin of each of the objects in a sample of size n obtained from these k populations. However, some of these k groups may be similar in nature and can be merged. The objective here is to establish g ≤ k significantly different groups. Let s = min(g − 1, p); then s discriminant functions can be computed for these g p-dimensional groups (Anderson 1984; Johnson and Wichern 1988). These functions are then used in all subsequent classifications. However, if the investigators have no prior information about the observations and their origin, they have to search for natural groupings of observations (unsup
43. their statistics, the histogram of the chosen variable, and the Transformation Menu.

CAUTION: Using the transform option produces values that replace the original data. To retain the original data, copy them to another file before using the transform option.

9.5 Summary
1. The first step in working with Scout is to read in a data file (Read ASCII File heading).
2. Editing data is a potent Scout capability but is not needed in these tutorials.
3. The summary statistics for a data file can be produced easily, and the output may be saved to a text file that can be incorporated into word processing software.
4. The Transform heading offers two normality tests. Transformations can permanently alter data values, so it is prudent to copy the data to another file name before working.

Chapter 10 Tutorial II: Classical Method

The level 2 Classical Method menu contains four headings (Select Variables, Generalized Distance, Multivariate Kurtosis, and Associated Causes) and two choices (Causal Variables and Remove Outlier Flags), as shown in Figure 10.1. Remember that a data file must be read before any analysis is possible.

[Figure 10.1 screen: Classical Method menu with Select Variables highlighted]
44. understanding of their data sets to group (General Cause) and subgroup (Specific Cause) variables that, according to their specialized knowledge, may be causally related. The user must specify the groupings that will be sequentially excluded from the outlier test. Any group whose exclusion results in the observation no longer being discordant will be listed as potentially causal. This is intended to help the user find and correct physical causes of discordancy, so the groupings should correspond to known physical causes. For example, a subset of the variables may have been measured on a single instrument. It would be natural to group these variables so that Scout can investigate whether discordancies show up across the entire group, perhaps because of faulty operation of the instrument. Variables may be grouped according to a variety of characteristics. The user should also run the Causal Variables routine. The results of the Associated Causes routine should be read with one caveat in mind: discordancy in a single variable will make every group containing that variable appear causal.

4.6 Remove Outlier Flags

The Remove Outlier Flags choice provides the user with a means of unmarking any data that have been identified as outliers. Once a procedure has identified outliers, they are colored red in the data file. The Remove Outlier Flags choice turns the red data back to white, the original color of the da
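The Associated Causes idea above (exclude each user-defined group of variables in turn) can be sketched in Python. The group names and column indices are hypothetical, as is the stand-in outlier test: the largest absolute z-score with a fixed cutoff of 2.5, not Scout's actual test.

```python
import statistics


def max_abs_z_test(data, obs, crit=2.5):
    """Hypothetical stand-in test: largest |z| of observation `obs` over all columns."""
    stat = 0.0
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        z = abs(col[obs] - statistics.fmean(col)) / statistics.stdev(col)
        stat = max(stat, z)
    return stat, crit


def associated_causes(data, obs, groups, test=max_abs_z_test):
    """Exclude each group of columns in turn; report groups that explain `obs`."""
    full_stat, full_crit = test(data, obs)
    if full_stat <= full_crit:
        return []  # the observation is not discordant to begin with
    causal = []
    for group_name, columns in groups.items():
        keep = [k for k in range(len(data[0])) if k not in columns]
        reduced = [[row[k] for k in keep] for row in data]
        stat, crit = test(reduced, obs)
        if stat <= crit:  # excluding this group removes the discordancy
            causal.append(group_name)
    return causal


x = [10, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [10, 0, 0, 0, 0, 0, 0, 0, 0, 0]
z = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
data = [list(r) for r in zip(x, y, z)]
groups = {"instrument A": [0, 1], "lab B": [2]}  # hypothetical groupings
print(associated_causes(data, 0, groups))
```

Here observation 0 is unusual on both variables measured by "instrument A". So only that group is reported, while excluding "lab B" leaves the observation discordant.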
45. viewing multiple populations with ellipses defining each population.

FORMAL GRAPHICAL OUTLIER IDENTIFICATION
11.5 Index plots: Here we produced index plots using Huber influence and PROP influence. The different results highlight the difference between these two methods: the PROP method has the ability to unmask multiple outliers that the Huber method did not detect.
11.6 Generalized distance: This procedure also highlighted the difference between Huber and PROP.
11.7 Kurtosis: The value for kurtosis was calculated using Generate Graph With Current Options. This choice in the Robust Analysis menu is equivalent to an Execute function.

INTERVAL ESTIMATES
11.4 Control charts: In this section we (1) produced simultaneous C.I. and prediction interval control charts and (2) learned to use <Q> to display a graph after a tabular output.

Chapter 12 Tutorial IV: Classical Principal Component Analyses

The PCA module has five headings, as shown in Figure 12.1. After you select the data set for PCA and the desired variables, any of the four remaining headings may be selected for data analysis. For this tutorial, select the data set IRIS.DAT. Move the cursor to PCA and press <ENTER>. Use the Select Variables option to make sure that the two width and two length variables are checked and that Count is not checked. If this is not the case, use the plus
46. when the Graphics module is highlighted in Scout's main menu. These options display 2-dimensional or 3-dimensional graphics. If the number of variables in the data set exceeds the number of dimensions chosen for the graphic option, various combinations of variables can be selected for the graphic display. The System module provides on-line information about the various Scout modules. Each section of the user's guide can be displayed on screen by selecting the appropriate section. To set up the printer, use the Printer Setup option and set its various parameters.

Chapter 14: Statistical Procedures

14.1 Introduction to Statistical Procedures for the Identification of Multiple Outliers

Outliers, also known as extreme, anomalous, discordant, suspect, maverick, or influential observations, are inevitable in data sets originating from many applications. In a manufacturing process, outliers typically represent some mechanical disorder of the system, unexpected experimental conditions and results, raw material of inferior quality, or misrecorded values. In biological dose-response applications, outlying observations may indicate an entirely different type of reaction, such as an unusual response to a newly developed drug. In this case, outliers may be more informative than the rest of the data. In environmental and ecological applications, outliers could be indicat
47. (2) information on individual observations from coded symbols is lost. Use the <T> key to toggle from symbols to pixels and from pixels back to symbols.

Stop Rotations / Restore Original Plot

The user can stop all rotations of the graph by pressing the <SPACEBAR>, and can restore the original plot at any time by pressing the <HOME> key. These features can be very helpful when the rotations get out of hand.

7.9 Search Observation Mode

The user can identify the individual observations that make up the graph. This feature, called Search Observation Mode, is entered by pressing the <S> key. The user can scroll through the observations with the up and down arrows and the <PGUP>, <PGDN>, <HOME>, and <END> keys. The user can also change the color of an observation by pressing the first letter of the desired color. The available colors are (Y)ellow, (W)hite, (G)reen, (C)yan, (R)ed, and (B)lack. If an observation is changed to black, it is removed from the graph, and the graph is rescaled when the user exits Search Observation Mode. Likewise, a black observation can be put back in the graph by changing its color. The <ESC> or <ENTER> keys return the user to three-dimensional rotations.

7.10 Quick 2D Graphs

The user can have Scout display quick two-dimensional graphs of the current three variables. The X, Y, and Z keys are used
48. [Figure 12.4 screen: proportion of variance, cumulative proportion, and variable loadings for each principal component, with the prompt to press <P> to print or <ESC> to exit]
Figure 12.4 Table showing the component loadings for various PCs.

12.3 Transform Data (IV)

The last heading in the PCA module is Transform Data. This option is used to replace the original variables with principal components. To use it, move the cursor to highlight Transform Data and press <ENTER>. The two choices Covariance and Correlation appear, as they did for Display Matrices, Eigenvalues, and View Components. For this tutorial session, select Covariance and press <ENTER>. The explanation window shown in Figure 12.5 will then appear on the screen, stating "4 variables transformed."

[Figure 12.5 screen: PCA menu (Select Variables, Display Matrices, ...) with the explanation window]
49. [Screen residue: the Statistics Options window showing X-Y Coordinates Scale Factor, Right Tail Cutoff (0.025), Tuning Constant, Control Chart Limits (0.05), Trimming Percent, Ignore Population, Plot Ignored Population and Accept New Settings.]
Figure 11.20 Statistics options for simultaneous control charts.
Set the other options in the Robust Analysis menu to match those shown in Figure 11.21. Generate the simultaneous control chart for all observations by moving to Generate Graph With Current Options and pressing ENTER. Except for the title and the identities of a few data points, your display should match Figure 11.22.
[Screen residue: the main menu and the level 2 Robust Method menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, ...).]
50. [Screen residue: the Robust Method menu (..., D Trend, Add Means, Causal Variables, Print Destination) and a Multivariate Kurtosis output window; the values are not recoverable.]
51. TABLE OF CONTENTS (cont.)
Chapter 6 PCA
6.1 Classical Principal Components Analysis
6.2 Display Matrices
6.3 Eigenvalues
6.4 View Components
6.5 Transform Data
Chapter 7 Graphics
7.1 General Description
7.2 Modify Graph Colors and Shapes
7.3 Command Summary for 2D and 3D Graphics
7.4 2-Dimensional Graphs
7.5 Zoom Feature
7.6 3-Dimensional Graphs
7.7 Moving 3D Graphs
7.8 Change Size of 3D Graphs
7.9 Search Observation Mode
7.10 Quick 2D Graphs
7.11 Response Surfaces
Chapter 8 System Information
8.1 User's Guide
8.2 Other Options
8.3 Exiting Scout
Chapter 9 Scout Basics Tutorial I
9.1 Nomenclature
9.2 Read Data Files
9.3 Examine and Save Statistics
9.4 Transformation of Variables
9.5 Summary
Chapter 10 Tutorial II
10.1 Outlier Detection
10.2 Determining Causal Variables and Removing Flags
10.3 Summary
Chapter 11 Robust Method Tutorial III
11.1 Q-Q Plots
11.2 Q-Q Plots of Principal Component Analysis
11.3 PCA Scatter Plots
11.4 Statistical Intervals
11.5 Index Plots
11.6 Generalized Distance
11.7 Kurtosis
11.8 Summary
Chapter 12 Classical PCA Tutorial IV
12.1 Display Matrices
12.2 Eigenvalues
12.3 Transform Data
12.4 Summary
Chapter 13 Graphics and System Tutorial V
13.1 Graphics
13.2 System
13.3 Summary
Chapter 14 Statistical Procedures
52. [Screen residue: covariance matrix of sp length, sp width, pt length and pt width for IRIS.DAT; the entries are not recoverable.]
Figure 12.2 The covariance matrix for the four variables.
After the covariance matrix is calculated, it can be saved by pressing the <P> key and typing the path and the file name.

12.2 Eigenvalues
To calculate the eigenvalues corresponding to the various principal components, move the cursor to highlight the Eigenvalues heading, press <ENTER>, select Covariance and press <ENTER> again. This generates the cumulative variance table for the principal components shown in Figure 12.3.
[Screen residue: the main menu and the PCA menu (Select Variables, Display Matrices, View ...).]
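The columns of a cumulative variance table follow directly from the eigenvalues. Each component's proportion is its eigenvalue divided by the sum of all eigenvalues, and the cumulative column is the running total. The following is a minimal sketch. It assumes the table reports percentages and that a Difference column holds the gap to the next eigenvalue. `cumulative_variance_table` is a hypothetical name.

```python
def cumulative_variance_table(eigenvalues):
    """Rows of (component, eigenvalue, difference, proportion %, cumulative %).

    'difference' is the gap to the next eigenvalue (None for the last component).
    """
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    rows, running = [], 0.0
    for k, lam in enumerate(vals):
        diff = lam - vals[k + 1] if k + 1 < len(vals) else None
        running += lam
        rows.append((k + 1, lam, diff, 100.0 * lam / total, 100.0 * running / total))
    return rows


for row in cumulative_variance_table([4.0, 3.0, 2.0, 1.0]):
    print(row)
```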
53. [Screen residue: the level 2 Classical Method menu for IRIS.DAT.]
Figure 10.1 The level 2 Classical Method menu and the explanation window for Generalized Distance.

10.1 Outlier Detection
For outlier detection, select the IRIS.DAT data file. First choose the Generalized Distance heading from the Classical Method menu, set α to 0.1, 0.05 or 0.01, and press the <ENTER> key to generate the list of outliers in the data set. This method detects no outliers for any of the three α values: because of masking, the classical Generalized Distance test cannot identify any outliers. Now use the Multivariate Kurtosis heading with the same three α values. As shown in Figure 10.2, with α set to 0.1, one outlier is detected in the data set.
The limitation to only three α values in the classical Generalized Distance test can be overcome with the Robust Method: select Robust Analysis and set Display Graphs For to Q-Q Plot Generalized Distance.
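The classical generalized distance of an observation x is its squared Mahalanobis distance from the sample mean, d^2 = (x - xbar)' S^-1 (x - xbar). The pure-Python sketch below computes these distances using the unbiased (n - 1) covariance matrix S. It is an illustration, not Scout's implementation. The α-dependent cutoff that Scout applies is not reproduced here, so the caller has to compare the distances with a cutoff of their own choosing.

```python
def invert(m):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular (or nearly so)")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[p:] for row in a]


def generalized_distances(data):
    """Squared Mahalanobis distance of each row from the sample mean (divisor n - 1)."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    dev = [[row[j] - mean[j] for j in range(p)] for row in data]
    cov = [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)] for i in range(p)]
    inv = invert(cov)
    return [sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p)) for d in dev]


d2 = generalized_distances([[1.0], [2.0], [3.0], [4.0], [5.0]])
print([round(x, 6) for x in d2])  # [1.6, 0.4, 0.0, 0.4, 1.6]
```

Observations whose distance exceeds the chosen cutoff would be flagged. Masking arises when several outliers inflate S enough that none of them crosses that cutoff.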
54. The Discriminant Method heading has two choices, Linear and Quadratic. Select one by pressing the ENTER key when the cursor is at Discriminant Method in the third window. Statistics Options presents the same menu as described in Section 5.3. Use the down ARROW key to move the cursor to the last selection, Generate Confusion Matrix With Current Options, and press the ENTER key to generate the confusion matrix. Use the ESCAPE key to return to the third window if the parameters need to be readjusted or other analyses performed.

5.6 Pattern Recognition
The Pattern Recognition heading performs principal component and discriminant analyses. The data should be multivariate, with at least two variables. The first column should contain the population ID numbers (a number from 1 to 20). When Pattern Recognition is selected, the explanation window displays the message "Pattern recognition using discriminant scores and principal components analyses". Pressing the ENTER key displays the third window with its various headings. The available headings and example choices for Pattern Recognition are as follows:
Headings            Example Choices
Statistics          Classical
Numbering           Observations
Contour Ellipse     Indiv & Simul
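A confusion matrix cross-tabulates each observation's actual population against the population assigned by the discriminant rule. The off-diagonal counts are the misclassifications. The following is a minimal sketch. It assumes the actual and predicted population IDs are given as lists of labels. `confusion_matrix` and `misclassified` are hypothetical helper names.

```python
def confusion_matrix(actual, predicted, populations):
    """Rows are actual populations, columns are predicted populations."""
    index = {pop: k for k, pop in enumerate(populations)}
    counts = [[0] * len(populations) for _ in populations]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    return counts


def misclassified(actual, predicted):
    """1-based observation numbers whose predicted population differs from the actual one."""
    return [i + 1 for i, (a, p) in enumerate(zip(actual, predicted)) if a != p]


actual = [1, 1, 2, 2, 3, 3]
predicted = [1, 2, 2, 2, 3, 1]
print(confusion_matrix(actual, predicted, [1, 2, 3]))  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(misclassified(actual, predicted))                # [2, 6]
```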
55. [Screen residue: status line for IRIS.DAT.]
Figure 12.1 The PCA menu with Select Variables chosen. Count is the only variable not selected (checked).

12.1 Display Matrices
After the variables are selected, press ENTER to return to the PCA menu and move the cursor to highlight the Display Matrices heading. There are two choices for this heading: 1. Covariance and 2. Correlation. Choose Covariance and press the ENTER key to produce the covariance matrix shown in Figure 12.2. The diagonal elements are the variances and the off-diagonal elements are the covariances.
[Screen residue: the main menu (File, Data, Classical Method, Robust Method, PCA, Graphics, System) and the PCA menu (Select Variables, Display Matrices, Eigenvalues, View Components, Transform Data).]
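The covariance matrix described above can be sketched in a few lines. This sketch assumes the usual unbiased divisor n - 1, because Scout's divisor is not stated here. `covariance_matrix` is a hypothetical name. With this divisor, the diagonal matches `statistics.variance` for each column.

```python
import statistics


def covariance_matrix(data):
    """Sample covariance matrix of the rows in `data` (n x p), using divisor n - 1."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = covariance_matrix(data)
print(cov)  # [[1.0, 2.0], [2.0, 4.0]]
# Diagonal entries are the variances of the individual columns.
print(cov[1][1] == statistics.variance([row[1] for row in data]))  # True
```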
56. [Screen residue: the PCA menu (View Components, Eigenvalues, Transform Data) with the "Variables Transformed" message window.]
57. [Screen residue: the explanation window for Display Matrices: "Computes and displays either the covariance matrix or the correlation matrix. If any outliers are present, the user decides whether or not they are to be used." (file IRIS.DAT)]
Figure 6.1 The PCA menu and a description of Display Matrices.
The Select Variables heading has been discussed in earlier chapters, so its description is omitted here.

6.2 Display Matrices
The user may choose to display the covariance and/or correlation matrices. To do this, select Display Matrices from the PCA menu. Within this heading, users can manually remove outliers found by the Classical Method. If any outliers have been identified, Scout asks whether they are to be used or ignored. Scout then asks which matrix the user is interested in: covariance or correl
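The correlation matrix is the covariance matrix rescaled by the standard deviations, r_ij = s_ij / sqrt(s_ii * s_jj), so its diagonal is all ones. The following is a minimal sketch. `correlation_from_covariance` is a hypothetical helper, not a Scout function.

```python
import math


def correlation_from_covariance(cov):
    """Rescale a covariance matrix to a correlation matrix: r_ij = s_ij / sqrt(s_ii * s_jj)."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


print(correlation_from_covariance([[4.0, 2.0], [2.0, 9.0]]))
```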
58. [Screen residue: Confusion Matrix output for the Iris data (file FULLIRIS.DAT), with predicted versus actual population counts and a list of observation classification differences (Run, Name, Actual, Predicted); the counts are not recoverable.]
Figure 11.17 The confusion matrix associated with Fisher's discriminant analysis of FULLIRIS.DAT.
Press <ESC> once more to display the scatter plot of the first two discriminant scores. Pressing <E> once again draws ellipses around the populations, as shown in Figure 11.18. Pressing <Page Down> three times produces Figure 11.19.
[Plot: Fisher's Classical Discriminant Analysis (all 3 Species); Discriminant Score 1 vs Discriminant Score 2.]
Figure 11.18 Plot of discriminant scores with superimposed ellipses.
[Plot: Scatter Plot of First Disc Score vs pt length.]
Figure 11.19 Discriminant Score 1 vs pt length.
59. [Screen residue: the explanation window for Edit Data: "Edit or view the data in memory. Editing the observations allows the user to modify the labels and the data values. The title and variable information may also be edited."]
60. [Screen residue: the explanation window for Select Variables: "Allows the user to select a subset of the variables to be used in the outlier tests and cause routines. The default is set to use all of the variables." (file TEST.DAT)]
Figure 4.1 The six options of the Classical Method menu and the explanation window for Select Variables.
CAUTION: The removal of data values should not
61. [Screen residue: the explanation window for Read ASCII File: "Reads a data set from an ASCII file on any disk. The file format is defined in the User's Guide. CAUTION: Data in memory will be lost." (file IRIS.DAT)]
Figure 9.1 The first window in Scout showing the level 1 m
62. [Screen residue: the PCA menu (View Components, Eigenvalues, Transform Data) and the Cumulative Variance Table (Covariance) with the columns Component, Eigenvalue, Difference, Proportion and Cumulative; the values are not recoverable.]
Figure 12.3 The cumulative variance table for the four principal components.
To view the eigenvectors (component loadings), press <ESC> to return to the PCA menu, move the cursor to highlight View Components, select Covariance and press <ENTER>. This generates the table of component loadings shown in Figure 12.4.
[Screen residue: the start of the Component Loadings (Covariance Matrix) table, listing the eigenvalue and the loadings of sp length, sp width, pt length and pt width for each component.]
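The component loadings are the eigenvectors of the covariance matrix. As an illustration only, the sketch below finds the leading eigenvalue and eigenvector by power iteration, a simple method swapped in here for clarity. Scout's own eigen-solver is not documented in this section. A full analysis would use a symmetric eigen-decomposition routine that returns all components at once. `leading_component` is a hypothetical name.

```python
import math


def leading_component(cov, iterations=1000, tol=1e-12):
    """Power iteration: return (largest eigenvalue, unit eigenvector) of a covariance matrix.

    Caveat: fails if the start vector happens to have no component along the leading
    eigenvector (or lies in the null space).
    """
    p = len(cov)
    v = [float(i + 1) for i in range(p)]  # arbitrary, non-constant start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space; try another start")
        w = [x / norm for x in w]
        done = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if done:
            break
    # Rayleigh quotient v' A v gives the eigenvalue for the unit vector v
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


print(leading_component([[2.0, 1.0], [1.0, 2.0]]))  # eigenvalue 3, vector about (0.707, 0.707)
```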
63. Covariance
Weights                          Beta, Chi Squared
X-Y Coordinate Scale Factor %    An integer between -100 and 100
Right Tail Cutoff                A number between 0.01 and 0.8, to be used with Huber or PROP
Tuning Constant                  A number between 0.1 and 5.0
Control Chart Limit              A number between 0.01 and 0.5
Trimming Percent                 An integer between 0 and 100, to be used with Multivariate Trimming
Ignore Population                A non-negative integer representing the population not to be considered in the analysis
Plot Ignored Population          Yes, No
The last two headings assume that the data set has the population ID in the first column.
NOTE: This Statistics Options menu is also shared by the three other procedures in the Robust Analysis main menu: Confusion Matrix, Pattern Recognition and Causal Variables. The explanations of these headings will refer back to this description.
For the last four headings in the fourth window (Statistics Options) given above, the numbers for the choices can be typed on the screen after pressing the ENTER key when the cursor is on the corresponding heading. The other choices can be selected by pressing the ENTER key repeatedly. After all selections are made, move the cursor to Accept New Settings at the bottom of the fourth window. Press the ENTER key to accept the selected choices for the Statistics Options and return to the third window. The remaining hea
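The numeric ranges in the Statistics Options table can be expressed as a small validation table. This is a hypothetical sketch. The option keys and `validate_option` are illustrative names, not part of Scout, and the bounds are assumed to be inclusive.

```python
# Hypothetical option keys; the (low, high) bounds come from the Statistics Options
# table and are assumed inclusive. high=None means "no upper bound".
NUMERIC_OPTIONS = {
    "xy_scale_factor_percent": (-100, 100, int),
    "right_tail_cutoff": (0.01, 0.8, float),
    "tuning_constant": (0.1, 5.0, float),
    "control_chart_limit": (0.01, 0.5, float),
    "trimming_percent": (0, 100, int),
    "ignore_population": (0, None, int),
}


def validate_option(name, value):
    """Return `value` unchanged if it is acceptable for option `name`, else raise."""
    low, high, kind = NUMERIC_OPTIONS[name]
    if kind is int and not isinstance(value, int):
        raise TypeError(f"{name} must be an integer, got {value!r}")
    if value < low or (high is not None and value > high):
        raise ValueError(f"{name}={value} is outside the allowed range")
    return value


print(validate_option("tuning_constant", 1.5))  # 1.5
print(validate_option("trimming_percent", 10))  # 10
```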
64. [Screen residue: the level 2 Data menu and the level 3 Transform menu for IRIS.DAT.]
Figure 9.3 The level 2 Data menu with the Transform heading selected and the level 3 Transform menu showing the headings for the two different normality tests.
including Z-standardization, logarithmic, Box-Cox type (Johnson and Wichern, 1988), power (square root) and more are available in Scout.
[Screen residue: the main menu and the Transformation menu.]
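Two of the transformations named above, Z-standardization and the Box-Cox family, can be sketched as follows. The Box-Cox form (x^λ - 1)/λ, with log x at λ = 0, is the standard one. It is an assumption whether Scout uses exactly this parameterization or chooses λ automatically. The sample (n - 1) standard deviation is also assumed for the Z scores.

```python
import math
import statistics


def z_standardize(xs):
    """(x - mean) / sd, using the sample (n - 1) standard deviation."""
    mean, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]


def box_cox(xs, lam):
    """Standard Box-Cox transform; requires strictly positive data."""
    if any(x <= 0 for x in xs):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]


print(z_standardize([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
print(box_cox([1.0, 4.0, 9.0], 0.5))   # [0.0, 2.0, 4.0]
```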
65. [Screen residue: status line for FULLIRIS.DAT.]
Figure 11.11 The Pattern Recognition menu with Numbering set to Populations.
Select Begin Computations with Current Options and press <ENTER> to draw the scatter plot of the principal component scores. Press <E> to draw the ellipses around the three populations. The scatter plot should match Figure 11.12.
[Plot: Scatter Plot of PCs (all 3 Species); PC Score 1 vs PC Score 2.]
Figure 11.12 Scatter plot of the principal components of all three species. The populations are identified by number and defined by ellipses.
Next, press <Page Down> once to view PC Score 1 vs PC Score 3. Notice that the largest ellipse extends past the Y axis, as shown in Figure 11.13.
[Plot: Pattern Recognition; PC Score 1 vs PC Score 3.]
Figure 11.13 Three populations, with one ellipse extending beyond the boundaries of the graph.
Scout can rescale this scatter plot so that the entire ellipse can be seen. Press <ESC>, select Statistics Options, press ENTER, select X-Y Coordinates Scale Factor, press ENTER, type in 20, press ENTER again and regenerate the scatter plot. Figure 11.14 shows the result: all three ellipses are now entirely on the screen. The
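A contour ellipse around a population is the set of points at a fixed Mahalanobis distance from the population mean. The sketch below uses the large-sample chi-square radius, and for 2 degrees of freedom the quantile has the closed form -2 ln(1 - p). It also uses a Cholesky factor of the 2 x 2 covariance. Scout's individual and simultaneous contours come from their own probability statements, so this is only an approximation under stated assumptions. `contour_ellipse` is a hypothetical name.

```python
import math


def contour_ellipse(mean, cov, prob=0.95, n_points=64):
    """Points x with (x - mean)' S^-1 (x - mean) = c^2 for a 2 x 2 covariance S.

    c^2 is the chi-square (2 df) quantile at `prob`, i.e. -2 ln(1 - prob).
    """
    c = math.sqrt(-2.0 * math.log(1.0 - prob))
    # Cholesky factor L of S (S = L L'); maps the circle of radius c onto the ellipse
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    pts = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        u, v = c * math.cos(t), c * math.sin(t)
        pts.append((mean[0] + l11 * u, mean[1] + l21 * u + l22 * v))
    return pts


pts = contour_ellipse((1.0, 2.0), [[4.0, 1.0], [1.0, 2.0]], prob=0.95, n_points=16)
print(len(pts))  # 16
```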
66. [Figure 9.4 residue: the level 3 Transformation menu (Linear, Logarithm, Power, Box-Cox, Undo Last Transform) with the variable list for IRIS.DAT (count, sp length, sp width, pt length, pt width), followed by Kolmogorov-Smirnov normality test output listing each variable's standard deviation, test statistic and critical value at α = 0.05.]
Selecting the Transform heading, then the Kolmogorov-Smirnov normality test heading, and pressing the ENTER key once more produces the list of variables and
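The Kolmogorov-Smirnov statistic measures the largest gap between the empirical distribution function and a fitted normal distribution function. The following is a minimal sketch. It assumes the normal mean and standard deviation are estimated from the same sample, which is the Lilliefors setting. It also uses Lilliefors' large-sample 5% critical value, 0.886/sqrt(n). This section does not state which critical values Scout actually uses.

```python
import math
import statistics


def ks_normality(xs):
    """Max distance between the empirical CDF and a normal CDF fitted to the sample."""
    n = len(xs)
    dist = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(sorted(xs), start=1):
        f = dist.cdf(x)
        # check the gap just after and just before each jump of the empirical CDF
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def lilliefors_crit_05(n):
    """Approximate 5% critical value from Lilliefors' table, valid for n > 30."""
    return 0.886 / math.sqrt(n)


print(round(lilliefors_crit_05(50), 3))  # 0.125
```

A variable would be judged non-normal at the 5% level when its statistic exceeds the critical value for its sample size.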
67. [Plot: Q-Q plot of principal component 1 with Maximum USL, Warning USL, Warning LSL and Maximum LSL limit lines; x axis: Theoretical Quantiles (Normal Distribution).]
Figure 11.8 Q-Q plot of principal component 1.
Next, use the data file containing all three species of iris, FULLIRIS.DAT. Go to File, select Read ASCII File, select FULLIRIS.DAT and press <ENTER>. Return to Robust Method, select Robust Analysis, and use the <ENTER> key to change Numbering from Observations to Populations. Next, move to Generate Graph With Current Options and press ENTER. The three species of iris are distinguished on the graph as three different sets of numbers, as shown in Figure 11.9. This figure immediately suggests that there is more than one population, and the observations from each of the three populations are clearly grouped together on the graph.
[Plot: Robust Analysis; Q-Q plot with Theoretical Quantiles (Normal Distribution) on the x axis.]
Figure 11.9 Q-Q plot of the first principal component, with three populations (species) present.

11.3 PCA Scatter Plots
PCA scatter plots can be produced by selecting Scatter Plot PCA from the Select Graph Type menu found under Display Gra
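Each point on a normal Q-Q plot pairs an ordered observation with the standard normal quantile at a plotting position. The following is a minimal sketch using the common position (i - 0.5)/n. Scout's exact plotting position is an assumption here. `normal_qq_points` is a hypothetical name.

```python
import statistics


def normal_qq_points(xs):
    """(theoretical normal quantile, ordered observation) pairs, plotting position (i - 0.5)/n."""
    n = len(xs)
    std_normal = statistics.NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(xs), start=1)]


for q, x in normal_qq_points([3.0, 1.0, 2.0]):
    print(round(q, 3), x)
```

Points that fall close to a straight line suggest the data are consistent with a normal distribution. Separate clusters, as in Figure 11.9, suggest more than one population.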
68. Plot PCA
Q-Q Plot Generalized Dist
Control Charts Indiv Xi
Control Charts Simult Xi
Control Charts Defects
CI Limits Population Mean
Prediction Intervals
Index Plots
Multivariate Kurtosis
Use the arrow keys to reach the desired procedure and then press the <ENTER> key to select it from this list. The fourth window disappears and the third window reappears, with the selected choice listed after Display Graph For.
The CI Limits Population Mean choice outputs the relevant statistics and the confidence interval limits for the mean to the screen. These limits can be graphed by pressing the letter Q or q. The Prediction Intervals can be graphed similarly. The Control Charts Simult Xi choice produces the graph of simultaneous confidence intervals for the selected settings, as described in Singh and Nocerino (1995). Multivariate Kurtosis simply computes the multivariate kurtosis for the selected options; no graph is generated for this procedure. Some of these options are discussed in the tutorial section.
Move the cursor to the Statistics Options heading and press the ENTER key to display the menu. The headings of the Statistics Options menu and their choices are as follows:
Heading                     Choices
Compute Statistics Using    Classical, Huber Influence, Proposed Influence, Multivariate Trimming
Initial Estimate            Classical, Robust
Matrix                      Correlation
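Multivariate kurtosis is commonly measured by Mardia's coefficient. This coefficient is the average fourth power of the Mahalanobis distances, computed with the maximum-likelihood (divisor n) covariance. Under multivariate normality it is close to p(p + 2). The sketch below computes only that classical coefficient. It is an assumption whether Scout's Multivariate Kurtosis option reports exactly this statistic or a robust version of it.

```python
def _invert(m):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular (or nearly so)")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[p:] for row in a]


def mardia_kurtosis(data):
    """Mardia's b_{2,p}: mean of squared (squared Mahalanobis distances), MLE covariance."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    dev = [[row[j] - mean[j] for j in range(p)] for row in data]
    # maximum-likelihood covariance (divisor n), as in Mardia's definition
    cov = [[sum(d[i] * d[j] for d in dev) / n for j in range(p)] for i in range(p)]
    inv = _invert(cov)
    d2 = [sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p)) for d in dev]
    return sum(x * x for x in d2) / n


print(mardia_kurtosis([[1.0], [2.0], [3.0], [4.0], [5.0]]))  # 1.7
```

For a single variable the coefficient reduces to the ordinary kurtosis m4 / m2^2, which is why the one-column example returns 1.7.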
69. [Screen residue: the level 2 Data menu (..., Print Data) and the summary statistics table for IRIS.DAT, with columns including Variable, N, Min, Max, Mean, StdDev and Variance for count, sp length, sp width, pt length and pt width; the values are not recoverable.]
Figure 9.2 The level 2 menu for the Data heading, with the summary statistics for IRIS.DAT displayed.
We are skipping Edit Data. This is a potent choice that could drastically change the output we are trying to lead you through, and there is no need to edit data while learning Scout. Keep in mind that this choice is available and allows you to alter the input data file, including the deletion or insertion of columns (variables) or rows (observations).
The summary statistics describe IRIS.DAT (or whatever data file you used) in terms of 1. the number of data points in the file, 2. the number and identities
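A summary statistics table of this kind can be reproduced for any set of columns with Python's `statistics` module. This sketch assumes the sample (n - 1) standard deviation and variance. The column-dictionary layout and the name `summary_statistics` are hypothetical.

```python
import statistics


def summary_statistics(columns):
    """columns: dict mapping a variable name to its list of values."""
    table = {}
    for name, xs in columns.items():
        table[name] = {
            "N": len(xs),
            "Min": min(xs),
            "Max": max(xs),
            "Mean": statistics.mean(xs),
            "StdDev": statistics.stdev(xs),
            "Variance": statistics.variance(xs),
        }
    return table


print(summary_statistics({"x": [1.0, 2.0, 3.0, 4.0]}))
```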
70. Robust Method heading of the Scout software package. The successful identification of anomalous observations depends on the statistical procedures employed. The classical Mahalanobis distance (MD) and its variants (e.g., multivariate kurtosis) are routinely used to identify these anomalies. These test statistics depend on the estimates of the population location and scale. The presence of anomalous observations usually results in distorted and unreliable maximum likelihood estimates (MLEs) and ordinary least squares (OLS) estimates of the population parameters. These in turn produce deflated and distorted classical MDs and lead to masking effects, so the results of statistical tests and inference based on these classical estimates may be misleading. For example, in an environmental monitoring application, a classification procedure based on the distorted estimates may classify a contaminated sample as coming from the clean population, and a clean sample as coming from the contaminated part of the site. This in turn can lead to incorrect remediation decisions. It is well established among practitioners that multiple outliers should be identified with robust procedures that have a high breakdown point. The estimates obtained with the robust procedures should be in close agreement with the corresponding MLEs when no discordant observations from different populations are present. Robust procedures for the
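Masking is easy to see in one dimension. The sketch below contrasts classical z-scores, based on the mean and standard deviation, with robust z-scores based on the median and the scaled median absolute deviation (MAD). The median/MAD pair is a simple high-breakdown stand-in chosen for illustration. It is not one of the Huber or PROP influence-function estimates that Scout uses, and the data are made up.

```python
import statistics


def classical_z(xs):
    """z-scores from the mean and sample standard deviation (both inflated by outliers)."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]


def robust_z(xs):
    """z-scores from the median and the MAD scaled by 1.4826 (consistent for normal data)."""
    med = statistics.median(xs)
    mad = 1.4826 * statistics.median([abs(x - med) for x in xs])
    return [(x - med) / mad for x in xs]


data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 20, 20, 20]  # three identical outliers at 20
print(max(abs(z) for z in classical_z(data)))        # about 1.6: the outliers are masked
print([i for i, z in enumerate(robust_z(data)) if abs(z) > 3])  # [9, 10, 11]
```

The three outliers pull the mean up and inflate the standard deviation, so none of them looks unusual classically. The median and MAD are barely affected, so all three stand out.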
71. SCOUT USER'S GUIDE

NOTICE
Although the production of this report was funded wholly by the United States Environmental Protection Agency through contract 68 CO 0049 to Lockheed Environmental Systems & Technologies Company, it has not been subjected to Agency policy review, and no official endorsement should be inferred.

TABLE OF CONTENTS
Chapter 1 Preliminaries
1.1 Introduction
1.2 Manual Organization
1.3 Installing Scout
1.4 Viewing the User's Guide
Chapter 2 Scout File Format
2.1 File Management
2.2 Reading Spreadsheet Files
2.3 Load Scout File
2.4 Save Scout File
2.5 Merge Two Files
2.6 Append Two Files
Chapter 3 Managing Data in Scout
3.1 Data Management
3.2 Scout Functions and Operations
3.3 Summary Statistics
3.4 Data Transformation
3.5 Print Data
Chapter 4 Classical Methods for Outlier Identification
4.1 Introduction to the Classical Methods for Outlier Identification
4.2 Select Variables
4.3 The Classical Outlier Tests
4.4 Causal Variables
4.5 Associated Causes
4.6 Remove Outlier Flags
Chapter 5 Robust Statistical Methods
5.1 Introduction to Robust Statistical Methods
5.2 Choices of Robust Analyses
5.3 Univariate Statistics
5.4 Robust Analysis
5.5 Confusion Matrix
5.6 Pattern Recognition
5.7 D Trend
5.8 Add Means
5.9 Causal Variables
5.10 Print Destination
72. Simultaneous, Individual
Zero Lower Limit        No, Yes
When satisfied with all heading choices, use the down ARROW key to move the cursor to the final selection, Begin search for causal variables. Press the ENTER key to generate the table of Robust Causal Variables.

5.10 Print Destination
This heading creates graphics files with an EPS extension. The HP LaserJet III choice prints the screen graph to a LaserJet III printer. Typing F writes the graphics screen to a PCX file. When Print Destination is selected, the second window displays the message "Choose print destination for graphs". When the ENTER key is pressed, three choices are displayed in the third window:
HP LaserJet III
QMS ColorScript 100
Encapsulated Post Script
Use Encapsulated Post Script to save the graph and data output files in a format that can be imported into word processing software such as WordPerfect. This option creates a graphics file with the extension EPS. The HP LaserJet III choice prints to the screen or to a LaserJet III printer. Pressing <F> writes the graphics to a PCX file.

Chapter 6 PCA
6.1 Classical Principal Components Analysis
For simplicity and convenience, a separate principal component analysis (PCA) menu has been included in Scout to perform the classical PCA. The Q-Q plots, sca
73. [Screen: Robust Analysis menu (partial). Zero Lower Limit; Limit Style; X Axis Variable; Y Axis Variable; Title: Robust Analysis; X Axis Title; Numbering Observations; Contour Ellipse: Indiv & Simul; Erase Output File; View Weights & Generalized Distances: IRIS.WTS; Generate Graph With Current Options; Directory: C:\SCOUT\DATA; Filename: IRIS.DAT]

Figure 11.7 The Robust Analysis menu prior to the generation of Q-Q plots of PCA

The principal component Q-Q plot should be similar to Figure 11.8, except possibly for the eight labeled data points, which appear only if the <SHIFT> technique described earlier has been used on the highlighted points. From this graph it is clear that the observations come from a single population (Setosa). The Q-Q plots for the other three principal components can be obtained with the <PAGE UP> or <PAGE DOWN> keys. Pressing the <N> or <n> key numbers all of the points on the graph; pressing <N> again makes the numbers disappear.

Note: All keys used in generating graphics work similarly, toggling on and off with repeated use.
74. For a 3-dimensional scatter plot, highlight 3 Dimensional in the Graphics menu and press ENTER. The variables included in the data set are listed in the upper left corner of the display, with one of them highlighted. Use the UP or DOWN arrow keys to highlight a variable to be included in the scatter plot, and then press X, Y, or Z to assign that variable to an axis. Press ENTER to generate the three-dimensional graph, and press ENTER once more to position the graph in the center of the screen, as shown in Figure 13.4.

[Screen: Iris data set; Variables: sp length, sp width, pt length; Help; Exit; Variables; Search Mode]

Figure 13.4 The three-dimensional graph of sp length (x axis) vs. sp width (y axis) vs. pt length (z axis)

To view the data from different perspectives, the 3-dimensional scatter plot can be rotated with the RIGHT, LEFT, UP, or DOWN arrow keys. Additional strokes of an arrow key increase the speed of rotation; to reduce the speed, use the opposite arrow key. The rotation can be stopped at any position (see Figure 13.5) either by neutralizing it with an equal number of strokes on the opposite arrow key or by pressing the <SPACE BAR>. Several other features are ass
75. [Screen: Robust Analysis menu (partial). Zero Lower Limit; Limit Style; X Axis Variable; Y Axis Variable; Title; X Axis Title; Numbering Observations; Contour Ellipse: Individual; Erase Output File; View Weights & Generalized Distances: 4METHYL.WTS; Generate Graph With Current Options; Directory: C:\SCOUT\DATA; Filename: 4METHYL.DAT]

Figure 11.23 The Robust Analysis menu for Prediction Intervals

[Screen: main menu (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the Robust Method menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Prediction Intervals) and the Prediction Intervals window: Method: Prop Influence; Mean: 26.5512; Standard Deviation: ...]
76. [Screen: Robust Method menu with the Robust Analysis window (partial). View Weights & Generalized Distances: IRIS.WTS; Generate Graph With Current Options; Directory: C:\SCOUT\DATA; Filename: IRIS.DAT]

Figure 11.1 The menu for Robust Method with Robust Analysis selected and the menu for Robust Analysis displayed

Select the first heading in the Robust Analysis menu, Display Graphs For, and press <ENTER>. A menu entitled Select Graph Type will appear, as in Figure 11.2. Select Q-Q Plot (Simul Raw Data) and press <ENTER>. The menu will disappear, and the previous window will now show your graph choice opposite Display Graphs For.

[Screen: main menu with the Robust Method menu and the Select Graph Type menu, listing Q-Q Plot (Indiv Raw Data), Q-Q Plot (Indiv Standardized Raw Data), ...]
77. [Screen: Robust Analysis menu settings (partial). D Trend; Display Graphs For: Control Charts (Simult); Statistics Options: Prop Influence; Zero Lower Limit; Limit Style: Two Sided; X Axis Variable; Y Axis Variable; Title: Robust Simultaneous Control Chart; X Axis Title; Numbering Populations; Contour Ellipse: Individual; Erase Output File; View Weights & Generalized Distances: 4METHYL.WTS; Generate Graph With Current Options; Directory: C:\SCOUT\DATA; Filename: 4METHYL.DAT]

Figure 11.21 The Robust Analysis menu settings for simultaneous control charts

[Plot: robust simultaneous limits for all observations]

Figure 11.22 Simultaneous control chart f
78. also based on similar statements, with $\chi^2_{p,0.5}$ as the choice for the critical value $Md_\alpha$. This statement provides coverage to at least 50% of the observations. Small-sample correction factors are typically used to provide adequate coverage and consistency for samples from normal populations (Rousseeuw and van Zomeren 1990).

Let $x_1, x_2, \ldots, x_n$ be a random sample from a p-variate population with an elliptically contoured density function $f(x) \propto |\Sigma|^{-1/2}\, h\{(x-\mu)'\Sigma^{-1}(x-\mu)\}$. The Mahalanobis distances are given by

$Md_i = (x_i - \hat\mu)'\,\hat\Sigma^{-1}(x_i - \hat\mu), \quad i = 1, 2, \ldots, n,$

where $\hat\mu$ and $\hat\Sigma$ are the M-estimators of location $\mu$ and scale $\Sigma$, obtained by solving the following system of equations iteratively:

$\hat\mu = \dfrac{\sum_{i=1}^{n} w_1(Md_i)\, x_i}{\sum_{i=1}^{n} w_1(Md_i)} \quad (1)$

$\hat\Sigma = \dfrac{\sum_{i=1}^{n} w_2(Md_i)\,(x_i-\hat\mu)(x_i-\hat\mu)'}{\sum_{i=1}^{n} w_2(Md_i) - 1} \quad (2)$

The weight functions used in (1) and (2) are based on the PROP or the HUBER influence functions and are given by $w_1(Md_i) = \psi(Md_i)/Md_i$ and $w_2(Md_i) = [w_1(Md_i)]^2$, where $\psi(Md_i)$ represents the influence function used. The PROP influence function used here is

$\psi(Md_i) = \begin{cases} Md_i, & Md_i \le Md_\alpha \\ Md_i \exp\{-(Md_i - Md_\alpha)\}, & Md_i > Md_\alpha \end{cases} \quad (3)$

where $Md_\alpha$ is the critical value obtained from the distribution $\frac{(n-1)^2}{n}\, B\!\left(\frac{p}{2}, \frac{n-p-1}{2}\right)$ of the distances $Md_i$. Notice that no tuning constant except an $\alpha$ value representing the area
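The iteration in equations (1) and (2) can be sketched in plain Python as below. This is a minimal illustration, not Scout's implementation: it assumes a Huber-type influence function applied to the squared distances, $\psi(Md) = \min(Md, Md_\alpha)$, so that $w_1 = \min(1, Md_\alpha/Md)$ and $w_2 = w_1^2$, and it takes the cutoff $Md_\alpha$ as an argument instead of computing it from the scaled beta distribution. The names `m_estimate`, `huber_weight`, `mahalanobis_sq`, and `mat_inv` are hypothetical.

```python
def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mahalanobis_sq(x, mu, sigma_inv):
    """Squared Mahalanobis distance Md = (x - mu)' Sigma^{-1} (x - mu)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    p = len(d)
    return sum(d[j] * sigma_inv[j][k] * d[k] for j in range(p) for k in range(p))


def huber_weight(md, cutoff):
    """Hypothetical Huber-type w1 = psi(Md)/Md with psi(Md) = min(Md, cutoff)."""
    return 1.0 if md <= cutoff else cutoff / md


def m_estimate(data, cutoff, weight=huber_weight, max_iter=100, tol=1e-8):
    """Iterate equations (1) and (2), starting from the classical estimates (all weights 1).

    Assumes sum(w2) stays above 1 so the divisor in equation (2) is positive.
    """
    n, p = len(data), len(data[0])
    w1 = [1.0] * n
    w2 = [1.0] * n
    for _ in range(max_iter):
        s1, s2 = sum(w1), sum(w2)
        # Equation (1): weighted location estimate
        mu = [sum(w1[i] * data[i][k] for i in range(n)) / s1 for k in range(p)]
        # Equation (2): weighted scatter estimate with divisor sum(w2) - 1
        sigma = [[sum(w2[i] * (data[i][j] - mu[j]) * (data[i][k] - mu[k]) for i in range(n)) / (s2 - 1)
                  for k in range(p)] for j in range(p)]
        sigma_inv = mat_inv(sigma)
        md = [mahalanobis_sq(x, mu, sigma_inv) for x in data]
        new_w1 = [weight(m, cutoff) for m in md]
        if max(abs(a - b) for a, b in zip(new_w1, w1)) < tol:
            break  # weights have stabilized
        w1, w2 = new_w1, [w * w for w in new_w1]
    return mu, sigma, md
```

With a very large cutoff every weight stays at 1 and the sketch returns the classical mean and covariance; with a finite cutoff, an observation far from the bulk of the data (for example, the point (20, 20) added to a small cloud near (2.5, 3)) receives a small weight, so the location estimate moves toward the main body of the data.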
79. [Screen: Graphics menu with the explanation window for Graph Parameters: "Allows the user to modify the colors and shapes of individual observations. Observations can be removed from the graphs by turning them black." Filename: TEST.DAT]

Figure 7.1 The Graphics menu with the explanation window for the Graph Parameters heading displayed

7.2 Modify Graph Colors and Shapes

The first heading in the Graphics pull-down menu, Graph Parameters, allows the user to modify the color and shape of individual observations (points) displayed on the graphs. There are six colors and six shapes to choose from, yielding 36 possible combinations (assuming the user has a color monitor). However, choosing black as the color of an observation has a special meaning: black observations are not shown on the graphs, nor are they used in scaling the graphs. The default col
80. aph is first displayed, the three axes are scaled independently from zero to the maximum value of each variable. The user can force equal scaling of all axes by pressing the <E> key, which toggles equal scaling on and off. The user can also have the graph rescaled after removing an unwanted point; this feature is explained below in the section Search Observation Mode.

Rotating 3D Graphs

The four arrow keys are used to rotate the graph. The left and right arrows rotate the graph around the Z axis, the blue axis, which is always vertical on the screen. The up and down arrows rotate the graph around an imaginary horizontal axis that passes through the origin. Pressing the same arrow key repeatedly speeds up the rotation in that direction; pressing the opposite arrow key repeatedly slows the rotation, eventually stops it completely, and then starts it rotating in the opposite direction.

Changing from Symbols to Pixels

This feature enables the user to inspect a 3-dimensional graph with either symbols or pixels. The pixel and the symbol for an observation have the same color. Two advantages of displaying pixels instead of symbols on 3-D graphs are (1) faster rotation for large data sets and (2) improved resolution of individual observations. Disadvantages are (1) the points on the graph may be more difficult to see, since a pixel is much smaller than a symbol, and
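The rotation behavior described above can be sketched as a pair of rotation matrices plus a speed counter. This is a hypothetical model, not Scout's code: the "imaginary horizontal axis" is assumed here to be the X axis, and the per-keystroke speed increment (`step_radians`) is an assumed value. The point it illustrates is that arrow presses change the rotation speed rather than the angle, which is why the opposite arrow first slows and then reverses the motion.

```python
import math


def rotate_about_z(point, theta):
    """Left/right arrows: rotate (x, y, z) about the vertical Z axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def rotate_about_x(point, phi):
    """Up/down arrows: rotate about a horizontal axis through the origin (assumed to be X)."""
    x, y, z = point
    c, s = math.cos(phi), math.sin(phi)
    return (x, c * y - s * z, s * y + c * z)


class Spinner:
    """Hypothetical model of the arrow-key behavior: key presses change speed, not angle."""

    def __init__(self, step_radians=math.radians(2)):
        self.step = step_radians  # speed change per key press (assumed value)
        self.speed = 0.0          # radians advanced per animation frame
        self.angle = 0.0

    def press(self, direction):
        """direction = +1 for one arrow key, -1 for the opposite arrow key."""
        self.speed += direction * self.step

    def stop(self):
        """Space bar: stop rotating immediately."""
        self.speed = 0.0

    def tick(self):
        """Advance one animation frame and return the current angle."""
        self.angle += self.speed
        return self.angle
```

Pressing one arrow twice and the opposite arrow twice returns the speed to zero, matching the "equal numbers of strokes" way of stopping the rotation.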
81. approaches are given by the following probability statements, where $\bar{x}$ and $s$ represent the estimates (classical or robust) of $\mu$ and $\sigma$, respectively.

(a) $(1-\alpha)100\%$ confidence interval for the population mean $\mu$:

$P\left(\bar{x} - t_{\alpha/2,\nu}\, s/\sqrt{\mathrm{wsum2}} \le \mu \le \bar{x} + t_{\alpha/2,\nu}\, s/\sqrt{\mathrm{wsum2}}\right) = 1-\alpha \quad (11)$

where $t_{\alpha/2,\nu}$ represents the critical value from Student's t distribution with $\nu$ degrees of freedom.

(b) $(1-\alpha)100\%$ simultaneous confidence interval for all $x_i$, $i = 1, 2, \ldots, n$:

The test statistic $\max d_i$ is routinely used to identify a single outlier. Let $d_\alpha$ represent the $100\alpha\%$ critical value for the distribution of $\max d_i$, which can be obtained using the Bonferroni inequality. The simultaneous confidence interval is given by $P(\max_i d_i \le d_\alpha) = 1-\alpha$, which is equivalent to the following probability statement:

$P\left(\bar{x} - s\,d_\alpha \le x_i \le \bar{x} + s\,d_\alpha;\ i = 1, 2, \ldots, n\right) = 1-\alpha \quad (12)$

This interval is equipped with a built-in outlier detection procedure: an observation outside of this interval is an obvious outlier and may require further investigation.

(c) $(1-\alpha)100\%$ confidence limits for the individual observations $x_i$ from a population with unknown mean and standard deviation are given by the following statement:

$P\left(\bar{x} - s\,d_\alpha \le x_i \le \bar{x} + s\,d_\alpha\right) = 1-\alpha, \quad i = 1, 2, \ldots, n \quad (13)$

where $d_\alpha$ is the $100\alpha\%$ critical value of the distribution of the robustified distances $d_i$. Singh et al. (1994) used this interval to resolve a mixture sample into its component populations. The Student
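The three statements (11) to (13) reduce to "center plus or minus a multiple of the scale", which the minimal sketch below makes explicit. The critical values ($t_{\alpha/2,\nu}$, and $d_\alpha$ for either the maximum or the individual distances) are assumed to be supplied by the caller from tables or other software; the sketch does not compute them. It also assumes $d_i = |x_i - \bar{x}|/s$ and that, in the classical case with all weights equal to 1, wsum2 equals n. The function names are hypothetical.

```python
import math


def mean_interval(center, scale, wsum2, t_crit):
    """Equation (11): center +/- t * s / sqrt(wsum2)."""
    half_width = t_crit * scale / math.sqrt(wsum2)
    return center - half_width, center + half_width


def observation_limits(center, scale, d_crit):
    """Equations (12) and (13): center +/- s * d_crit.

    Pass the critical value of max d_i for the simultaneous interval (12),
    or of an individual d_i for the limits in (13).
    """
    return center - scale * d_crit, center + scale * d_crit


def outside(data, limits):
    """Indices of observations falling outside the limits (potential outliers under (12))."""
    lo, hi = limits
    return [i for i, x in enumerate(data) if x < lo or x > hi]
```

For example, `mean_interval(10.0, 2.0, 4, 2.0)` gives (8.0, 12.0), and `outside([4.0, 10.0, 16.0], observation_limits(10.0, 2.0, 2.5))` flags the first and last observations.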
82. ated. The user can view the scatter plot of the currently displayed variables by pressing the ENTER key. When viewing a scatter plot, the user can scroll through the observations that make up the graph; again, the purple box highlights the location of the observation currently displayed. The axes are scaled independently from the minimum value to the maximum value of each variable. The user can force equal scaling of both axes by pressing the <E> key, which toggles equal scaling on and off. The ENTER key returns the user to the correlation matrix, and the <ESC> key exits the graphics mode, returning the user to the menu screen.

7.5 Zoom Feature

This option enables the user to inspect portions of a 2-dimensional scatter plot in more detail. This is especially useful when many data points occur over a relatively small range, making it difficult to resolve individual observations. To use the zoom feature on a 2-dimensional scatter plot, press the <Z> key. A white rectangle encompassing all of the observations will appear. Use the minus (-) key to decrease or the plus (+) key to increase the area of the rectangle. Use the <ARROW> keys to move the rectangle to the portion of the scatter plot that you wish to enlarge. When you have surrounded the observations of interest with the white rectangle, press the ENTER key. Scout will automatically rescale the
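The default min-to-max axis scaling, the equal-scaling toggle, and the zoom rescale can be sketched as below. This is an illustration under assumptions, not Scout's code: "equal scaling" is interpreted here as giving both axes the same data span centered on each variable's midrange (screen aspect ratio is ignored), and `axis_ranges` and `zoom` are hypothetical names.

```python
def axis_ranges(xs, ys, equal=False):
    """Axis limits ((xmin, xmax), (ymin, ymax)) for a 2-D scatter plot.

    Default: each axis runs independently from its minimum to its maximum.
    equal=True (one interpretation of the <E> toggle): both axes get the same span,
    centered on each variable's midrange, so one data unit has the same length on both axes.
    """
    x_range = (min(xs), max(xs))
    y_range = (min(ys), max(ys))
    if not equal:
        return x_range, y_range
    span = max(x_range[1] - x_range[0], y_range[1] - y_range[0])
    x_mid = (x_range[0] + x_range[1]) / 2
    y_mid = (y_range[0] + y_range[1]) / 2
    return (x_mid - span / 2, x_mid + span / 2), (y_mid - span / 2, y_mid + span / 2)


def zoom(points, rect):
    """Keep the points inside the rectangle ((x0, x1), (y0, y1)) and rescale the axes to them."""
    (x0, x1), (y0, y1) = rect
    inside = [(x, y) for x, y in points if x0 <= x <= x1 and y0 <= y <= y1]
    if not inside:
        return [], None
    xs, ys = zip(*inside)
    return inside, axis_ranges(xs, ys)
```

For x values spanning 0 to 2 and y values spanning 0 to 10, equal scaling widens the x axis to (-4.0, 6.0) so both axes cover 10 units.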
83. ates are summarized below.

[Figure 1: The Classical Q-Q Plot of the Mds; y axis: Generalized Distances; x axis: Quantiles (Beta Distribution); reference lines: 5% Maximum (Largest MD) and 5% Warning (Individual MD)]

[Figure 2: The Robust (PROP) Q-Q Plot of the Mds; same axes and reference lines]

[Figure 3: Scatter Plot of the Robust PCs; Principal Component 1 vs. Principal Component 2]

[Figure 4: Scatter Plot of the Robust PCs; Principal Component 4 vs. Principal Component 5]

[Figure 5: The Plot of Classical Mds Without the 8 Outliers; Generalized Distances vs. Quantiles (Beta Distribution)]

[Figure 6: Plot of Robust Mds Without the 8 Outliers; Generalized Distances vs. Quantiles (Beta Distribution)]

Also, Figures 5 and 6 are the classical and the PROP (α = 0.05) Q-Q plots of the Mds with location and
84. ation. Scout will then display the selected matrix on the screen. If the entire matrix does not fit on the screen, the user can press the arrow keys to scroll through it. Press <ESC> to return to the PCA menu after viewing the matrix.

6.3 Eigenvalues

This heading allows the user to view the eigenvalues. Scout asks whether to calculate the eigenvalues from the covariance or the correlation matrix. After making this choice and pressing the <ENTER> key, the eigenvalues are displayed along with their differences, proportions, and cumulative proportions. If there are more eigenvalues than will fit in the window, use the <UP ARROW> and <DOWN ARROW> keys to scroll through them. Press the <P> key to send this information either to the printer or to a file. Press the <ESC> key to close the window and return to the menu.

6.4 View Components

This heading displays a listing of the component loadings. Scout offers the choice of performing PCA with either the covariance or the correlation matrix. After making this choice and pressing the <ENTER> key, use the <UP ARROW> and <DOWN ARROW> keys to scroll through the information. Use the <P> key to send the information to the printer or to a file. Press the <ESC> key to close the window and return to the menu.

6.5 Transform Data

The component scores are the product of the eigenvectors and the sta
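The columns of the eigenvalue display (differences, proportions, cumulative proportions) and the component-score product can be sketched as follows. This is a minimal illustration, not Scout's code: it takes the eigenvalues and eigenvectors as already computed, and because the sentence on component scores is cut off, it simply assumes the rows passed in have already been centered or standardized. The function names are hypothetical.

```python
def eigenvalue_table(eigenvalues):
    """Rows of (eigenvalue, difference to next, proportion, cumulative proportion), largest first."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    rows, cumulative = [], 0.0
    for k, lam in enumerate(vals):
        diff = lam - vals[k + 1] if k + 1 < len(vals) else None  # last eigenvalue has no successor
        proportion = lam / total
        cumulative += proportion
        rows.append((lam, diff, proportion, cumulative))
    return rows


def component_scores(rows, eigenvectors):
    """Score of each (already centered or standardized) row on each component.

    Each score is the dot product of the row with one eigenvector.
    """
    return [[sum(r * v for r, v in zip(row, vec)) for vec in eigenvectors] for row in rows]
```

For eigenvalues 3, 2, and 1, the proportions are 1/2, 1/3, and 1/6, and the cumulative proportion reaches 1 at the last component.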
85. ations such as page orientation, scale, position, and port. When this feature is selected, a screen appears with the following headings:

Choose Printer
Page Orientation
Use Shading Patterns
Horizontal Scaling Percentage
Vertical Scaling Percentage
X Starting Location
Y Starting Location
Formfeed After Print
Specify Printer Port

Choose Printer
To select a printer, highlight Choose Printer on the screen described above and press <ENTER>. A screen will appear listing various types of printers alphabetically. Find the printer you wish to use with the <ARROW>, <PAGE UP>, <PAGE DOWN>, <HOME>, or <END> keys, and press <ENTER> when your printer is highlighted.

Page Orientation
The user has a choice of Landscape or Portrait mode for printing graphs. Landscape is the default and is usually the better choice for most graphs. To change the selection, highlight Page Orientation as described above and press <ENTER> to change from Landscape to Portrait; press <ENTER> again to change back to Landscape.

Use Shading Patterns
This option allows the user to replace the colors in the graphs with shading patterns. The choices are Yes and No (the default). Select Use Shading Patterns as described above and press <ENTER> to change the setting to Yes.

Horizontal and Vertical Scaling Percentage
These headings enable the user to adjust the horizontal w
86. atistics Using to Classical, Initial Estimate to Classical, and setting the Right Cutoff Tail to any number between 0.001 and 0.8.

[Screen: Multivariate Kurtosis results window reporting the kurtosis, the variables used (4 of 5), the missing observations (0 of 50), the discordant observation, and a test statistic of 24.55 after removing it; Directory: C:\SCOUT\DATA; Filename: IRIS.DAT]

Figure 10.2 The results of Multivariate Kurtosis with a right-tail cutoff of 0.1 on the IRIS.DAT file. One outlier was detected and is identified here.

The Select Variables heading is a common option for three of the level-1 menu headings: Classical Method, Robust Method, and PCA. In each instance the Select Variables option works in the same way: using plus and minus signs, users indicate which variables are included in the analysis and which are left out. In the above example we did not use Select Variables with this particular file; by default all variables except Count are selected, resulting in the "4 out of 5 variables used" statement in Figure 10.2. The headings for Genera
87. ax Mds is described in the following section.

14.6 Q-Q Plot of Mahalanobis Distances Using the Beta Distribution

Compute the Mds, $Md_i = (x_i - \hat\mu)'\,\hat\Sigma^{-1}(x_i - \hat\mu)$ for $i = 1, 2, \ldots, n$, where $\hat\mu$ and $\hat\Sigma$ are M-estimates (classical or robust) obtained using one of the four procedures available in Scout.

Order the distances: $Md_{(1)} \le Md_{(2)} \le \cdots \le Md_{(n)}$.

Compute the expected quantiles $b_{(i)}$ using the beta or a chi-square distribution. For example, the beta quantiles are given by

$\dfrac{i-\alpha}{n-\alpha-\beta+1} = \int_0^{b_{(i)}} \dfrac{x^{a-1}(1-x)^{b-1}}{B(a,b)}\,dx, \quad i = 1, 2, \ldots, n \quad (6)$

where $\alpha = (a-1)/(2a)$, $\beta = (b-1)/(2b)$, $a = p/2$, and $b = (n-p-1)/2$.

Compute the theoretical quantiles $c_{(i)}$ from the distribution of the Mds using $c_{(i)} = \frac{(n-1)^2}{n}\, b_{(i)}$.

Finally, plot the pairs $(c_{(i)}, Md_{(i)})$, $i = 1, 2, \ldots, n$. A Q-Q plot using the chi-square approximation can be obtained similarly.

For multinormal data this plot resembles a straight line. A formal test statistic $R_p(n)$ and its critical values for assessing multinormality are given by Singh (1993). On this graphical display of multivariate data, points well separated from the main point cloud represent potential outliers.

Formal Graphical Identification of Outliers

Construct the Q-Q plot of the robustified Mds as described above. If assessment of multi
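Equation (6) requires inverting the beta CDF. The sketch below computes the theoretical quantiles $c_{(i)}$ in plain Python, assuming no statistics library is available: the regularized incomplete beta function is evaluated with the standard continued-fraction (modified Lentz) method, and the quantile is found by bisection. It is an illustration, not Scout's implementation; the function names are hypothetical, and it requires $n > p + 1$ so that $b > 0$.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_ppf(prob, a, b, tol=1e-12):
    """Inverse of beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def md_qq_quantiles(n, p):
    """Theoretical quantiles c_(i) = (n-1)^2/n * b_(i), with b_(i) from equation (6)."""
    if n <= p + 1:
        raise ValueError("need n > p + 1")
    a, b = p / 2.0, (n - p - 1) / 2.0
    alpha, beta = (a - 1) / (2 * a), (b - 1) / (2 * b)
    return [(n - 1) ** 2 / n * beta_ppf((i - alpha) / (n - alpha - beta + 1), a, b)
            for i in range(1, n + 1)]
```

Plotting these values against the ordered distances $Md_{(1)}, \ldots, Md_{(n)}$ gives the Q-Q plot described above. As a check, with $n = 5$ and $p = 2$ we get $a = b = 1$ (a uniform distribution), so $b_{(i)} = i/6$.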
88. ble options in the Display Graphs For menu. The values for X Axis Variable and Y Axis Variable can also be typed in manually after pressing the ENTER key with the cursor on the corresponding heading. In the same manner, the titles can be typed in after pressing the ENTER key with the cursor on the title heading. Use the down ARROW key to move the cursor to the last entry, Generate Graph With Current Options, and press the ENTER key to generate the graph. The weights and the generalized distances can be viewed by moving the cursor to View Weights and Generalized Distances and pressing the ENTER key.

5.5 Confusion Matrix

This option performs linear and quadratic discriminant analysis and expects the data to be multivariate. The first column of the data set should contain the population ID (a number between 1 and 20), and there should be at least two variables. Graphs cannot be produced with this option. When the Confusion (or error) Matrix heading is selected, the second window displays the message "Robust supervised pattern recognition (classification)". Press the ENTER key to display the third window and set the various options. The available headings for this choice are as follows:

Heading                 Example Choices
Discriminant Method     Linear
Statistics Options
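A confusion (error) matrix cross-tabulates each observation's known population ID, taken from the first column, against the population the discriminant rule assigns it to. The sketch below builds that table from two label lists. It does not perform the linear or quadratic discriminant analysis itself; the assignments are taken as given. The row and column orientation (rows = true population, columns = assigned population) is an assumption about layout, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, assigned, groups):
    """Counts with rows = true population ID and columns = assigned population ID (assumed layout)."""
    counts = Counter(zip(actual, assigned))
    return [[counts[(g, h)] for h in groups] for g in groups]


def error_rate(matrix):
    """Proportion of observations off the diagonal, i.e. misclassified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total
```

For true IDs [1, 1, 2, 2, 2] and assignments [1, 2, 2, 2, 1], the matrix is [[1, 1], [1, 2]], and two of the five observations are misclassified.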
89. bol at the end of the name. If the user is not in the root directory, the first item in the menu will always be "..", indicating the parent directory; choosing this item changes to the parent of the current directory. If the desired directory is not on the current disk drive, the user may select a new disk drive to search by pressing the letter of that drive. If the letter pressed is a valid drive from A to N, that drive becomes the current drive.

When the desired drive and directory have been found, a data file can be chosen. Use the arrow keys to highlight the desired data file, and then press ENTER to select it. Sometimes there are too many file names to fit in the window; if the desired data file is not displayed, scroll through the file names by pressing and holding the down arrow key.

Scout can search for any file name, including names given with wildcards. The current search string is printed at the top of the window; it can be changed by pressing S and entering a new string. Remember that data files saved with the Save Scout File option are automatically given the .SCT extension by Scout, while ASCII data files may have any extension.

2.4 Save Scout File

This option saves a Scout file in binary format, which is inten
90. btained using the procedure described earlier. Q-Q probability plots of the principal components are sometimes used to reveal suspect observations and to check the normality assumption. Scatter plots of the first few high-variance PCs reveal outliers that may inappropriately inflate variances and covariances. Plots of the last few low-variance PCs typically identify observations that violate the correlation structure imposed by the main body of the data but that are not necessarily discordant with respect to any individual variable. An example is discussed next to illustrate these procedures.

Example

The data set of size 82 with five variables, the octane readings (y) of gasoline and four explanatory variables, was first considered by Daniel and Wood (1980). Atkinson (1994) used forward searches and stalactite plots to identify multiple outliers in this data set, an approach that can be quite overwhelming for the typical user. Figure 1 is the Q-Q plot of the Mds obtained using the MLEs. Figure 2 is the corresponding graph obtained using the PROP function (α = 0.05). This graph correctly identified the 8 outliers in a single execution. It is also clear from this graph that observations 66 and 82 represent borderline cases. This is illustrated by the scatter plots of some of the robust PCs given in Figures 3 and 4. For confirmation, the outlying observations 44 and 71 through 77 were deleted, and the recomputed estim
91. [Plot: Generalized Distances vs. Quantiles (Beta Distribution)]

Figure 11.29 Generalized distance Q-Q plot using Huber influence

[Plot: Generalized Distances vs. Quantiles (Beta Distribution)]

Figure 11.30 Generalized distance Q-Q plot using Prop influence

11.7 Kurtosis

To calculate the kurtosis, we will again use the IRIS.DAT data set. Still in Robust Method and Robust Analysis, enter the Select Graph Type menu, select Multivariate Kurtosis, and press ENTER. Press END (or move to the bottom of the menu if you don't have an END key) and then press ENTER again. In this instance, Generate Graph With Current Options initiates the calculation of kurtosis. When it is complete, the output should match Figure 11.31.

[Screen: main menu with the Robust Method menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, ...)]
92. [Screen: Statistics Options window ending with Accept New Settings; Filename: IRIS.DAT]

Figure 11.6 Statistical options for the generation of Q-Q plots of principal component analysis (PCA)

Still in Robust Analysis, move to Display Graphs For, select Q-Q Plot (PCA), and press ENTER. Check that the remaining options in the Robust Analysis window match those in Figure 11.7. If necessary, use the same techniques explained in the last paragraph to make them match, finishing this time with Generate Graph With Current Options.

[Screen: main menu with the Robust Method menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, D Trend) and the Robust Analysis window, with Statistics Options set to Classical]
93. ce of Observation or Variable. Select Variable, either by highlighting Variable with an <ARROW> key and pressing ENTER or by pressing the <V> key. Scout automatically inserts a column and names the variable "Variable n", where n is the number of the new variable. Each observation of this inserted variable is automatically assigned the value 1E31. To enter the desired name, units, and other information about the inserted variable, see Editing Attributes of Variables. If the values of the inserted variable can be calculated with a formula involving any of the other variables, see Formulas. Otherwise, the desired values must be entered by hand: move about the screen with the <ARROW> keys until you find the observation you wish to change, type the correct value, and press ENTER. Repeat this procedure until each observation has the proper value.

Formulas

It is often useful to analyze variables that are functions of one or more variables in the data set. Consider, for example, a Scout data set with 4 variables, V1 through V4. It may be of interest to analyze a fifth variable, V5. Suppose that V5 = V3 Log V1 1 V2. Scout enables the user to overwrite the values of a variable with values calculated by a formula involving one or more of the remaining variables in the data set. This is especially useful if the variable you wish to overwrite is one that has just been inserted. S
94. cores; (5) identify univariate or multivariate outliers (Q-Q plots of generalized distances); (6) perform principal component, linear, and quadratic discriminant analyses; and (7) compute and plot various statistical intervals, including the confidence interval for the mean, the prediction interval, and the simultaneous confidence interval.

Scout reads ASCII data files in a specific format, which is discussed later in this manual. Files created in other software, such as WordPerfect, are not recognized by Scout unless they are in strict ASCII format. Scout can handle up to 22 variables, with the number of observations limited only by the available memory of the microcomputer. Scout can also save data in a binary format, which retains graph symbols, colors, and outlier information in addition to the 22 variables. Spreadsheet data files can easily be converted into Scout data files, as discussed in Section 2.2.

Scout allows the user to view and edit a data set. Editing is limited to the existing variables and observations. The variable fields that can be edited are the name, units, format, and comment. The observation fields that can be edited are the label and the values of the variables.

Scout is compatible with 8086-, 80286-, 80386-, and 80486-based microcomputers with at least 512K of RAM and an EGA, VGA, or Hercules graphics system. A fixed disk drive is highly recommended, as Scout performs many transfers between memory and disk during execution. Scout also uses expande
95. cout will prompt the user to enter a file name. If the user specifies an extension here, that extension will be used. If the file name already exists, Scout will ask whether the old file should be written over.

The File heading in Scout contains six headings, as displayed in Figure 2.1 below. These can be used to read, write, load, save, merge, and append various data sets.

[Screen: File menu: Read ASCII File; Write ASCII File; Load Scout File; Save Scout File; Merge Two Files; Append Two Files]
96. d memory, if found on the system, in two ways. First, the slow transfers between memory and disk mentioned earlier are replaced by very fast transfers between memory and expanded memory (this requires 128K). Second, Scout will use up to 64K of expanded memory for additional data storage. A color monitor greatly enhances Scout's text windows and graphics. A 20 MHz 80386 with a math coprocessor and a fixed disk is the minimum system recommended for Scout operation. By selecting the System heading in the main menu and then selecting Information, a user can display the system specification.

Scout was written by combining several subroutines and programs written for various research projects conducted by Lockheed Environmental Systems & Technologies Company in service of the United States Environmental Protection Agency (EPA). Thus, Scout is in the public domain, is not copyrighted, and requires no license agreement. However, users should be cautious about the source of their copy of Scout; because of computer viruses, it is best to obtain Scout directly from Lockheed or the EPA.

1.2 Manual Organization

The user's manual for Scout is organized into three sections: Section I (Chapters 1 to 8) is the user's guide, Section II (Chapters 9 to 13) contains tutorials, and Section III (Chapter 14) provides technical notes with examples for statistically oriented users. Users not familiar with Scout will
97. ...the use of confirmatory analysis. For this reason: (1) the MVE-based procedures are used only for the identification of outliers, since the MVE robust estimates differ significantly from the corresponding classical estimates after the removal of outliers; (2) the use of a small-sample correction factor is recommended; and (3) it has been suggested not to use the approximate chi-square values too rigorously to define large distances.

14.5 Normal Probability (Q-Q) Plots of the Original Data and of Principal Components

In the following, the data denoted by $y_1, y_2, \ldots, y_n$ represent raw standardized values of a variable in the data set or scores on one of the principal components. The normal probability plot for these data can be obtained as follows:

1. Arrange the data (or PC scores) in ascending order of magnitude: $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$.
2. Compute the normal quantiles $q_{(k)}$ using the following statement:
$$P\left(Z \le q_{(k)}\right) = \int_{-\infty}^{q_{(k)}} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = \frac{k - 3/8}{n + 1/4}, \qquad k = 1, 2, \ldots, n$$
3. Plot the pairs $\left(q_{(k)}, y_{(k)}\right)$, $k = 1, 2, \ldots, n$.

If the data are from a normal population, these pairs will be approximately linearly related. Systematic departures from linearity and curved patterns suggest departures from normality, and outlying observations appear well separated from the majority of the data. The Q-Q plot of Mahalanobis distances (Mds) and an outlier test based on the M
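The three steps above can be sketched in a few lines of Python. This is a minimal illustration that uses the standard library's `statistics.NormalDist` for the inverse normal CDF. It is not Scout's own code, and the function name `normal_qq_pairs` is hypothetical.

```python
from statistics import NormalDist


def normal_qq_pairs(values):
    """Return (q_(k), y_(k)) pairs for a normal Q-Q plot.

    Plotting positions follow the statement above:
    P(Z <= q_(k)) = (k - 3/8) / (n + 1/4), k = 1, ..., n.
    """
    y = sorted(values)                  # step 1: ascending order
    n = len(y)
    std_normal = NormalDist(0.0, 1.0)   # standard normal Z
    pairs = []
    for k, y_k in enumerate(y, start=1):
        p = (k - 3 / 8) / (n + 1 / 4)   # step 2: cumulative probability for rank k
        pairs.append((std_normal.inv_cdf(p), y_k))
    return pairs                        # step 3: plot these pairs


pairs = normal_qq_pairs([2.4, -0.3, 1.1, 0.7, 5.9])
for q, y in pairs:
    print(f"{q:7.3f}  {y:5.1f}")
```

With n = 5, the middle rank gets probability 0.5 and therefore the quantile 0. The value 5.9 sits at the top of the plot, and a point like that would stand apart from an otherwise straight line.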
98. ...data. These discordant observations, or outliers, are highly unusual when compared to the rest of the data. For a more thorough description of outliers and their significance, see the introduction to Chapter 14. The Classical Method menu has two tests for discordancy: Mardia's multivariate kurtosis and the Mahalanobis generalized distance. Mardia's multivariate kurtosis is also a useful test for assessing multinormality. It is recommended when the number of outliers is unknown but potentially substantial. The generalized distance is strictly an outlier test. It is recommended when the number of potential outliers is known to be very small. Both tests assume that the data represent a random sample from a univariate or multivariate normal population. Both tests are included in the menu shown in Figure 4.1 below.

[Figure 4.1: Classical Method menu, with choices including Select Variables, Generalized Distance, Multivariate Kurtosis, Causal Variables, Associated Causes, and Remove Outlier Flags.]
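The two classical quantities behind these tests can be sketched in plain Python as follows. The sketch computes squared Mahalanobis distances, which are the generalized distances of each observation from the mean, and Mardia's multivariate kurtosis $b_{2,p}$, whose expected value under multinormality is about $p(p+2)$.

It rests on a few assumptions. It uses the MLE covariance matrix (divisor n), which is the usual choice for Mardia's statistic; Scout's exact divisor for the generalized distance is not stated here. It also uses a small hand-written matrix inverse, and all function names are hypothetical. The sketch does not compare the results with critical values.

```python
def _inverse(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_mahalanobis(data):
    """Squared generalized distance of each row from the mean (MLE covariance, divisor n)."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - mean[j] for j in range(p)] for row in data]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(p)] for i in range(p)]
    inv = _inverse(cov)
    return [sum(c[i] * inv[i][j] * c[j] for i in range(p) for j in range(p))
            for c in centered]


def mardia_kurtosis(data):
    """Mardia's b_{2,p}: average of the squared distances, squared again."""
    d2 = squared_mahalanobis(data)
    return sum(v * v for v in d2) / len(d2)


sample = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0], [9.0, 1.0]]
print([round(v, 2) for v in squared_mahalanobis(sample)])
print(round(mardia_kurtosis(sample), 3), "vs p(p+2) =", 2 * 4)
```

A useful check: with the MLE covariance, the squared distances always sum to n·p (here 6 × 2 = 12). In this sample the last observation, (9, 1), has the largest distance.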
99. ...intended to be used only by Scout; other software generally cannot read this format. This format has the advantage of retaining the graphics color and shape specified for each observation, as well as the outlier status of each observation. To save data in this format, simply select Save Scout Data File from the pull-down menu and enter a file name. Do not include an extension with the file name, because Scout always uses the .SCT extension. Also, do not precede the file name with a path. New data files are always written to the current drive and directory displayed at the bottom of the screen.

2.5 Merge Two Files

This utility allows the user to combine two data files into a new data file. The user first selects whether to merge two ASCII files or two Scout files. If the merge is successful, the new data file is always written as an ASCII file. The merge routine assumes the variables are different in each of the input files. Therefore, the output file will contain all of the variables from both input files, even if they have the same names. The routine does, however, account for common observations. Two observations, one from each input file, that have the same label or name will be merged into a single observation in the output file.

2.6 Append Two Files

This utility also allows the user to combine two data files into a new data file, but in a different way than Merge does. The user is given the option to append two ASCII files or two
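The merge rules in Section 2.5 can be illustrated with a small sketch:
- all variables from both files are kept, even duplicates;
- observations are matched by label.

The text does not say what happens to an observation that appears in only one file. This sketch fills the missing values with a placeholder (`MISSING = 1e31`, borrowed from the 1E31 value Scout uses for inserted variables), which is purely an assumption. The function name and the dictionary representation are hypothetical as well.

```python
MISSING = 1e31  # assumption: placeholder for an observation absent from one input file


def merge_two_files(vars_a, obs_a, vars_b, obs_b):
    """Merge two data sets by observation label.

    vars_*: list of variable names; obs_*: dict mapping label -> list of values.
    All variables from both inputs are kept, even when names repeat.
    """
    merged_vars = list(vars_a) + list(vars_b)
    labels = list(obs_a) + [lab for lab in obs_b if lab not in obs_a]
    merged = {}
    for lab in labels:
        part_a = obs_a.get(lab, [MISSING] * len(vars_a))
        part_b = obs_b.get(lab, [MISSING] * len(vars_b))
        merged[lab] = list(part_a) + list(part_b)  # common labels become one observation
    return merged_vars, merged


names, rows = merge_two_files(
    ["pb"], {"S1": [12.0], "S2": [7.5]},
    ["pb", "zn"], {"S2": [8.1, 40.0], "S3": [3.3, 22.0]},
)
print(names)        # ['pb', 'pb', 'zn']
print(rows["S2"])   # [7.5, 8.1, 40.0]
```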
100. ...default scale value is 10; larger values shrink the graph.

[Figure 11.14: Rescaled graph (Pattern Recognition plot, axis labeled PC Score 2).]

Change the values in the Pattern Recognition menu to those shown in Figure 11.15, then move to Begin Computations with Current Options and press <ENTER>.

[Figure 11.15: Robust Method menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, D-Trend, Add Means) with the Pattern Recognition settings, including Statistics Options: Classical, Numbering: Populations, Contour Ellipse: Individual, and Type of Graph.]
101. ...identified as discordant, the user should be aware that the problem may arise from a lack of multinormality or from the presence of multiple populations. Each observation identified as discordant is flagged as such, and the graphics elements for those points are set to downward-pointing red triangles. The discordant observations can then be viewed in the graphics module. Scout does not remove the discordant observations unless the user chooses to do so.

During outlier testing, a new data set is generated. The user must decide how Scout should handle the outliers when writing the new ASCII file. Four options are available: Remove, Keep, Flag, and Query.
- The Remove option deletes all of the outliers from the generated file.
- The Keep option saves all outliers.
- The Flag option numerically flags the outliers in the new file. It does this by adding a new variable called OUTLIERS to the end of the variable list. The value of this variable for each observation is either 0 or 1, where 1 indicates that the observation is an outlier.
- The Query option allows the user to individually specify which outlier observations will be written to the new file.

These features are available only in the Classical Method menu.

CAUTION: Scout only identifies outliers for the variables selected. When viewing 2-D or 3-D scatter plots which flag outliers, make
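The four output options can be summarized by the following sketch, which shows the rows and variables each option would write. It does not reproduce Scout's file-writing code. In particular, the `query_keep` argument is a hypothetical stand-in for Scout's interactive Query prompt: a set of outlier row indices the user chose to keep.

```python
def write_with_outlier_option(variables, rows, is_outlier, option, query_keep=()):
    """Return (variables, rows) for the new ASCII file under Remove, Keep, Flag, or Query."""
    pairs = list(zip(rows, is_outlier))
    if option == "Remove":
        return list(variables), [list(r) for r, out in pairs if not out]
    if option == "Keep":
        return list(variables), [list(r) for r, _ in pairs]
    if option == "Flag":
        # Append OUTLIERS as the last variable: 1 = outlier, 0 = not an outlier.
        return (list(variables) + ["OUTLIERS"],
                [list(r) + [1 if out else 0] for r, out in pairs])
    if option == "Query":
        # Non-outliers are always written; outliers only if the user selected them.
        return list(variables), [list(r) for i, (r, out) in enumerate(pairs)
                                 if not out or i in query_keep]
    raise ValueError(f"unknown option: {option}")


names = ["sp_length", "sp_width"]
data = [[5.1, 3.5], [4.9, 3.0], [9.9, 0.5]]
flags = [False, False, True]
print(write_with_outlier_option(names, data, flags, "Flag"))
```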
102. ...headings and corresponding choices in the third window (Robust Analysis) are as follows:

Heading              Choices
Zero Lower Limit     Yes, No
Limit Style          Upper Limit, Lower Limit, Two Sided
X Axis Variable      A positive integer between 1 and 22
Y Axis Variable      A positive integer between 1 and 22
Title                Title of the graph
X Axis Title         Title of the X axis
Numbering            Observations, Populations
Contour Ellipse      Individual, Simultaneous, Indiv & Simul, Indiv Class, Simul Class
Erase Output File    See text

The Erase Output File feature may be important if a given file is used repeatedly. Each time output is generated for a given file, it is appended to a file with the same name but a different extension (.URS). As a result, the current output is added to any output generated in previous work with this file. The user has the option to erase this file before the current session's output is recorded. In that case, the output file reflects only the current session.

Scout chooses the values for the X Axis and Y Axis Variables automatically from among the selected variables. While in graphics mode, the user can also use the Page Up and Page Down keys to change the X labels, and Ctrl+Page Up and Ctrl+Page Down to change the Y labels. New graphs appear after each selection. The F1 key can be used to see all availa
103. ...directory on drive C. To run Scout, enter the following commands:
1. Type CD SCOUT and press ENTER. This changes the current directory to the SCOUT directory.
2. Type SCOUT and press ENTER. This starts the Scout program.

If you have any problems with the operation of Scout, please write to: Scout, c/o John Nocerino or George Flatman, Characterization and Research Division, National Exposure Research Laboratory, U.S. EPA, P.O. Box 93478, Las Vegas, NV 89193-3478.

1.4 Viewing the User's Guide

Scout contains an on-line User's Guide. From any mode of Scout, users can reach the on-line User's Guide for that mode by pressing the F1 key. When a section of text is displayed in the large window covering the lower portion of the screen, users can move through the text using the following key commands:
HOME: Moves to the beginning of the text
END: Moves to the end of the text
UP ARROW: Scrolls the text up toward the beginning
DOWN ARROW: Scrolls the text down toward the end
PAGE UP: Scrolls the text up toward the beginning by a page
PAGE DOWN: Scrolls the text down toward the end by a page
ESC/ENTER: Closes the viewing window

Chapter 2 Scout File Format

2.1 File Management

Scout reads ASCII data files in the following format. The first line of the data file is a comment line, presumably to describe the origin or title
104. ...the ENTER key to highlight the corresponding heading (this applies to Right Tail Cutoff and Trimming Percent in the previous menu). The other choices can be set by pressing <ENTER> repeatedly. After all selections are made, move the cursor to the bottom of the third window, to Generate Statistics Using Current Options. Press ENTER to generate the Univariate Statistics for the selected choices. The results of the univariate statistical analysis are then displayed on the screen.

These statistics are also stored in an output file with the same name and the extension .URS. For example, statistics for IRIS.DAT are stored in IRIS.URS. The statistics are appended to this file: if information from an earlier Scout session is still in the file, the current statistics are added to it.

5.4 Robust Analysis

When Robust Analysis is selected, the explanation window displays the message: "This routine provides exploratory as well as confirmatory procedures for the assessment of multinormality and detection of multivariate outliers." When ENTER is pressed while Robust Analysis is highlighted, a third menu appears listing various options. The available headings and choices of this menu, and the default choices, are as follows:

Headings    Default Choices
105. ...the mixture of several populations with varying degrees of contamination. In this situation, the objective is to decompose the mixture sample into its component populations. Experimentalists, especially environmental scientists dealing with large amounts of data, often need to identify experimental results that are significantly different from the rest of the data. In data sets of large dimensionality, identifying these anomalies becomes tedious. Appropriate multivariate procedures are needed to identify multiple multivariate anomalies, and some of these procedures are incorporated in Scout.

The successful identification of outliers depends on the statistical procedures employed. Most outlier identification procedures are based on the Mahalanobis distances (Mds). The maximum distance (Max Mds) is a well-documented test statistic for the identification of a single outlier (e.g., see Wilks 1963; Devlin et al. 1981). Observations with Mds greater than the 100α% critical value of the Max Mds are considered potential outliers. Singh (1993) used the first-order Bonferroni inequality and the incomplete beta distribution to compute the critical values of Max Mds for any combination of n and p. Singh showed that these values are in close agreement with the available simulated values given in Jennings and Young (1988) and Stapanian et al. (1991). Computation of the critical values of the
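The following derivation sketches how a first-order Bonferroni bound yields a critical value for Max Mds. It assumes that $\mathrm{Md}_i$ is the squared distance computed with the sample mean and the unbiased covariance matrix (divisor $n-1$), and that the data are a random sample from a $p$-variate normal population. The beta result is the classical one associated with Wilks (1963). The exact computation in Scout and in Singh (1993) may differ in detail.

$$\frac{n\,\mathrm{Md}_i}{(n-1)^2} \sim \mathrm{Beta}\!\left(\frac{p}{2},\ \frac{n-p-1}{2}\right), \qquad i = 1, \ldots, n$$

$$P\Big(\max_{i} \mathrm{Md}_i > c\Big) \le \sum_{i=1}^{n} P(\mathrm{Md}_i > c) = n\,P(\mathrm{Md}_1 > c)$$

Setting the right-hand side equal to $\alpha$ gives

$$c_\alpha = \frac{(n-1)^2}{n}\; B^{-1}\!\left(1 - \frac{\alpha}{n};\ \frac{p}{2},\ \frac{n-p-1}{2}\right),$$

where $B^{-1}(q; a, b)$ denotes the $q$-quantile of the $\mathrm{Beta}(a, b)$ distribution. Because the Bonferroni inequality is an upper bound, flagging observations with $\mathrm{Md}_i > c_\alpha$ keeps the overall false-alarm probability at or below $\alpha$.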
106. ...the variable fails these tests, you may then try various transforms on the selected variables. Each time a transformation is tried, the resulting variable is retested for normality. You may select one or more transformations for each variable by choosing a suitable function, as displayed in Figure 3.2. An undo feature allows you to undo each transform in sequence.

[Figure 3.2: Transformation menu, with choices including Linear, Logarithm, Power, Box-Cox, Arcsine, and Undo Last Transform, shown above a table of normality-test results for the transformed variables.]
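Two of the ideas above can be shown in a short sketch: a Box-Cox transform, and an undo feature that steps back through successive transforms. The sketch uses the textbook Box-Cox form, $(x^\lambda - 1)/\lambda$, or $\ln x$ when $\lambda = 0$. Scout's menu may parameterize it differently, for example with a shift constant. The class and function names are hypothetical.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam == 0 (requires x > 0)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values; shift the data first")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


class TransformHistory:
    """Stack of successive versions of one variable, so transforms can be undone in order."""

    def __init__(self, values):
        self._versions = [list(values)]

    @property
    def current(self):
        return self._versions[-1]

    def apply(self, transform):
        self._versions.append(transform(self.current))
        return self.current

    def undo(self):
        if len(self._versions) > 1:   # never discard the original data
            self._versions.pop()
        return self.current


history = TransformHistory([1.0, 4.0, 9.0])
history.apply(lambda v: box_cox(v, 0.5))   # (sqrt(x) - 1) / 0.5 -> [0.0, 2.0, 4.0]
history.apply(lambda v: [x + 1.0 for x in v])
history.undo()                             # back to the Box-Cox result
print(history.current)
```

A stack is the natural structure for "undo each transform in sequence": each undo restores the version that existed before the most recent transform.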
107. ...see Inserting Variables). Here you would be changing the inserted values from 1E31 to a formula involving one or more of the other variables. Move about the spreadsheet screen until you reach the column of the variable you wish to overwrite with a formula, and highlight it. Next, press the ALT and F keys together. You will be asked "Replace Variable name with a formula, are you sure?" Press the Y key for Yes (the default is No). You will then be asked to enter the formula. Enter the formula carefully.

3.2 Scout Functions and Operations

Scout recognizes the following operators and functions:
+          addition
-          subtraction or opposite sign
*          multiplication
/          division
x^y        x raised to the power of y
Abs(x)     absolute value of x
Atan(x)    arctangent of x
Cos(x)     cosine of x
Exp(x)     exponential, i.e., e raised to the power of x
Ln(x)      natural logarithm
Int(x)     integer function, e.g., Int(7.99) = 7, Int(2.000) = 2
Log(x)     logarithm base 10
Round(x)   rounding function, e.g., 7.99 becomes 8
Sin(x)     sine of x
Sqr(x)     x raised to the power of 2
Sqrt(x)    square root of x

When you are sure that the formula is correct, press <ENTER>. Scout will automatically do the calculations and return you to the spreadsheet.

Editing Attributes of Variables

This feature allows
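To make the function list concrete, here is a sketch that maps each Scout function name to a Python equivalent and then evaluates a formula observation by observation. Two entries are assumptions: the text shows Int and Round only on positive examples, so the truncation and round-half-up behaviors are guesses. The table itself is hypothetical, not Scout's formula parser. Note that Sqr squares its argument while Sqrt takes the square root, an easy pair to confuse.

```python
import math

# Hypothetical Python stand-ins for the Scout formula functions listed above.
SCOUT_FUNCTIONS = {
    "Abs": abs,
    "Atan": math.atan,
    "Cos": math.cos,
    "Exp": math.exp,
    "Ln": math.log,                          # natural logarithm
    "Int": math.trunc,                       # assumption: drops the fractional part
    "Log": math.log10,                       # base-10 logarithm
    "Round": lambda x: math.floor(x + 0.5),  # assumption: halves round up
    "Sin": math.sin,
    "Sqr": lambda x: x * x,                  # Sqr squares its argument
    "Sqrt": math.sqrt,                       # Sqrt is the square root
}

f = SCOUT_FUNCTIONS
x = [3.0, 5.0, 8.0]
y = [4.0, 12.0, 15.0]
# Formula Sqrt(Sqr(x) + Sqr(y)) applied observation by observation:
new_variable = [f["Sqrt"](f["Sqr"](a) + f["Sqr"](b)) for a, b in zip(x, y)]
print(new_variable)  # [5.0, 13.0, 17.0]
```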
108. ...above, and can be obtained using one of the four procedures available in Scout (three robust and one classical). The critical values of kurtosis are given in a simulation study by Stapanian et al. (1991). The classical module of Scout includes a sequential outlier detection procedure based on multivariate kurtosis and these critical values.

The robust procedures based on Campbell's (1980) influence function and on the HUBER function, as given in Devlin et al. (1981), often leave some influence of outliers on the robust estimates. The weights associated with the HUBER influence function are given by

$$w(\mathrm{Md}_i) = \begin{cases} 1, & \mathrm{Md}_i < k_\alpha \\ k_\alpha / \mathrm{Md}_i, & \text{otherwise,} \end{cases}$$

where $k_\alpha$ is the 100α% critical value associated with the Mds, obtained using either a scaled beta or a chi-square distribution. For details of the HUBER influence function and the MVT procedures in Scout, the interested reader is referred to Devlin et al. (1981) and Singh (1993).

It is observed that outliers have negligible influence on the estimates and Mds obtained using the PROP function. Two sets of results are also in close agreement:
- the PROP estimates and Mds, computed with or without the outliers;
- the classical MLEs and Mds computed only from the inlying observations, after the outliers are removed.

This confirms that the identified (flagged) observations are indeed all of the outliers present in the data set. In order to verify that the identified outliers are indeed the outliers, Fung (1993) suggeste
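The HUBER weight rule above translates directly into code. The sketch below also shows one weighted-mean update, in which outlying rows contribute less to the location estimate. In the full robust procedures, the weights, mean, covariance and distances are recomputed iteratively; that loop, and Scout's exact covariance weighting, are not shown. The distances in the example are hypothetical values chosen for illustration.

```python
def huber_weights(distances, k_alpha):
    """HUBER weights: 1 below the cutoff k_alpha, k_alpha / Md at or beyond it."""
    return [1.0 if md < k_alpha else k_alpha / md for md in distances]


def weighted_mean(rows, weights):
    """Weighted mean vector: sum(w_i * x_i) / sum(w_i)."""
    total = sum(weights)
    p = len(rows[0])
    return [sum(w * row[j] for w, row in zip(weights, rows)) / total for j in range(p)]


rows = [[1.0, 2.0], [2.0, 3.0], [3.0, 2.0], [30.0, 40.0]]
md = [0.8, 1.1, 0.9, 12.0]          # hypothetical distances for illustration
w = huber_weights(md, k_alpha=3.0)  # -> [1.0, 1.0, 1.0, 0.25]
print(w, weighted_mean(rows, w))
```

The weight function is continuous at the cutoff: at $\mathrm{Md}_i = k_\alpha$ both branches give 1. Beyond the cutoff, the weight shrinks in proportion to the distance.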
109. ...influence function (Huber 1981; Devlin et al. 1981), based on Mds;
c. Multivariate Trimming (MVT) (Devlin et al. 1981), based on Mds;
d. PROP influence function (Singh 1993), based on Mds.

Numerous graphical displays are also available in Scout. These include the histogram; normal probability (Q-Q) plots of raw data; scatter plots of raw data and contour plots; Q-Q plots and scatter plots of principal components; the Q-Q plot and index plot of the Mds; scatter plots of discriminant scores; plots of prediction intervals, simultaneous confidence intervals, and contour plots; and some 3-D graphics.

5. Principal Component Analysis (PCA): A separate PCA option is available in Scout to compute the classical dispersion and correlation matrices, eigenvalues, eigenvectors, loadings, and principal component scores.

6. Confusion Matrix: Performs linear and quadratic discriminant analysis.

The pattern recognition option can be used to (1) obtain scatter plots of raw data, (2) graph the PCs, and (3) compute and graph the raw discriminant scores. The corresponding contour ellipses (5 choices are available) can also be drawn on these scatter plots by pressing the E (e) key. For details, see Johnson and Wichern (1988) and Anderson (1984).

D-Trend and Add Means options: These two procedures are used in geostatistical applications, especially when the spatial data need to be detrended so that the consta
110. ...menu. The Level 2 menu for the File heading and the explanation window for Read ASCII File are also both displayed.

Scout is a statistical software package with several features. Navigating through the multiple levels of Scout requires a standard nomenclature that is easy to follow. The nomenclature used to describe Scout in these tutorials is as follows:
Menu: A set of choices or headings.
Headings: Selections that present further menus, lists of choices, and/or headings.
Choices: Selections that set a given parameter or perform a specific function.
Explanation window: A box, appearing at any level, containing either an explanation of the selected heading or instructions for performing a Scout function.
Level 1 menu: The set of headings displayed in the first window seen upon entering Scout: File, Data, Classical Method, Robust Method, PCA, Graphics, and System.
Level headings: File, Data, Classical Method, Robust Method, PCA, Graphics, or System, as shown in Figure 9.1 above.
Level 2 menu: Any of the seven menus displayed after selection of a Level heading.
Level 2 headings and choices: Read ASCII File, Write ASCII File, Load Scout File, Save Scout File, Merge Two Files, and Append Two Files, as shown in Figure 9.1, or any set of headings and/or choices resulting from selection of
111. ...scatter plots or XY plots. 3-Dimensional graphics are used to display three-variable plots, which can be rotated to illustrate the extra dimension. The Graphics menu is displayed in Figure 7.1 below.

[Figure 7.1: Graphics menu, with choices Graph Parameters, 2-Dimensional, and 3-Dimensional.]
112. ...the upper simultaneous limit (USL), and not the upper confidence limit (UCL) for the population mean, should be used. Comparing individual observations $x_i$ with the UCL for the population mean $\mu$ and expecting adequate coverage for the $x_i$'s is inappropriate, although it is sometimes done in practice. An interval estimate given by (4) above may be used if coverage for an individual sampled observation $x_i$ is desired. The prediction interval given by (2) is used for a future and/or delayed observation.

Robust interval estimates are used in some of the performance evaluation (PE) studies of the U.S. EPA (e.g., see Horn et al. 1988). For example, Horn et al. (1988) used the Biweight function (Kafadar 1982) to obtain a robust prediction interval for a future observation, using a noisy sample with outliers obtained from PE studies of the U.S. EPA. Robust prediction intervals based on the Biweight influence function are also used to assess the performance of the various laboratories participating in the quarterly blind (QB) PE study of the U.S. EPA (Singh and Nocerino 1995; Singh et al. 1993). However, the interval estimates given above by (3) are, by definition, more appropriate for providing simultaneous coverage for all of the participants in such QB PE studies.

Interval Estimates: The four interval estimates obtained using the classical and robust (Huber and PROP)
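The reason a mean UCL is the wrong yardstick for individual observations can be seen from the textbook classical forms for a univariate normal sample with sample mean $\bar{x}$ and standard deviation $s$. These are shown for orientation only and are not necessarily identical to equations (2) to (4) of this chapter.

$$\mathrm{UCL} = \bar{x} + t_{1-\alpha,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad \text{prediction interval: } \bar{x} \pm t_{1-\alpha/2,\,n-1}\, s\,\sqrt{1 + \frac{1}{n}}$$

The margin of the UCL shrinks like $s/\sqrt{n}$ as $n$ grows, while individual observations keep scattering with standard deviation near $\sigma$. In large samples, a sizable share of perfectly ordinary observations therefore exceeds the UCL. The prediction interval keeps a margin of order $s$, and that is what coverage of individual values requires.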
113. ...Anderson-Darling test and Kolmogorov-Smirnov goodness-of-fit test; graphical normal probability (Q-Q) plot.

Classical Method Menu

This module includes the two classical sequential outlier testing procedures, based upon (1) the Max Mds and (2) the multivariate kurtosis. The module is provided separately for the convenience of interested users. Note that these procedures suffer from severe masking in the presence of multiple outliers. Unmasking multiple outliers requires a robust procedure with a high breakdown point. Some examples using this menu are discussed in Chapter 10. The classical test based on Max Mds, with graphical Q-Q and index plots, is also available in the robust module of the software package.

Robust Method Menu

The robust module of the Scout software package includes four different procedures. These compute all of the relevant statistics (the mean vector, the variance-covariance or correlation matrix, the Mds, and the multivariate kurtosis) and perform principal component analysis and linear and quadratic discriminant analyses. Several examples are discussed in the tutorial section (Section II, Chapter 11). The statistical procedures used for this module are discussed in this chapter. The four outlier identification procedures in Scout are as follows:
a. Classical MLE method (Wilks 1963), based on Mahalanobis distances;
b. HUBER influ
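As an illustration of the Kolmogorov-Smirnov goodness-of-fit test listed above, the following sketch computes the KS distance D between a sample and a normal CDF using only the standard library. It computes the statistic only. When the mean and standard deviation are estimated from the same data, the standard KS critical values are too lenient (the Lilliefors situation), and the critical values Scout uses are not reproduced here. The function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ks_normal_statistic(values, mu=None, sigma=None):
    """Kolmogorov-Smirnov distance D between the sample and a normal CDF."""
    x = sorted(values)
    n = len(x)
    if mu is None:
        mu = mean(x)       # estimated from the data
    if sigma is None:
        sigma = stdev(x)   # sample standard deviation, divisor n - 1
    cdf = NormalDist(mu, sigma).cdf
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        # Largest gap above and below the empirical step function at x_(i).
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(ks_normal_statistic([4.9, 5.1, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]))
```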
114. ...classification. This grouping can be done on the basis of similarities or distance measures obtained from the observed variables or characteristics (analytes, defects, etc.). Principal component analysis or cluster analysis techniques are used to separate observations into various groups. Examples of the latter are complete linkage, single linkage, average linkage, and Ward's minimum distance. Several clustering techniques should be applied to the same data set. If the outcomes of these techniques are roughly consistent with one another, then some well-separated groups probably exist. This separation process is often performed only once, preferably on training sets with known group membership, to investigate the differences among the various groups. Discriminant functions are then obtained using these separated groups.

Classification procedures are less exploratory. The discriminant functions obtained in the separation process are used to assign current and new observations to previously defined groups. The validity of the discriminant functions rests on the correct classification of the current observations, whose group membership is known. Scout outputs the confusion (error) matrix for the linear and quadratic discriminant analyses. However, outliers can significantly distort the discriminant functions and the corresponding discriminant scores. This can result in several misclassif
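The confusion (error) matrix mentioned above is a table of counts. Each row is an observation's known group, each column is the group the discriminant function assigned it to, and the off-diagonal entries are misclassifications. The sketch below builds such a matrix from known and assigned labels. The discriminant analysis itself is not shown, and the group labels and assignments are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, groups):
    """Rows = known group, columns = assigned group, entries = counts."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in groups] for a in groups]


def error_rate(matrix):
    """Share of observations off the diagonal (misclassified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 1 - correct / total


groups = ["setosa", "versicolor", "virginica"]
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
m = confusion_matrix(actual, predicted, groups)
print(m)              # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
print(error_rate(m))
```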
115. ...key a second time will bring up the Transformation Menu and a histogram of that variable, as shown in Figure 9.4. Several transformations

[Figure: Data menu (Edit Data, Statistics, Transform, Print Data) with the Normality Test submenu (Kolmogorov-Smirnov, Anderson-Darling).]
116. ...keys, move to the top three data points to reveal their identities; your display should now match Figure 11.3. Figures 11.3, 11.4, and 11.5 are obtained using the classical statistics option. In these graphs, the mean and standard deviation (sd) used to draw the horizontal lines are the classical maximum likelihood estimates (MLEs).

[Figure 11.3: Q-Q plot of the sp length variable with the identities of a few data points revealed. Horizontal lines mark the Maximum and Warning USL and LSL; the horizontal axis shows theoretical quantiles of the normal distribution.]

Press the <F> key to save the graph to disk. The graph is saved as a PCX file, and you can specify its location by including the path along with the file name. The graph can also be saved in Encapsulated PostScript (EPS) format. To do so, press the <ESC> key twice to return to the first screen and move the cursor to Print Destination. Press <ENTER>. In the Print Destination window, select Encapsulated PostScript and press <ENTER> to finish the selection. After you have selected the PostScript printer, return to Robust Analysis and generate the graph. Press <P>, supply a name for the graph, and press <ENTER>, and the
117. ...of the Mds in an unpredictable manner and often leads to the misidentification of outliers. The use of approximate distributions of the Mds, such as the chi-square or normal, can also lead to incorrect ordering of the Mds. It is well known (Huber 1981; Devlin et al. 1981; Hampel et al. 1986; Rousseeuw and Leroy 1987; Rousseeuw and van Zomeren 1990; Barnett and Lewis 1994) that robust and resistant procedures with a high breakdown point should be used to identify multiple outliers. Most robust procedures for identifying outliers and estimating the population parameters of location and scale are iterative, requiring several passes through the data set. This is impractical without a computer software package.

Several procedures and influence functions exist in the literature, including the Biweight, HAMPEL, HUBER, and PROP functions, winsorization, univariate and multivariate trimming (MVT), and MVE-based robust procedures. The robust procedures based on MVT and on the HUBER and PROP influence functions can be used for univariate as well as multivariate data sets. These robust procedures, along with the classical MLE approach, have been incorporated in Scout to locate outliers in raw data sets, in interval estimation, and in principal component and discriminant analyses. These procedures have been tried on numerous
118. [Figure 13.7: The Printer Setup menu, with options including Exit, Setup, Choose Printer (HP LaserJet), Page Orientation (Landscape), Use Shading Patterns, Horizontal Scaling Percent, Vertical Scaling Percent, X Starting Location, Y Starting Location, Formfeed After Print, and Specify Printer Port (LPT1).]

13.3 Summary

There are three options in the Graphics module of Scout; the modules are displayed in the first window
119. ...graph will be saved with the EPS extension. If your on-line printer is specified in Print Destination, simply pressing <P> prints the graph. After the graph is saved and/or printed, press the <ESC> key twice to return to the Robust Method menu. Move to Select Variables and press ENTER. Using the plus and minus keys, deselect the variable sp length and select the second variable, sp width. Perform the same set of operations to generate the Q-Q plot; your display will match Figure 11.4.

[Figure 11.4: Q-Q plot of the sp width variable with the identities of a few data points revealed. Horizontal lines mark the Maximum and Warning USL and LSL; the horizontal axis shows theoretical quantiles of the normal distribution.]

Figures 11.3 and 11.4 can be generated together by selecting both variables in the Select Variables option. When multiple graphs are generated, they can be displayed one after another by pressing the PAGE DOWN key while the graphics screen is displayed. Return to the Select Graph Type menu and select Q-Q Plot (indiv raw data). Press the ENTER key to make the selection, move the cursor to the bottom of the window, and choose Generate Graph with Current Options.
120. gs as follows. The example choices used throughout this manual are those displayed by default for the IRIS.DAT file, which is discussed in the tutorial section.

Heading                      Example Choice
Compute Statistics Using     Classical
Weights                      Beta
Initial Estimate             Classical
Right Tail Cutoff            0.05
Trimming Percent             0

Each of these headings has various choices, which can be selected by repeatedly pressing the ENTER key while that heading is highlighted. After a selection is made, the arrow keys can be used to move the cursor to the next heading. The process can be repeated until the desired choices have been selected. The choices for each heading of the Univariate Statistics menu are as follows:

Heading                      Choices
Compute Statistics Using     Classical, Huber Influence, Proposed Influence, Multivariate Trimming
Weights                      Beta, Chi-Squared
Initial Estimate             Classical, Robust
Right Tail Cutoff            A number between 0.01 and 0.8 (active only when PROP or Huber is chosen)
Trimming Percent             An integer between 0 and 100 (active only when Multivariate Trimming is used)

The values for number choices can be typed directly on the screen after using th
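The rules in the choices table (which values each heading accepts, and when the cutoff and trimming fields are active) can be summarized as a small validation sketch. This is a hypothetical model written for illustration only: the class and constant names are not part of Scout, and treating the stated ranges as inclusive is an assumption.

```python
from dataclasses import dataclass

# Hypothetical model of the Univariate Statistics window; these names are
# illustrative and are not part of Scout itself.
METHODS = ("Classical", "Huber Influence", "Proposed Influence", "Multivariate Trimming")
WEIGHTS = ("Beta", "Chi-Squared")
INITIAL_ESTIMATES = ("Classical", "Robust")


@dataclass
class UnivariateOptions:
    method: str = "Classical"
    weights: str = "Beta"
    initial_estimate: str = "Classical"
    right_tail_cutoff: float = 0.05
    trimming_percent: int = 0

    def validate(self) -> "UnivariateOptions":
        if self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method}")
        if self.weights not in WEIGHTS:
            raise ValueError(f"unknown weights: {self.weights}")
        if self.initial_estimate not in INITIAL_ESTIMATES:
            raise ValueError(f"unknown initial estimate: {self.initial_estimate}")
        # The cutoff is only checked when the Huber or PROP influence function is chosen.
        if self.method in ("Huber Influence", "Proposed Influence"):
            if not 0.01 <= self.right_tail_cutoff <= 0.8:
                raise ValueError("right tail cutoff must lie in [0.01, 0.8]")
        # The trimming percent is only checked for multivariate trimming.
        if self.method == "Multivariate Trimming":
            if not (isinstance(self.trimming_percent, int) and 0 <= self.trimming_percent <= 100):
                raise ValueError("trimming percent must be an integer in [0, 100]")
        return self
```

With the defaults from the example table, `UnivariateOptions().validate()` succeeds; an out-of-range cutoff is only rejected when it is actually in use.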
121. he/she is currently using by pressing the F1 key.
8.2 Other options
The six options of the System menu are shown in Figure 8.1 below.
[Screen image: the System menu, listing User's Guide, Information, Sound On, Printer Setup, DOS Shell and Exit]
122. hed geostatistical technique frequently used in site characterization studies. However, OK assumes that no spatial trend is present and that the mean concentration at each location is constant within the region under consideration. Data collected from a polluted site often violate this assumption. Therefore, in order to use OK to characterize the site under study, data with a spatial trend need to be detrended so that the constant-mean assumption is satisfied. Scout offers the D-Trend option for removing trend that might be present in a geostatistical data set obtained from a polluted site. It assumes that the data are in the same format as for the pattern recognition option, with the population IDs in the first column. First, using an appropriate multivariate technique, the data have to be partitioned into strata with significantly different statistics (e.g., mean vectors). Using the geographic information of the sample observations, a site map can be prepared showing the actual sampling locations and the respective population IDs. The D-Trend option subtracts the respective sub-population mean from each observation in the corresponding sub-population. The resulting data satisfy the constant-mean assumption.
Add Means: This option is used after OK has been performed on the detrended data and a file with extension .grd has been created. The means subtracted using the D-T
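The D-Trend and Add Means steps described above amount to subtracting each stratum's mean vector and later adding it back. The following is a minimal sketch under stated assumptions: records are `(population_id, values)` pairs mirroring a file whose first column holds the population ID, and each kriged estimate is assumed to already carry the population ID of its stratum (how Scout assigns strata to grid nodes is not described here). The function names are hypothetical.

```python
from collections import defaultdict


def detrend(records):
    """Subtract each sub-population's mean vector from its own observations.

    `records` is a list of (population_id, values) pairs. Returns the
    detrended records and the per-population means (needed for Add Means).
    """
    sums = {}
    counts = defaultdict(int)
    for pop, values in records:
        if pop not in sums:
            sums[pop] = [0.0] * len(values)
        for j, v in enumerate(values):
            sums[pop][j] += v
        counts[pop] += 1
    means = {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}
    detrended = [(pop, [v - m for v, m in zip(values, means[pop])])
                 for pop, values in records]
    return detrended, means


def add_means(estimates, means):
    """Add the stored sub-population means back (the Add Means step).

    Assumes every estimate already carries the population ID of its stratum.
    """
    return [(pop, [e + m for e, m in zip(values, means[pop])])
            for pop, values in estimates]
```

Keeping the means returned by `detrend` is the design point: Add Means can only restore the original scale if the subtracted means are stored somewhere.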
123. iagonal represents the correlation of a variable with itself and therefore always has an r value of 1.00. All other points represent the correlations of the various variables with each other.
[Screen image: variable matrix for the "Iris data in full" data set. Header: pt length (minimum 1.0, maximum 6.9, variance 3.12) and pt width (minimum 0.1, maximum 2.5, variance 0.58), with the values 0.963 and 0.927; legend r > 0.75, 0.75 > r > 0.50, r < 0.50; options View Scatter Plot, Help, Exit; prompt "Use key pad to select variables"]
Figure 13.2 The variable matrix for two-dimensional graphics.
Focusing on the highlighted point in the matrix, use the <RIGHT>, <LEFT>, <UP> or <DOWN> arrow keys to select the variable combination for an X-Y scatter diagram. For the current tutorial, use the pt length and pt width combination (bottom row, second from the right, or reflectively, fourth row, far right). After the variable combination is selected, as shown in the header information of Figure 13.2, press ENTER to generate the scatter diagram shown in Figure 13.3.
[Plot: scatter plot of pt length and pt width for the Iris data in full; keys: ENTER Select Variables, F1 Help, ESC Exit]
Figure 13.3 The scatter plot for pt length and pt width selected from the variable matrix shown in Figure 13.2.
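The variable matrix groups each pair of variables by the strength of its correlation. A minimal sketch of that grouping, assuming Pearson correlations and the three bands from the legend, is shown below. Whether Scout bins the signed r or |r|, and which band an exact boundary value falls in, are not stated, so this sketch uses the signed value and its own boundary convention; the function name is hypothetical.

```python
import numpy as np


def correlation_classes(data, names):
    """Classify each pair of variables by its Pearson correlation coefficient r.

    The diagonal (r = 1.00 for a variable with itself) is skipped.
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    classes = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            value = float(r[i, j])
            if value > 0.75:
                label = "r > 0.75"
            elif value > 0.50:
                label = "0.75 >= r > 0.50"
            else:
                label = "r <= 0.50"
            classes[(names[i], names[j])] = (round(value, 3), label)
    return classes
```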
124. 11.4 Statistical Intervals
For this section we use the data set 4-METHYL.DAT from the Scout Data directory: use Read ASCII File in the Files menu, select 4-METHYL.DAT and press <ENTER>. From the Robust Analysis menu, select Display Graphs For, press ENTER, select Control Charts (Simul Xi), press ENTER, and return to the Robust Analysis menu. Select Statistics Options, set the parameters to match those shown in Figure 11.20, move to Accept New Settings, press ENTER, and return to the Robust Analysis menu.
[Screen image: the Robust Method menu (Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, D-Trend) with the Statistics Options window showing Compute Statistics Using PROP Influence, Initial Estimate Robust, Matrix Correlation and Weights Beta]
125. ical value are different from those shown in the original outlier test because the dimensionality is reduced by one variable. The Variable column gives the name of the identified causal variable: the variable whose presence always allows rejection of the discordant observation. The Observed column displays the value in the data set for the discordant observation and causal variable. The Expected column gives a prediction of that value obtained by multiple regression on the values reported for the other variables in that observation. Low Lim and Up Lim give the lower and upper limits, respectively, of a prediction interval. The Type I error rate (alpha) of this interval is the same as was chosen for the outlier test. This process is designed to identify cases where the discordancy apparently resulted from a substantial deviation in a single variable. This can occur when large measurement errors are independent, or when typographical, recording or transcription errors cause the outlier. For example, for the third variable in a ten-dimensional data set, recording 73.56 as 37.56 or as 735.6 may cause the associated observation to be identified as discordant. If so, executing the Causal Variables routine will probably indicate the third variable as the cause of the discordancy.
4.5 Associated Causes
This feature allows users with sufficient
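The Expected, Low Lim and Up Lim columns described above can be illustrated with an ordinary least squares regression of the causal variable on the other variables, followed by the standard OLS prediction interval. This is a sketch under assumptions, not Scout's own routine: it fits the regression on all rows except the discordant one (the text does not say whether that row is excluded), uses a two-sided interval at the outlier test's alpha, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def expected_value(data, obs, var, alpha=0.05):
    """Predict data[obs][var] from the other variables by multiple regression.

    Returns (observed, expected, low_lim, up_lim), where the limits form a
    two-sided (1 - alpha) OLS prediction interval.
    """
    X_all = np.asarray(data, dtype=float)
    others = [j for j in range(X_all.shape[1]) if j != var]
    train = np.delete(X_all, obs, axis=0)  # fit without the discordant row (assumption)
    X = np.column_stack([np.ones(len(train)), train[:, others]])  # intercept + predictors
    y = train[:, var]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = len(y) - X.shape[1]
    resid = y - X @ coef
    s2 = float(resid @ resid) / dof  # residual variance estimate
    x0 = np.concatenate(([1.0], X_all[obs, others]))
    expected = float(x0 @ coef)
    se = np.sqrt(s2 * (1.0 + x0 @ np.linalg.solve(X.T @ X, x0)))
    half = stats.t.ppf(1 - alpha / 2, dof) * se
    return float(X_all[obs, var]), expected, expected - half, expected + half
```

A transcription error such as 73.56 recorded as 735.6 would show up as an Observed value lying far outside the interval around the Expected value.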
126. ication results. For example, in environmental applications, a distorted discriminant function may classify a reasonably clean sample as coming from the contaminated population, and a contaminated sample as coming from the clean population (the background).
Fisher's Robust Method for Discriminating Among k Populations
Fisher's robust classification procedure (Anderson 1984; Singh and Nocerino 1995) is included in Scout. The procedure has been tried on some real environmental and historical data sets; Fisher's iris data set is used in Chapter 11. The population parameters $\mu_i$ and the common covariance matrix $\Sigma$ need to be estimated from training samples of size $n_i$ from populations $\pi_i$, $i = 1, 2, \ldots, g$. These estimates can be obtained using an appropriate procedure: the classical procedure or one of the three robust procedures. Fisher's method also provides a very convenient and effective way of graphically separating the p-dimensional data in terms of a few ($s$) discriminant functions. Graphical displays of the first few Fisher's discriminant functions reveal possible groupings and clustering of the g populations. Note that the derivation of Fisher's discriminants does not require multinormality of the underlying g populations. Under normality and equal covariance matrices, Fisher's discriminant functions reduce to the linear discrimi
127. idth and vertical height dimensions of the graph to be printed. The actual size of the printed graph depends on this scaling percentage, the page orientation and the printer in use: the larger the percent scaling, the larger the printed graph. To change your selection, highlight the scaling parameter to be adjusted, press ENTER to edit the scaling value, and input the desired value.
X and Y Starting Locations: Use the X Starting Location to set the height of the bottom of the graph, in pixels, from the bottom of the page. Similarly, use the Y Starting Location to set the left margin. Highlight the location parameter to be changed, press ENTER to edit the location value, and input the desired location.
Formfeed After Print: This feature causes Scout to send a form feed command to the printer after each graph, so the printer outputs one graph per page. Do not select this choice when more than one graph per page is desired. Highlight Formfeed After Print and press ENTER to toggle between Yes and No.
Specify Printer Port: This heading is used to change the printer port for output of graphs. Scout defaults to LPT1, but the user may also select LPT2 or LPT3. Highlight Specify Printer Port and press ENTER as needed to change the selection.
DOS Shell: This choice temporarily s
128. ighlight any desired variable. To assign the highlighted variable to an axis, type the letter of the desired axis (X, Y or Z). When all three axes have been selected, press the ENTER key to view the graph. The user has complete control over the position, size, scale and rotation of the graph, and can also identify and modify the individual points (observations) that make up the graph. The next few paragraphs cover all of these controls. Should the user forget any of these controls while in 3D graphics mode, pressing the F1 key brings up a summary of them. When the user has finished viewing a graph, pressing the <ENTER> key returns to the variable selection screen. Press <ESC> to exit 3D graphics mode and return to the main menu.
7.7 Moving 3D Graphs
The user can move the graph anywhere within its window on the screen. Pressing the M key puts the graph into movement mode; the arrow keys can then be used to move the axes to the desired location. To exit this mode, press <ESC>, ENTER or <SPACEBAR>.
7.8 Change Size of 3D Graphs
The user can change the size of the graph by zooming in and out of the plot. The < > key zooms into the plot, which makes the graph appear larger. The < > key zooms out of the plot, which makes the graph appear smaller. Each of these keys can be used as many times as needed.
Scaling 3D Graphs
When the gr
129. in the right tail of the distribution of the $Md_i$ (labelled Right Tail Cutoff in Scout) is needed in the process. Most practitioners are familiar with choosing a significance level $\alpha$, since statistical tests typically use some level of significance. The M-estimates obtained using a small value of $\alpha$ (e.g., 0.001 or 0.005) usually correspond to the classical estimates, whereas larger values of $\alpha$, such as 0.2 or 0.25, help unmask multiple outliers in small data sets of large dimensionality, or even multiple groups of discordant observations (e.g., see the example on the four-dimensional stack loss data set of size 21 in Chapter 11). A few (2 to 4) values of $\alpha$ may be tried on the same data set. All of the observations within the $(1-\alpha)100\%$ confidence ellipsoid after the final iteration can be considered inliers, forming the main body of the data set. Moreover, no small-sample correction factors are required to provide appropriate coverage and to achieve consistency when samples come from normal populations. The PROP procedure described here (Singh, Singh and Flatman 1994) can also be used effectively to decompose a mixture sample into its component populations. The multivariate kurtosis statistic (Mardia 1970; Mardia 1974) is also available in Scout; it is given by the following equation:
$$b_{2,p} = \frac{1}{n}\sum_{i=1}^{n} d_i^{4}$$
where the distances $Md_i$ are giv
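As a worked illustration of the kurtosis statistic, the sketch below computes Mardia's $b_{2,p}$ as the mean of the squared squared-Mahalanobis distances, using the sample mean and the covariance matrix with divisor n, as in Mardia (1970). Under multivariate normality its expected value is approximately $p(p+2)$, which the function also returns for comparison. This is a classical computation for illustration; it does not reproduce the robust (PROP or Huber) estimates Scout may plug in, and the function name is hypothetical.

```python
import numpy as np


def mardia_kurtosis(data):
    """Return (b_2p, p * (p + 2)) for an n x p data matrix.

    b_2p is the mean of d_i^4, where d_i^2 is the squared Mahalanobis
    distance of row i from the sample mean.
    """
    X = np.asarray(data, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n  # covariance with divisor n (maximum likelihood)
    S_inv = np.linalg.inv(S)
    d2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)  # squared distances
    return float(np.mean(d2 ** 2)), p * (p + 2)
```

For a single variable this reduces to the familiar ratio $m_4/m_2^2$: the values 1, 2, 3, 4, 5 give $6.8/2^2 = 1.7$.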
130. ions to be set aside should be used for the multivariate trimming procedure. For details, see Singh (1993).
Two Choices for the Numbering of Points on a Scout Graph
The points on a graph generated by Scout can be marked either by the observation number (1 to n) or by the population ID (a positive integer between 1 and 20). Thus, a maximum of 20 populations can be handled by the pattern recognition procedures in Scout (e.g., PCA, discriminant and classification analysis). The default option is numbering by observation. Numbering by population is used when multiple populations are present, as in pattern recognition techniques such as PC analysis or discriminant analysis. To use this option, the first column of the data file should contain the population ID code (e.g., see the Fulliris data set).
Ignoring a Population
The user can de-select a population (the population ID should be in the first column of the data file), which will then be ignored in all subsequent computations. For example, if not enough observations are available, or if one of the populations is significantly different from the rest of the data, the user may wish to ignore those observations for the rest of the statistical analysis. However, the user can still choose whether or not to plot the observations from the ignored population. The default is to plot the data from the ignored population.
131. [Screen image: the Select Graph Type menu, listing Q-Q plots (simultaneous raw data, simultaneous standardized, PCA, generalized distances), scatter plots (raw data, PCA), control charts (Indiv Xi, Simul Xi), population mean, prediction intervals, index plots and multivariate kurtosis, next to the Robust Analysis settings such as Limit Style (Two Sided), X and Y Axis Variables, Title, Numbering and Contour Ellipses]
Figure 11.2 The menu for Select Graph Type resulting from selection of the Display Graphs For heading in the Robust Analysis menu.
Move the cursor to select Generate Graph With Current Options. Press ENTER to generate the graph, and notice the highlighted data point on the graph. Press <SHIFT> and the identity of this data point will be revealed; use the up arrow key and press <SHIFT> again, and the identity of the next point will also be displayed. Using the arrow k
132. ive of highly contaminated areas, sections of a forest in poor or degraded states, inconsistent analytical results in a typical quality assurance and quality control (QA/QC) program, or gross typing errors. Outliers, when present, typically distort the classical estimates and the associated statistics, which in turn can lead to incorrect conclusions from the statistical inference employed. It is therefore important to identify the outlying observations and down-weight them appropriately. Several classical and robust outlier identification procedures are incorporated in the Scout software package. This chapter gives a brief description of some of the statistical procedures used in Scout, with references for statistically oriented users.
Various state and federal government agencies, local communities and industries often need to estimate the extent of contamination at polluted sites. The cleanup process is expensive and time consuming, so it is important to obtain these estimates accurately. The presence of discordant observations can distort the entire estimation process, so robust and resistant procedures are essential in the estimation phase; for example, when outliers are present, robust kriging characterizes a polluted site more accurately than classical kriging. Given a sample of size n from a polluted site, the sample may represent th
133. Figure 13.6 The System menu with the User's Guide menu also displayed.
The Information choice provides the Scout version number and information about the computer system on which Scout is loaded. The explanation windows can be toggled on or off using Help Messages. The Printer Setup menu can be used to format print output for specific printers and requirements; the menu of printer parameters is shown in Figure 13.7. The DOS Shell allows a user to execute DOS commands without leaving Scout. Exit first asks users whether they are sure they want to exit (REMEMBER THE CAUTIONS ABOUT DATA TRANSFORMS ALTERING FILES AND SAVING DATA UNDER APPROPRIATE FILE NAMES) and, if they are, returns them to DOS.
[Screen image: the System menu, listing User's Guide, Information, Printer Setup and DOS Shell]
134. lized Distance and Multivariate Kurtosis both lead to the same menu of three choices: cutoff values for $\alpha$ of 0.10, 0.05 or 0.01. Once $\alpha$ is selected, the data are analyzed and the results are posted to the screen.
10.2 Determining Causal Variables and Removing Flags
Working immediately after Multivariate Kurtosis has detected the outlier, select the Causal Variable choice to determine the variable(s) that caused the outlier. A variable is identified as a cause if, when it is removed from the analysis, the observation is no longer an outlier. The output is sent to the screen, identifying which variables displayed values outside the expected range.
The Remove Outlier Flags choice is merely a means of unmarking data that have been identified as outliers. Once Generalized Distance or Multivariate Kurtosis has identified outliers, these outliers are colored red in the data file; the Remove Outlier Flags choice turns the red data back to white, the original color of the data. After identifying the outliers with Multivariate Kurtosis, move the cursor (highlighted rectangle) to the Data heading and select Edit Data (we will NOT be editing the data, merely examining it). Once the data are on the screen, use the up and down arrow keys to examine the data and identify the red outliers. Now exit Edit Data, return to Classical Method, Remove Outlier Flag
135. mat has to be followed. As described in Section 2.1, the format requires the following information in the file:
a) the data set name or title (line 1);
b) the number of variables (line 2);
c) the names of the variables (lines 3 through X, where X - 2 is the number of variables);
d) the values of the variables, optionally labeling each data record with a comment in single quotes (lines X + 1 through the end of the file).
Example spreadsheet file prepared for conversion to Scout:
Geostatistical Environmental Data
3
Arsenic
Cadmium
Lead
.850 11.5 18.25 'Sample 1'
.630 8.50 30.25 'Sample 2'
1.02 7.00 20.00 'Sample 3'
1.02 10.7 19.25 'Sample 4'
1.01 11.2 151.5 'Sample 5'
In this example, the data set name should be in spreadsheet cell A1, the number of variables in cell A2, the variable titles in cells A3 through A5, and the values of the variables in cells A6 through D10. Column D (cells D6 to D10) contains the name of each record, and each name must be enclosed in single quotation marks. In some spreadsheet software, such as Excel, you may have to enter one or two spaces before the left quotation mark of each data label (column D in this example). Both single quotation marks should be visible in the spreadsheet before you save the file in a space-delimited or text format. One or both of these forma
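The file layout described above (title, variable count, one name per line, then the records) can be generated and read back with a short sketch. It assumes one record per line with whitespace-separated values and an optional trailing label in single quotes, as in the example; number formatting uses Python's defaults, and the function names are hypothetical rather than anything Scout provides.

```python
def to_scout_text(title, names, rows, labels=None):
    """Render a data set in the Scout ASCII layout.

    Line 1: title; line 2: number of variables; then one variable name per
    line; then one record per line, optionally ending with a quoted label.
    """
    lines = [title, str(len(names)), *names]
    for i, row in enumerate(rows):
        if len(row) != len(names):
            raise ValueError(f"record {i + 1} has {len(row)} values, expected {len(names)}")
        fields = [str(value) for value in row]
        if labels is not None:
            fields.append(f"'{labels[i]}'")
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


def from_scout_text(text):
    """Parse the layout produced by to_scout_text back into its parts."""
    lines = text.splitlines()
    title, n_vars = lines[0], int(lines[1])
    names = lines[2:2 + n_vars]
    rows, labels = [], []
    for line in lines[2 + n_vars:]:
        if not line.strip():
            continue
        label = None
        if "'" in line:
            start, end = line.index("'"), line.rindex("'")
            label = line[start + 1:end]
            line = line[:start]
        rows.append([float(tok) for tok in line.split()])
        labels.append(label)
    return title, names, rows, labels
```

With three variables, the names occupy lines 3 to 5 (X = 5) and the first record starts on line 6 (X + 1), matching cells A3 to A5 and row 6 of the spreadsheet example.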
136. mate $\mu$ and $\sigma$, respectively. The median M and $\hat{\sigma}$ are computed by first arranging the data in ascending order, $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. The median M and the absolute deviations from the median, $|x_i - M|$, $i = 1, 2, \ldots, n$, are computed next, followed by the median of these deviations (MAD). For data sets from Gaussian populations, the statistic $\hat{\sigma} = \mathrm{MAD}/0.6745$ is a consistent estimator of the population standard deviation $\sigma$. The use of M and $\hat{\sigma}$ as initial estimates in the iterative process of obtaining robust M-estimators of location and scale has been recommended in the literature (Devlin et al. 1981). These statistics can be obtained using the univariate statistics option of the Robust Method menu in Scout.
Outliers in Univariate and Multivariate Data Sets
In order to obtain robust estimators of location and scale, a chi-square ($\chi^2$) approximation is typically used for the distribution of the distances $Md_i$. The $Md_i$ are then compared with an associated chi-square reference value $Md_{\alpha}$ satisfying the probability statement $P(Md_i \le Md_{\alpha}) = 1 - \alpha$, $i = 1, 2, \ldots, n$. This statement represents an approximate confidence ellipsoid for the individual distances $Md_i$. Observations with $Md_i$ larger than the reference value are declared outliers. However, it has also been suggested that these cutoff points should not be used too mechanically (Cook and Hawkins 1990; Fung 1993; Atkinson 1994). The MVE-based robust procedures (Rousseeuw and Leroy 1987) are
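The median and MAD-based scale described above take only a few lines to compute. The constant 0.6745 is the 0.75 quantile of the standard normal distribution, which is why dividing the MAD by it gives an estimate of $\sigma$ for Gaussian data. A minimal sketch using the Python standard library (the function name is hypothetical):

```python
import statistics


def median_mad_scale(values):
    """Return (M, MAD, sigma_hat) with sigma_hat = MAD / 0.6745."""
    m = statistics.median(values)
    mad = statistics.median([abs(x - m) for x in values])
    return m, mad, mad / 0.6745
```

For example, the values 1, 2, 3, 4, 100 give M = 3 and MAD = 1, so $\hat{\sigma} \approx 1.48$; the single gross value 100 barely affects either statistic, which is what makes them useful as starting estimates.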
137. Figure 3.1 The Data menu displayed showing four options, with Edit Data selected.
The data set will appear in the form of a spreadsheet. You can move about the screen and highlight any data cell; a data cell may be a label for a given observation or a value in an observation for a particular variable. The keys for moving about the screen are the four ARROW keys, PAGE UP, PAGE DOWN, <HOME> and END. Observations that appear in red have been flagged as outliers. Press <ESC> to return to the main menu when finished.
Editing Observations or Labels
Highlight the data cell you wish to edit by moving about the screen with the keys mentioned above, then type the correct value or label and press ENTER. Repeat this procedure for each cell that you wish to modify. If you are in the process of changing a cell's value and decide that the original value was correct, you can restore it by pressing the <ESC> key.
Deleting Observations or Variables
Highlight the observation or variable that you wish to delete; any portion of it can be highlighted. Press the <DELETE> key. You will be given a choice of Observation or Variable. If you wish to delete an observation (i.e., an entire row of the spreadsheet), press the O key or the ENTER key. A scree
138. [Screen image: Robust Analysis menu settings, including Limit Style, X Axis Variable, Y Axis Variable, Title (Robust Analysis), X Axis Title, Numbering (Observations), Contour Ellipses (Indiv Simul) and Generate Graph With Current Options]
Figure 5.1 The Robust Analysis menu coming from selection of Robust Analysis in the Robust Method menu.
5.3 Univariate Statistics
This heading computes univariate statistics. The four methods mentioned in the introduction to this chapter are available: (1) the classical maximum likelihood estimator (MLE), (2) the Huber method, (3) the proposed (PROP) robust method, and (4) sequential trimming. The weights can be computed using either the exact Beta distribution of the generalized distances or the chi-square approximation. To compute univariate statistics, use the up and down ARROW keys to select Univariate Statistics from the menu and press <ENTER>. A window entitled Univariate Robust Statistics will then be displayed; it can be used to set various options for calculating univariate statistics. This window has five main headin
139. n will then appear asking whether you are sure that you wish to delete this specific observation. The default answer is No. If you are sure that you wish to delete the observation, type Y, or move the cursor to Yes and press ENTER. Repeat this procedure for each observation you wish to delete. Similarly, if you wish to delete a variable (i.e., an entire column of the spreadsheet), press the V key, or highlight Variable with an ARROW key and press ENTER. A screen will then appear asking whether you are sure that you wish to delete this specific variable. The default answer is No. If you are sure that you wish to delete the variable, type Y, or move the cursor to Yes with an ARROW key and press ENTER. Repeat this procedure for each variable you wish to delete.
Inserting Observations
This heading allows the user to insert observations (i.e., rows) into the data set. Move about the spreadsheet screen until you find the row in which you wish to insert an observation, and press the <INSERT> key. You will then be given a choice of Observation or Variable. Select Observation by highlighting it with an ARROW key (if necessary) and pressing ENTER, or by pressing the O key. You will then be asked what you want the inserted observation to be. You may choose the arithmetic mean, geometric mean or median of all of the observations for each variable, or you
140. nant functions. The discriminants are extracted by maximizing the between-groups variability relative to the within-groups variability ($\Sigma$). The linear combinations $y_i = l_i' x$, $i = 1, 2, \ldots, s$, are called Fisher's discriminant functions. Scatter plots of the pairs $(y_i, y_j)$, $i, j = 1, 2, \ldots, s$, are valuable graphical displays of between-group separation. Constant-distance ellipses can also be drawn individually for each of the g groups on the scatter plots of the discriminant scores (see the fulliris data example in Chapter 11). These plots provide a visual display of the separation among the various groups. Fisher's classification rule is: assign an observation x to $\pi_h$, $h \in \{1, 2, \ldots, g\}$, if
$$\sum_{j=1}^{s}\left[l_j'(x - \hat{\mu}_h)\right]^2 = \min_{1 \le i \le g} \sum_{j=1}^{s}\left[l_j'(x - \hat{\mu}_i)\right]^2 \qquad (15)$$
Graphical displays of the discriminant functions, coupled with the contour ellipses, reveal group separation or overlap very effectively. Moreover, scatter plots of the discriminants versus the original variables can be used to gain additional insight, graphically identifying the variables that are most significant in discriminating among the g populations under consideration.
REFERENCES
Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, Second Edition. John Wiley, New York.
Atkinson, A. C. (1994). Fast very robust methods fo
141. nd an acceptable value for a and wish to abort this process, press the <ESC> key.
3.4.4.4 Arcsine
Transforms the data using the arcsine function. All of the data must be between zero and one. This transform is typically used on data representing proportions.
3.4.4.5 Undo Option
Undesirable transforms that have been selected can be removed with the Undo Last Transform choice in the menu. Transforms must be undone in the reverse of the order in which they were selected. This feature lets you try various transforms without the risk of damaging your data: your original data in memory are not modified until you have finished testing and selecting the transforms for all of the variables. When you exit the transform module, the program asks you to confirm that the variables should be modified with the selected transforms.
3.4.5 Remarks on Transformation
When you have finished selecting the transforms for each of the variables and are ready to exit the transform module, press the <ESC> key and answer the question box with the <Y> key. Another question box will appear asking whether you wish to modify the variables in memory by applying the selected transforms. Until now, your original data have not been modified; you have only been testing the transforms. Answer the question with ENTER or the Y key to apply the transforms
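The undo behavior described above (last transform selected is the first undone, and the original data stay untouched until you confirm) is a stack. Below is a hypothetical sketch of that bookkeeping, not Scout's implementation. It applies the plain arcsine as the text describes; the `sqrt` entry is an extra transform added only to show undo order, since this section does not say which other transforms Scout offers.

```python
import math


class TransformSession:
    """Hypothetical sketch: pending transforms kept on a stack.

    Transforms are replayed on a copy of the data, so the original column is
    untouched until commit() is called.
    """

    # "sqrt" is a hypothetical extra entry used only to demonstrate undo order.
    TRANSFORMS = {"arcsine": math.asin, "sqrt": math.sqrt}

    def __init__(self, column):
        self.original = list(column)
        self.history = []  # transform names, most recent last

    def current(self):
        values = list(self.original)
        for name in self.history:
            values = [self.TRANSFORMS[name](x) for x in values]
        return values

    def apply(self, name):
        if name == "arcsine" and any(not 0.0 <= x <= 1.0 for x in self.current()):
            raise ValueError("arcsine requires all values to lie between 0 and 1")
        self.history.append(name)

    def undo_last(self):
        # Remove the most recently selected transform (last in, first out).
        if self.history:
            self.history.pop()

    def commit(self):
        # Corresponds to confirming that the variables in memory be modified.
        self.original = self.current()
        self.history.clear()
        return self.original
```

Replaying the stack from the untouched original, rather than inverting each transform, keeps undo exact even for transforms that are awkward to invert numerically.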
142. ndardized observation vectors. The user may wish to graph the component scores later using the Graphics menu discussed in Chapter 7; to do so, these scores need to be saved. Users can save component scores using the Transform Data heading: before the component scores can be graphed, Scout must be instructed to save them, and they will replace the original data in memory.
CAUTION: Scout uses the same computer memory to store the component scores as that used for the original data. The Transform Data heading will overwrite the original data with the component scores. If a user generates component scores and then saves them to the same file as the original data, the original data will be lost. Therefore, once generated, the component scores should be saved to a different Scout file to avoid losing the original data. However, the PC scores (classical or robust) can be saved in the same data file without overwriting the original data by using the Robust Method menu, which adds extra columns to the data file.
The transformed data may consist of component scores and original variables, so the user must be careful not to misinterpret the resulting data.
Chapter 7 Graphics
7.1 General Description
Scout features two graphics options: 2-dimensional and 3-dimensional. 2-dimensional graphics are used to display bivariate plots, also known as scatt
143. Figure 12.6 The covariance matrix for the principal components.

12.4 Summary IV
There are six options in the PCA module in Scout; they are displayed in the first window when PCA is selected from Scout's main menu. The Select Variables option in this module is identical to the Select Variables option in every other module of Scout. For each heading in the PCA menu except Select Variables there are two choices: (1) Covariance and (2) Correlation. Any output from the PCA module can be saved by pressing the <P> key and typing the desired path followed by the file name. Display Matrices allows users to view the variances and covariances between any set of selected variables. The cumulative variance table can be calculated using Eigenvalues, and the component loadings can then be viewed using View Components. Transform Data replaces the original data with the principal components.

Chapter 13 Tutorial V: Graphics and System
13.1 Graphics
The Graphics menu
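The cumulative variance table mentioned in the summary is simple arithmetic on the eigenvalues. A minimal Python sketch, assuming the eigenvalues have already been computed (the function name is hypothetical): each proportion is an eigenvalue divided by the sum of all eigenvalues, that is, the trace of the matrix, which equals the number of variables p when the Correlation choice is used.

```python
def variance_table(eigenvalues):
    """Rows of (eigenvalue, proportion, cumulative proportion), largest first."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)  # trace of the covariance or correlation matrix
    rows, running = [], 0.0
    for lam in values:
        running += lam
        rows.append((lam, lam / total, running / total))
    return rows


# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to p = 4).
for lam, prop, cum in variance_table([0.5, 2.0, 0.5, 1.0]):
    print(f"{lam:6.3f} {prop:6.3f} {cum:6.3f}")
```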
144. normality is not of concern, the Q-Q plot can be replaced by a simpler index plot, with the sample index number running along the horizontal axis and the Mds plotted along the vertical axis. Draw a horizontal line at the $(1-\alpha)100\%$ critical value $Md_{\alpha}$ of Max Mds, which is given by the following simultaneous confidence ellipsoid:

$$P\left(Md_i \le Md_{\alpha},\; i = 1, 2, \ldots, n\right) = 1 - \alpha \qquad (7)$$

or, conservatively, by using the Bonferroni inequality, by the statement

$$P\left(Md_i \le Md_{\alpha/n},\; i = 1, 2, \ldots, n\right) \ge 1 - \alpha \qquad (8)$$

This horizontal line is labelled Maximum (Largest) Md on the Q-Q or index plot. Finally, draw a horizontal line at the $(1-\beta)100\%$ critical value $Md_{\beta}$ obtained from the distribution $\frac{(n-1)^2}{n} B\left(\frac{p}{2}, \frac{n-p-1}{2}\right)$ of the individual distances $Md_i$, satisfying

$$P\left(Md_i \le Md_{\beta}\right) = 1 - \beta, \quad i = 1, 2, \ldots, n \qquad (9)$$

This line is labelled Warning (Individual) Md on the Q-Q or index plot. Observations falling above the horizontal line obtained using (8) are potential outliers, observations lying between the two horizontal lines given by (8) and (9) need further examination, and points falling below the line given by (9) represent the main stream of the data. For univariate populations, the simultaneous confidence interval is obtained by substituting $p = 1$ in equation (7):

$$P\left(\bar{x} - s\sqrt{Md_{\alpha}} \le x_i \le \bar{x} + s\sqrt{Md_{\alpha}},\; i = 1, 2, \ldots, n\right) = 1 - \alpha \qquad (10)$$

The estimates used in the statements given by equations (7) through (10)
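Because the individual $Md_i$ follow the scaled beta distribution $\frac{(n-1)^2}{n} B\left(\frac{p}{2}, \frac{n-p-1}{2}\right)$, both horizontal lines can be computed from beta quantiles: the warning line uses $1-\beta$ and the Bonferroni line of statement (8) uses $1-\alpha/n$. The Python sketch below shows this under that assumption. It uses only the standard library, so it evaluates the regularized incomplete beta function with the classical continued-fraction expansion and inverts it by bisection (where SciPy is available, scipy.stats.beta.ppf does the same job). All function names are hypothetical, not Scout's.

```python
import math


def _nonzero(v, tiny=1e-300):
    return v if abs(v) >= tiny else tiny


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    # Continued fraction for the incomplete beta function (modified Lentz).
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_ppf(q, a, b):
    """Quantile of Beta(a, b) by bisection on the monotone CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def md_critical_values(n, p, alpha=0.05, beta_level=0.05):
    """Return (simultaneous, individual) cut-offs for the distances Md_i.

    Assumes Md_i ~ ((n - 1)**2 / n) * Beta(p/2, (n - p - 1)/2)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    scale = (n - 1) ** 2 / n
    a, b = p / 2.0, (n - p - 1) / 2.0
    simultaneous = scale * beta_ppf(1.0 - alpha / n, a, b)  # Bonferroni, (8)
    individual = scale * beta_ppf(1.0 - beta_level, a, b)    # warning line, (9)
    return simultaneous, individual


# n = 50 observations on p = 4 variables (the size of the IRIS.DAT example).
print(md_critical_values(50, 4))
```

Using the exact beta quantiles instead of a chi-square approximation matters most when p is large relative to n, which is the situation the guide warns about.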
145. nt mean assumption can be satisfied before proceeding with ordinary kriging (OK).

14.3 Options Available For Robust Procedures

Two Options For The Initial Start Estimates
As recommended in the literature, an initial robust start in iterative robust procedures helps in unmasking multiple outliers and also in producing reliable estimates with a higher breakdown point. Scout offers the two options given below for the initial estimates used in the iterative robust procedures HUBER, PROP, and MVT: (1) a classical initial start for the estimation of location and scale, e.g., the simple mean vector and the covariance matrix; and (2) a robust initial start with the vector of medians, and the covariance matrix with the estimates of the standard deviations taken to be the corresponding MAD/0.675, where MAD represents the median absolute deviation given by

$$MAD = \operatorname{median}_{i}\left\{\left|x_i - \operatorname{median}(x)\right|\right\}$$

Two Options For The Distribution of The Mahalanobis Distances
As mentioned earlier, most of the robust procedures, such as MVT, MVE, and HUBER, use the Mds. Under normality the Mds are known to follow a scaled beta distribution. However, for computational ease, a chi-square or a normal approximation is typically used for the distribution of the individual Mds and their corresponding cut-off points, which may not lead to correct identification of outliers, especially for high-dimensional data sets of small to moderate size. Today, using the fast personal
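The robust initial start for each variable can be sketched in a few lines: the location is the median and the scale is MAD/0.675 (0.675 is approximately the upper quartile of the standard normal, so MAD/0.675 estimates the standard deviation for normal data). This minimal Python illustration uses a hypothetical function name. How Scout fills the off-diagonal entries of the starting covariance matrix is not stated in this excerpt, so the sketch stops at the per-variable location and scale.

```python
import statistics


def robust_start(columns):
    """Per-variable medians and MAD/0.675 scale estimates.

    columns: one list of observations per variable.
    """
    medians, scales = [], []
    for col in columns:
        med = statistics.median(col)
        mad = statistics.median([abs(x - med) for x in col])
        medians.append(med)
        scales.append(mad / 0.675)
    return medians, scales


# The gross outlier (100) in variable 1 barely affects its robust start.
print(robust_start([[1, 2, 3, 4, 100], [9.0, 10.0, 11.0, 12.0, 13.0]]))
```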
146. ocedures
14.1 Introduction to Statistical Procedures for the Identification of Multiple Outliers
14.2 General Description of Statistical Procedures in the Scout Software Package
14.3 Options Available For Robust Procedures
14.4 Robust Procedures in Scout
14.5 Normal Probability (Q-Q) Plots of the Original Data and of Principal Components
14.6 Q-Q Plot of Mahalanobis Distances Using Beta Distribution
14.7 Contour Plots
14.8 Robust Principal Component Analysis
14.9 Interval Estimation
14.10 D Trend and Add Means
14.11 Outliers in Discriminant and Classification Analysis
REFERENCES

Chapter 1 Preliminaries
1.1 Introduction
Scout is a univariate and multivariate data analysis tool. Several classical and robust procedures, such as outlier testing, together with interactive 2D and 3D graphics, are included in Scout, making it a useful package for environmental and ecological applications. Straightforward principal component, classification, and discriminant analyses are included to increase the versatility of the software package. Scout may be used to (1) transform data, (2) assess the normality of variables in the data set, (3) produce histograms and Q-Q plots of raw data and principal component (PC) scores, (4) produce scatter plots of raw data, of PCs, and of discriminant s
147. ociated with the 3-dimensional graphics, consult the User's Guide for further instructions, or simply work with the software, remembering to use <F1> for help when needed.

Figure 13.5 One of many possible perspectives of the three-dimensional graph from Figure 13.4.

13.2 System
The System menu has six options, as shown in Figure 13.6. The User's Guide heading leads to a menu of various topics similar to those covered in this document. To access information on any aspect of Scout, move the cursor to highlight the appropriate section of the User's Guide and press <ENTER>. The menu of sections is also shown in Figure 13.6.

(Screen: Scout main menu, File, Data, Classical Method, Robust Method, PCA, Graphics, System, with the System menu open, listing User's Guide, Information, Printer Setup, and DOS Shell.)
148. or is yellow and the default shape is an x. To select a new color and shape, press the <F2> key. The current color will now be highlighted. Use the <UP> or <DOWN> arrow keys to highlight the desired color and then press <ENTER> or the <RIGHT ARROW> key. The current shape will now be highlighted. Again use the <UP> or <DOWN> arrow keys to highlight the desired shape, and press <ENTER> to complete the selection.

To change the graph symbol (color and shape) of an observation, first use the <F2> key to change the color and shape, then use the <UP> or <DOWN> arrow keys to highlight the observation to be changed, and then press the <ENTER> key. The graph symbol of the highlighted observation then changes to the selected symbol shown in the right window, and the highlighter moves automatically to the next observation. This makes it very easy to change a continuous block of observations by holding down the <ENTER> key. The user can exit this screen at any time by pressing the <ESC> key. All changes are retained in memory. Before exiting the program, the user should save the data in memory as a Scout file so that the changes become permanent; otherwise they will be lost.

7.3 Command Summary for 2D and 3D Graphics
Scout recognizes the following commands when either 2- or 3-dimensional plots are displayed:
<F> Outputs
149. or the 4-METHYL.DAT data.

Using the same data set, construct the prediction interval for future observations. Select Display Graphs For, press <ENTER>, choose Prediction Intervals, press <ENTER>, and then set the rest of the Robust Analysis menu to match Figure 11.23. To generate the graph, choose Generate Graph With Current Options from the Robust Analysis menu and press <ENTER>. The first output displays the statistics and the prediction interval (see Figure 11.24). Press <Q> to reveal the graph (Figure 11.25).

(Screen: Robust Method menu with Select Variables, Univariate Statistics, Robust Analysis, Confusion Matrix, Pattern Recognition, and D Trend; the Robust Analysis window shows Display Graphs For: Prediction Intervals and Statistics Options.)
150. phs For in the Robust Analysis menu. Changing our file back to IRIS.DAT, selecting Scatter Plot PCA as described above, and setting Numbering back to Observations, we select Generate Graph With Current Options and press <ENTER>. We now exercise two graphics options: (1) press <N> and the identities (data labels) of the data points are displayed, and (2) press <E> and the contour ellipse is drawn around the data (both the individual and simultaneous ellipses, if this option has not been changed since our last graph). Except for the title, your display should now match Figure 11.10. The title can be supplied by highlighting Title in the Robust Analysis menu, pressing <ENTER>, typing in your title, pressing <ENTER> again, and then generating the graph.

Figure 11.10 The scatter plot of principal components 1 and 2 for the Setosa data (graph title: Scatter Plot of First Two PCs; axis label: Principal Component 2).

To draw the PCA scatter plots for data sets with multiple populations, Pattern Recognition is recommended. Change the data file to FULLIRIS.DAT and move from Robust Analysis to Pattern Recognition in the Robust Method menu. Press <ENTER>, view the menu, and change the choices for the various headings to those shown in Figure 11.11.
151. Figure 8.1 The six options of the System menu.

Information: This choice displays the Scout version and the hardware configuration, including the processor, coprocessor, graphics adapter, and the amount of RAM found and used on the system.

Help Messages: The user can disable or enable the help windows that correspond to the menu items. Unless the user is very familiar with Scout, disabling the help windows is not recommended.

Printer Setup: The printer in use must be specified in order for Scout to print graphs. This heading allows the user to select the make and model of the printer used for graphs. The user can also set printer-specific
152. r the detection of multiple outliers. Journal of the American Statistical Association, 89, 1329-1339.
Barnett, V., and Lewis, T. (1994). Outliers in Statistical Data, 3rd ed. John Wiley, UK.
Campbell, N. A. (1980). Robust procedures in multivariate analysis I: Robust covariance estimation. Applied Statistics, 29(3), 231-237.
Cook, R. D., and Hawkins, D. M. (1990). Comment on "Unmasking multivariate outliers and leverage points" by P. J. Rousseeuw and B. C. van Zomeren. Journal of the American Statistical Association, 85, 640-644.
Daniel, C., and Wood, F. S. (1980). Fitting Equations to Data. John Wiley, New York.
Devlin, S. J., Gnanadesikan, R., and Kettenring, J. R. (1981). Robust estimation of dispersion matrices and principal components. Journal of the American Statistical Association, 76, 354-362.
Dixon, W. J. (1953). Processing data for outliers. Biometrics, 9, 74-89.
Fung, W. (1993). Unmasking outliers and leverage points: A confirmation. Journal of the American Statistical Association, 88, 515-519.
Hahn, G. J., and Meeker, W. Q. (1991). Statistical Intervals. John Wiley, New York.
Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69, 383-393.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. John Wiley, New York.
Horn, P
153. rend option need to be added back to the kriging estimates in the .grd file. This can be achieved using the Add Means option. This option uses two input files: a statistics file with extension .sts (Example.sts) and a file with extension .add (Example.add). The .sts file should follow the same format as the statistics file generated by Scout. A separate .add file (e.g., pb.add) is required for each variable considered. The .add file has the following format:

a b c
x1 x2 y1 y2 population_Id1
x1 x2 y1 y2 population_Id2
(repeat for each region of the site)

Here:
a = total number of sub-populations
b = total number of variables
c = number of the variable in the .sts file
x1, x2, y1, y2 are the coordinates of the boundary of a geographic region (a rectangle) belonging to one of the sub-populations. Thus the region bounded by (x1, y1), (x2, y1), (x1, y2), and (x2, y2) belongs to the population with the corresponding ID.

Example: The example .add file for lead (Pb) is Pb.add. There are two populations (a = 2) and 4 variables in the data file (b = 4). Lead is the second variable in the .sts file; therefore c = 2.

2 4 2
0 200 0 3500 1
200 3000 0 1220 1
1100 3000 1220 1700 1
1850 3000 1700 3500 1
200 1850 2780 3500 1
200 1100 1220 2780 2
1100 1850 1700 2780 2

So, using this input file, when the Add Means option is activated, the mean of sub-population 1 will be added to all observations within
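The .add layout described here is simple enough to read mechanically. A minimal Python sketch, assuming whitespace-separated fields exactly as in the Pb.add example; the function name and the validation rules are illustrative, not Scout's own.

```python
def parse_add_file(text):
    """Parse an .add file into ((a, b, c), regions).

    Each region is (x1, x2, y1, y2, population_id).
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    a, b, c = (int(v) for v in rows[0])
    if not 1 <= c <= b:
        raise ValueError("c must index one of the b variables in the .sts file")
    regions = []
    for fields in rows[1:]:
        x1, x2, y1, y2 = (float(v) for v in fields[:4])
        pop_id = int(fields[4])
        if not 1 <= pop_id <= a:
            raise ValueError(f"population ID {pop_id} is not in 1..{a}")
        regions.append((x1, x2, y1, y2, pop_id))
    return (a, b, c), regions


PB_ADD = """\
2 4 2
0 200 0 3500 1
200 3000 0 1220 1
1100 3000 1220 1700 1
1850 3000 1700 3500 1
200 1850 2780 3500 1
200 1100 1220 2780 2
1100 1850 1700 2780 2
"""

header, regions = parse_add_file(PB_ADD)
print(header, len(regions))  # (2, 4, 2) 7
```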
154. s and press <ENTER>. Return to Edit Data, re-examine the data, and note that the previously identified outliers are now white.

10.3 Summary
Outlier detection on any data set can be accomplished by using one of the two options in the Classical Method menu of Scout. Each of the two outlier detection headings has three predetermined choices for α; however, using the Robust Method, any α between 0.001 and 0.8 can be selected in the Generalized Distance test. In addition to outlier detection, Scout can be used to identify the variable that caused the outlier. The outlier flags can be removed by using the Remove Outlier Flags option.

Chapter 11 Tutorial III: Robust Method
The following tutorial is on robust analysis. Classical and robust techniques will be applied to some well-known data sets: IRIS.DAT (Fisher's iris data on the Setosa species of iris; Anderson, 1984), FULLIRIS.DAT (data on two other species of iris in addition to the Setosa), 4-METHYL.DAT (data on the recovery of 4-methyl phenol from 1993 performance evaluation samples), and STACKLSS.DAT (Brownlee's Stack Loss data set; Daniel and Wood, 1980). These data files can be found in the C:\Scout\Data directory.

11.1 Q-Q Plots
Select the file IRIS.DAT using Read ASCII File as described in Tutorial I. Use Select Variables from the Robust Method menu and choose only one variable (e.g., sp length) by using the
155. s information can be printed either to a specified file or directly to the printer by pressing the <P> key.

3.4.5 Histogram Window
Histograms may be displayed by pressing the <H> key. This key functions as a toggle; that is, the histogram window remains active until the <H> key is pressed again. As you scroll through the variables in the statistics window, the histogram is updated to correspond to the currently highlighted variable. The two numbers near the bottom of the histogram window are the minimum and maximum values of the current variable. The scale of the histogram adjusts automatically as variables and transforms are selected.

3.4.4 Transformation Menu
There are five transforms you may use. First highlight the variable to be transformed, and then press the <ENTER> key to bring up the transformation menu. The menu contains five transform functions and an undo option. Each of these is explained separately in the following paragraphs.

3.4.4.1 Linear
This transform allows you to change the location and scale of a variable. The program will prompt you to enter two constants, a and b, to be used as follows:

$$X' = \frac{X - a}{b}$$

where b cannot be equal to zero. Once you have entered the constants, the transform is applied to a copy of the data, and the histogram and statistics windows are updated according to the results of the transform. A new window in the center of the screen displays
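The Linear transform is a location-scale change, so choosing a as a mean and b as a standard deviation standardizes a variable. A minimal Python sketch, assuming the form X' = (X - a)/b given above (the function name is hypothetical); the printed minimum and maximum correspond to the two numbers shown under the histogram.

```python
def linear_transform(values, a, b):
    """Return (X - a) / b for every value; b must be nonzero."""
    if b == 0:
        raise ValueError("b cannot be equal to zero")
    return [(x - a) / b for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]
z = linear_transform(data, a=5, b=2)  # population mean 5, population SD 2
print(z, min(z), max(z))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0] -1.5 2.0
```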
156. s t or a normal distribution is typically used to obtain the critical values used in (3), which can result in significantly different interval estimates.

(d) $(1-\alpha)100\%$ prediction interval for a future observation $x_0$:

$$P\left(\bar{x} - t_{\alpha/2,\,(wsum-1)}\, s\sqrt{1 + \frac{1}{wsum}} \le x_0 \le \bar{x} + t_{\alpha/2,\,(wsum-1)}\, s\sqrt{1 + \frac{1}{wsum}}\right) = 1 - \alpha \qquad (14)$$

A real data set from a QA study of the EPA is considered in Chapter 11 to demonstrate the differences among these intervals. The user can generate the graphs of these intervals by pressing the <Q> key, and the graphs can be printed on a LaserJet printer by pressing the <P> key. In summary, the procedure presented here (1) identifies multiple outliers effectively, (2) uses appropriate test statistics, (3) computes the adjusted degrees of freedom (d.f.) associated with the test statistics by assigning reduced weights to the outlying observations, and (4) provides more precise and accurate estimates of the underlying population parameters and the associated intervals.

14.10 D Trend and Add Means
These two options, D Trend and Add Means, are useful for performing geostatistical analysis. Some knowledge of geostatistical analysis, such as kriging and variogram modelling, is required; users not interested in this may prefer to skip this section. These options require knowledge of the geographic location (e.g., Easting and Northing coordinates) of each sample observation. Ordinary kriging (OK) is a well-establis
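Statement (14) can be evaluated directly once the weights are known. The Python sketch below is an illustration under stated assumptions: $\bar{x}$ is the weighted mean, $s^2$ is the weighted sum of squared deviations divided by (wsum - 1), wsum is the sum of the weights, and the caller supplies the upper α/2 critical value of Student's t with (wsum - 1) degrees of freedom (the Python standard library has no t quantile function). The function name is hypothetical. With all weights equal to 1 it reduces to the classical prediction interval.

```python
import math


def weighted_prediction_interval(x, w, t_crit):
    """Return (lower, upper, df) for a future observation x0.

    Assumes wsum = sum(w) > 1 and t_crit = t_{alpha/2, wsum - 1}.
    """
    wsum = sum(w)
    if wsum <= 1:
        raise ValueError("the sum of the weights must exceed 1")
    mean = sum(wi * xi for wi, xi in zip(w, x)) / wsum
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / (wsum - 1)
    half_width = t_crit * math.sqrt(var) * math.sqrt(1.0 + 1.0 / wsum)
    return mean - half_width, mean + half_width, wsum - 1


# A zero weight (as a robust procedure might give a gross outlier) removes it;
# 2.776 is the upper 2.5% point of t with 4 d.f.
print(weighted_prediction_interval([1, 2, 3, 4, 5, 100], [1, 1, 1, 1, 1, 0], 2.776))
```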
157. scale estimates obtained using the remaining 74 inlying observations. Both graphs are very similar, confirming the existence of the above-mentioned 8 outliers. This can easily be done by creating an extra first column representing the population IDs, with the 74 inlying observations coming from population 1 (say) and the 8 outliers identified as coming from population 2. The extra column can be inserted using the Edit Data option of Scout. The user can then use the Ignore Population 2 option with the Plot Ignored Population = Yes setting to produce graphs 5 and 6. The PROP estimates (and also the Mds, which are not included here), with or without the outliers, are in close agreement with the MLEs without the outliers. The minor differences between the robust and classical results without the 8 outliers are due to the fact that the borderline observations 66 and 82 are assigned reduced weights in the PROP procedure. The associated statistics are summarized as follows.

Robust Statistics, All Observations

Covariance Matrix                                      Mean Vector
          x1       x2       x3       x4       Octn
x1        44.35    0.82     7.27     0.24     3.95     62.650
x2        0.82     1.24     0.91     0.06     0.25     1.298
x3        1.21     0.91     12.89    0.35     0.63     56.820
x4        0.24     0.06     0.35     0.03     0.06     1.591
Octn      3.95     0.25     0.63     0.06     0.79     91.569

Classical Statistics, After Deletion of 8 Outliers

Covariance Matrix                                      Mean Vector
          x1       x2       x3       x4       Octn
x1        44.24    0.78     7.37     0.17     4.02     62
158. se must be specified before Scout can print any graphs. See System: Printer Specifications to select the make and model of the printer and other graphics specifications. Scout can print only graphs that are displayed on the monitor. Press the <P> key to print the graph that is on the screen; a line will move across the screen as Scout reads the graph and sends it to the printer.

7.4 2-Dimensional Graphs
The second heading in the Graphics pull-down menu, 2 Dimensional, is the 2-dimensional graphics system. If any observations have been flagged as outliers, Scout will ask the user whether those outliers are to be used in statistical calculations. Scout will then place the computer in graphics mode and display a color-coded correlation matrix of the data. Each point in this matrix represents the correlation of two variables. The names of these two variables are printed near the top of the screen, along with some summary statistics on each of them, and the correlation values are printed on the right side of the screen. The color coding scheme works as follows:
White indicates a correlation coefficient greater than 0.75.
Green indicates a correlation coefficient greater than 0.5 and less than 0.75.
All other correlation coefficients (less than 0.5) are red.
The upper left point of this matrix is highlighted with a purple box. The user can move through the matrix with the arrow keys and quickly get an idea of how any two variables are rel
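The color rule for the correlation matrix can be expressed as a small function. In this Python sketch (names hypothetical), two details are assumptions because the text does not settle them: the signed coefficient is compared, so a strong negative correlation falls in the red class, and a value of exactly 0.75 is treated as green and exactly 0.5 as red.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_color(r):
    if r > 0.75:
        return "white"
    if r > 0.5:  # 0.5 < r <= 0.75 (treatment of r == 0.75 is assumed)
        return "green"
    return "red"


def color_matrix(columns):
    """Color for every pair of variables, as on the 2-D correlation screen."""
    return [[correlation_color(pearson(u, v)) for v in columns] for u in columns]


cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
for row in color_matrix(cols):
    print(row)
```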
159. sing the Add Means heading. This option uses two input files: a statistics file with extension .sts (Example.sts) and a file with extension .add (Example.add). The .sts file should follow the same format as the statistics file generated by Scout. A separate .add file (e.g., pb.add) is required for each variable considered. The .add file has the following format:

a b c
x1 x2 y1 y2 population_Id1
x1 x2 y1 y2 population_Id2
(repeat for each region of the site)

Here:
a = total number of sub-populations
b = total number of variables
c = number of the variable in the .sts file
x1, x2, y1, y2 are the coordinates of the boundary of a geographic region (a rectangle) belonging to one of the sub-populations. Thus the region bounded by (x1, y1), (x2, y1), (x1, y2), and (x2, y2) belongs to the population with the corresponding ID.

Example: The example .add file for lead (Pb) is Pb.add. There are two populations (a = 2) and 4 variables in the data file (b = 4). Lead is the second variable in the .sts file; therefore c = 2.

2 4 2
0 200 0 3500 1
200 3000 0 1220 1
1100 3000 1220 1700 1
1850 3000 1700 3500 1
200 1850 2780 3500 1
200 1100 1220 2780 2
1100 1850 1700 2780 2

So, using this input file, when the Add Means heading is activated, the mean of sub-population 1 will be added to all observations within the region bounded by (1100, 1220), (1100, 1700), (3000, 1220), and (3000, 1700). This will
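Once each grid node has been assigned to a region, Add Means adds that sub-population's mean back to the detrended kriging estimate. A minimal Python sketch, assuming the Pb.add rectangles listed above and hypothetical sub-population means (in Scout these come from the .sts file); a node lying on a shared boundary is given to the first matching region, which is an assumption of the sketch, and the function names are hypothetical.

```python
PB_REGIONS = [  # (x1, x2, y1, y2, population_id) from the Pb.add example
    (0, 200, 0, 3500, 1),
    (200, 3000, 0, 1220, 1),
    (1100, 3000, 1220, 1700, 1),
    (1850, 3000, 1700, 3500, 1),
    (200, 1850, 2780, 3500, 1),
    (200, 1100, 1220, 2780, 2),
    (1100, 1850, 1700, 2780, 2),
]


def population_at(x, y, regions):
    """Return the population ID of the first rectangle containing (x, y)."""
    for x1, x2, y1, y2, pop_id in regions:
        if x1 <= x <= x2 and y1 <= y <= y2:
            return pop_id
    return None


def add_means(estimates, regions, means):
    """estimates: (x, y, value) triples; means: {population_id: mean}."""
    result = []
    for x, y, value in estimates:
        pop_id = population_at(x, y, regions)
        if pop_id is None:
            raise ValueError(f"({x}, {y}) lies outside every region")
        result.append((x, y, value + means[pop_id]))
    return result


means = {1: 10.0, 2: 50.0}  # hypothetical sub-population means
print(add_means([(2000, 1500, -1.5), (500, 2000, 2.0)], PB_REGIONS, means))
```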
160. ta

Chapter 5 Robust Statistical Methods
5.1 Introduction to Robust Statistical Methods
Outliers are inevitable in most applied and scientific disciplines. In a manufacturing process, outliers (anomalies, extremes, maverick observations) typically represent some mechanical disorder of the system, unexpected experimental conditions and results, raw material of inferior quality, or misrecorded values. In biological dose-response applications, outlying observations may indicate an entirely different type of reaction, such as an unusual response to a newly developed drug; in this case the outliers may be more informative than the rest of the data. In environmental and ecological applications, outliers could be indicative of highly contaminated areas, sections of a forest in poor or degraded states, inconsistent analytical results in a typical quality assurance and quality control (QA/QC) program, or gross typing errors. Experimentalists, especially environmental scientists, generate and analyze large amounts of data, so most of these practitioners are familiar with situations in which some of their experimental results look suspicious or significantly different from the rest of the data. In data sets of large dimensionality it becomes tedious to identify these anomalies, and appropriate multivariate procedures need to be used to identify multivariate anomalies. Several univariate and multivariate procedures are incorporated in the
161. test statistic Max Mds can easily be incorporated in a software package. A sequential outlier detection procedure based on the test statistic Max Mds and multivariate kurtosis has been included in the Classical Method menu in Scout. The robust module of Scout computes these critical values and uses them on the Q-Q and index plots of the generalized distances (Mds) to formally define and identify outliers. Most outlier identification statistics, including the Max Mds, multivariate kurtosis, and the minimum volume ellipsoid (MVE), are functions of the Mds, which depend upon the estimates of population location and scale. The presence of outliers usually results in distorted and unreliable maximum likelihood estimates (MLEs) and ordinary least squares (OLS) estimates of the population parameters. The classical MLEs of mean and variance have a zero breakdown point. The breakdown point of an estimator is the smallest fraction of observations that have to be replaced to distort the estimator without bound (Hampel, 1974). A zero breakdown point means that the presence of even a single outlier can completely distort the statistic under consideration. Consequently, all other related statistics, including interval estimates, principal components (PCs), and the estimates of regression parameters, get distorted by outliers, and the test statistics and inference based on these classical estimates may be misleading. For example
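The zero breakdown point of the sample mean, contrasted with the 50% breakdown point of the median, can be seen by replacing a single observation with ever larger values. A short Python illustration with made-up numbers:

```python
import statistics

sample = [9.8, 10.1, 10.0, 9.9, 10.2]

for outlier in (10.2, 1e3, 1e6, 1e9):
    contaminated = sample[:-1] + [outlier]  # replace one observation
    print(f"{outlier:>12g}  mean={statistics.mean(contaminated):>14.2f}"
          f"  median={statistics.median(contaminated):.2f}")
```

The mean follows the single replaced value without bound, while the median stays at 10.0 throughout.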
162. the <ENTER> key when the cursor is on the Graph Title option. When satisfied with all heading choices, use the <DOWN ARROW> key to move the cursor to the last selection, Begin computations with selected options, and press the <ENTER> key to generate the data pattern.

The first computation in this module is the Eigenvalues and Eigenvectors; press the <ESC> key once to generate the Confusion (error) Matrix. Press the <ESC> key once more to generate the scatter plots of Discriminant Scores. Various discriminant scores are plotted when the <PAGE UP> or <PAGE DOWN> key is used. Use the <E> key to generate the ellipses corresponding to the various score clusters. If the Populations choice is used for the Numbering heading, the graphs generated will use different colors for different populations.

5.7 D Trend
The following two headings, D Trend and Add Means, are useful for performing geostatistical analysis. Some knowledge of geostatistical analysis, such as kriging and variogram modelling, is required; users not interested in this may prefer to skip this section. These headings require knowledge of the geographic location (e.g., Easting and Northing coordinates) of each sample observation. Ordinary kriging (OK) is a well-established geostatistical technique frequently used in site characterization studies. However, OK assumes that there is no spatial trend present and
163. the first few high-variance principal components (PCs) represent most of the variation in the data, the last few low-variance PCs provide useful information about the noise that might be present in the experimental results. Graphical displays of the first few PCs are routinely used as unsupervised pattern recognition and classification techniques. The various contour ellipses can be drawn on the scatter plots of the PCs; an elliptical scatter of these PCs suggests normality of the data set. The normal probability Q-Q plots and the scatter plots of PCs are also used for the detection of multivariate outliers. However, since the MLE of the dispersion matrix gets distorted by outliers, the resulting classical PCs may also be misleading. The robust PCs give more precise estimates of the variation and noise in the data by assigning reduced weights to the outlying observations.

Outliers and Principal Component Analysis
Let $P = (p_1, p_2, \ldots, p_p)$ represent the matrix of eigenvectors corresponding to the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ of the sample dispersion (correlation) matrix $\hat{\Sigma}$ (classical or robust). The eigenvector $p_1$ corresponds to the largest eigenvalue $\lambda_1$, and the vector $p_p$ corresponds to the smallest eigenvalue $\lambda_p$ of $\hat{\Sigma}$. The equation $y = P'x$ represents the p principal components, with $y_i = p_i'x$ representing the i-th PC. The normal Q-Q plots for the PCs can be o
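For two variables the eigen-decomposition of the dispersion matrix has a closed form, which makes the relation $y = P'x$ easy to check by hand. A minimal Python sketch (function names hypothetical), assuming the observation x is centered at the location estimate before projecting; for p greater than 2 a numerical eigen-solver would be used instead.

```python
import math


def eig2_symmetric(s11, s12, s22):
    """Eigenvalues (largest first) and unit eigenvectors of [[s11, s12], [s12, s22]]."""
    half_trace = (s11 + s22) / 2.0
    radius = math.hypot((s11 - s22) / 2.0, s12)
    lam1, lam2 = half_trace + radius, half_trace - radius
    if s12 == 0.0:
        p1 = (1.0, 0.0) if s11 >= s22 else (0.0, 1.0)
    else:
        vx, vy = lam1 - s22, s12  # solves (S - lam1 I) p1 = 0
        norm = math.hypot(vx, vy)
        p1 = (vx / norm, vy / norm)
    p2 = (-p1[1], p1[0])  # orthogonal to p1
    return (lam1, lam2), (p1, p2)


def pc_scores(x, center, vectors):
    """y_i = p_i' (x - center) for each eigenvector p_i."""
    d = (x[0] - center[0], x[1] - center[1])
    return tuple(v[0] * d[0] + v[1] * d[1] for v in vectors)


(l1, l2), P = eig2_symmetric(2.0, 1.0, 2.0)
print(l1, l2, pc_scores((2.0, 2.0), (1.0, 1.0), P))
```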
164. the scatter plot to a PCX file.
<H> Hides (i.e., does not display) observations that were identified as outliers (toggle).
<N> Replaces the symbol for each observation with the observation number (toggle).
<P> Prints the scatter plot on a printer.

Outputting a graph to a PCX file
Both 2-dimensional and 3-dimensional graphics screens may be written to a file on disk. When the desired graphics image is displayed, pressing the <F> key prompts the user for a file name. Type in a file name, including the drive and directory but without an extension (.PCX is always used), and press the <ENTER> key. The graphics screen will be written to the file in PCX format, which many other software packages can read.

Hiding Outliers in Scatterplots
If you wish to view a scatterplot in which the outlier observations are not displayed, press the <H> key. Press the <H> key again and the outliers will be displayed as before. CAUTION: Hiding outliers from a scatter plot does not change the statistical properties of the variables.

Replacing Symbols with Observation Numbers
Sometimes it is useful to see where individual observations or groups of observations are located on a scatter plot. Press the <N> key and the symbols for the observations in the scatterplot are replaced by the observation numbers. Press the <N> key again to return to symbols.

Printing a graph
The printer in u
165. the user to change the name, units, format, and any comments about the variables in the data set. Press the <ALT> and <V> keys together. A small screen will appear showing the name, units, format, and comment for the first variable in the data set. Find the variable that you wish to edit by using the arrow keys or the <PAGE DOWN> key. Pressing the <F1> key at this point reveals a screen that shows field edit commands that make editing easier (e.g., delete to end of line). Type in the changes you wish to make, and press <ESC> to exit.

Editing the Title of the Data Set
To change the title, press the <ALT> and <T> keys together, type in the title of the data set, and press <ENTER> to exit.

3.3 Summary Statistics
Scout displays summary statistics, such as the mean, standard deviation, and variance of each variable, when Statistics is chosen from the pull-down menu. The Num field displays the number of valid observations used in the calculations for each variable, and the Miss field displays the number of missing observations for each variable. The statistics can be printed by pressing <P> while the information is still on the screen.

3.4 Data Transformation
The transform module in Scout allows each of the variables in memory to be tested for normality using the Kolmogorov-Smirnov and Anderson-Darling tests. If th
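The Kolmogorov-Smirnov statistic measures the largest vertical gap between the empirical CDF and a fitted normal CDF. This Python sketch uses statistics.NormalDist (Python 3.8 or later) with the mean and standard deviation estimated from the data, and the function name is hypothetical. It computes only the statistic D: the guide does not say which critical values Scout uses, and with estimated parameters the classical K-S table is conservative, so Lilliefors-type corrections are commonly applied.

```python
import statistics


def ks_statistic_normal(values):
    """K-S distance D between the sample and a normal fitted by mean/stdev."""
    xs = sorted(values)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Compare F with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed = [1.0] * 9 + [50.0]
print(round(ks_statistic_normal(symmetric), 3), round(ks_statistic_normal(skewed), 3))
```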
166. to accomplish this. Press the <Z> key to see a graph of the X variable versus the Y variable. What Scout has actually done is rotate the graph so that the Z axis points straight out of the screen. Similarly, press the <Y> key to view the X variable versus the Z variable, and the <X> key to view Z versus Y.

7.11 Response Surfaces
Scout can display three-dimensional surface plots. The raw data must be in a regular grid format: the data set must be defined over a complete set of evenly spaced values of the X and Y variables. If a data set is not on a regular grid, the user may wish to modify it with other software so that a regular grid is achieved. The number of points on the grid must be less than 1000, which corresponds to approximately a 30 x 30 grid. To generate a surface plot from a regular-grid data set, select the X and Y axes so that they define the grid, and select the Z axis as the response variable. Press ENTER to display the three-dimensional scatter plot, then press the <R> key to draw the response surface. The <R> key toggles between the scatter plot and the response surface.

Chapter 8 System Information
8.1 User's Guide
This option enables the user to view the entire Scout manual. A menu of major headings is provided so that the user can quickly find information about any topic in Scout. The user can access the User's Guide for the heading that
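The regular-grid requirement for response surfaces can be checked before loading a data set. The sketch below is an illustration, not Scout's own check: `is_regular_grid` is a hypothetical helper that takes (x, y, z) triples and tests the three conditions stated above (every X-Y combination present exactly once, evenly spaced X and Y values, fewer than 1000 points).

```python
def is_regular_grid(points, max_points=1000, tol=1e-9):
    """Return True if (x, y, z) triples form a complete, evenly spaced X-Y grid."""
    if len(points) >= max_points:
        return False  # the grid must have fewer than max_points points
    xs = sorted({x for x, _, _ in points})
    ys = sorted({y for _, y, _ in points})

    def evenly_spaced(values):
        if len(values) < 2:
            return False
        step = values[1] - values[0]
        return all(abs((b - a) - step) <= tol for a, b in zip(values, values[1:]))

    # Complete grid: no duplicate nodes, and every X value pairs with every Y value.
    nodes = {(x, y) for x, y, _ in points}
    complete = len(nodes) == len(points) == len(xs) * len(ys)
    return complete and evenly_spaced(xs) and evenly_spaced(ys)

grid = [(x, y, x * y) for x in range(31) for y in range(31)]  # 961 points
print(is_regular_grid(grid))       # True
print(is_regular_grid(grid[:-1]))  # False: one grid point missing
big = [(x, y, 0) for x in range(32) for y in range(32)]       # 1024 points
print(is_regular_grid(big))        # False: too many points
```

A 31 x 31 grid (961 points) passes, while a 32 x 32 grid (1024 points) exceeds the limit.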
167. to your original data. If for some reason you wanted to abort this transform process and retain your original data, you would answer the question with the <N> key. You should now be back in Scout's main menu. If you have modified the variables in memory, you may wish to save them to a new file on disk before you go on with your analysis. CAUTION: Once you exit the transform module, your transform history is not retained. It is advisable to log all changes for future reference. If you start the transform module again, it is a new session and all transform lists are blank.

3.5 Print Data
This heading is used to print the data set currently in memory. Scout will ask whether the output is to be condensed. If the user answers no, Scout formats the output with up to six variables across each page; the printer should be set to 80 columns. If the user answers yes, Scout formats the output with up to ten variables across each page; the printer should be set to 132 columns for this to work correctly.

Chapter 4 Classical Methods for Outlier Identification
4.1 Introduction to the Classical Methods for Outlier Identification
This chapter discusses the various procedures available within the Classical Method menu. These procedures are used for outlier identification. Once a data file has been converted into Scout format, Scout may be used to test for discordant observations in the
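The page layout described in Section 3.5 amounts to splitting the variables into fixed-size groups. The sketch below assumes, as the text implies but does not state, that variables beyond one page width are printed in further groups; `page_groups` is a hypothetical name.

```python
def page_groups(variable_names, condensed=False):
    """Split variable names into the groups printed across one page width.

    Up to 6 variables per page on an 80-column printer, or up to 10 per page
    in condensed mode on a 132-column printer (Section 3.5). Continuing the
    remaining variables in further groups is an assumption.
    """
    per_page = 10 if condensed else 6
    return [variable_names[i:i + per_page]
            for i in range(0, len(variable_names), per_page)]

names = [f"VAR{k}" for k in range(1, 16)]  # 15 hypothetical variables
print([len(g) for g in page_groups(names)])                  # [6, 6, 3]
print([len(g) for g in page_groups(names, condensed=True)])  # [10, 5]
```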
168. ts are built-in features of most popular spreadsheet software. The following spreadsheet software has been tested for the ability to produce a usable Scout file:

Software                      Result         File Format
QuattroPro 6.0 for Windows    Works          Text file
Excel 4.0a for Windows        Works          Any of 3 text file formats
QuattroPro 1.0 for Windows    Doesn't work   No text or space-delimited format available

If the file is saved as a space-delimited print file, use the extension .prn. If the spreadsheet software does not have a built-in space-delimited format, save the file with the extension .prn using the following options:
1. No margin
2. Page length one
3. Unformatted

After the file is saved, exit the spreadsheet software and copy the file into the Scout directory with the extension .dat. This newly created file in the Scout directory can be used as a Scout file.

2.3 Load Scout File
Upon startup of Scout, the user is placed in the File heading of the main menu. The first thing the user should do is select either Load Scout Data File or Read ASCII Data File from this pull-down menu. Both headings display a menu of possible data files from the current directory and any subdirectories in the current directory. The user can change the current directory by highlighting the desired subdirectory and pressing the ENTER key. All subdirectories are identified by placing the V sym
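If a spreadsheet offers only CSV output, the same space-delimited layout can be produced with a short script. This is a hypothetical alternative to the spreadsheet's own .prn export, not a Scout utility: `to_space_delimited` and `convert_file` are illustrative names, and the script keeps any header lines exactly as the spreadsheet wrote them, so the result must still follow the Scout file format defined in Chapter 2.

```python
import csv

def to_space_delimited(csv_lines):
    """Convert CSV text lines to space-delimited lines.

    Trailing empty cells (row padding) are dropped; an interior empty cell
    raises ValueError because spaces alone cannot mark an empty field.
    """
    out = []
    for row_number, row in enumerate(csv.reader(csv_lines), start=1):
        cells = [cell.strip() for cell in row]
        while cells and cells[-1] == "":  # drop trailing padding cells
            cells.pop()
        if not cells:                     # skip blank rows
            continue
        if "" in cells:
            raise ValueError(f"row {row_number}: empty cell would shift columns")
        out.append(" ".join(cells))
    return out

def convert_file(csv_path, dat_path):
    """Write the converted lines to a file (use the .dat extension for Scout)."""
    with open(csv_path, newline="") as src, open(dat_path, "w") as dst:
        for line in to_space_delimited(src):
            dst.write(line + "\n")

print(to_space_delimited(["5.1,3.5,1.4", "4.9, 3.0 ,1.4"]))  # ['5.1 3.5 1.4', '4.9 3.0 1.4']
```

A data cell that itself contains a space would be read as two fields once spaces are the only delimiter, so such cells should be renamed before conversion.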
169. tterplots and contour ellipses for classical and robust PCA can also be produced using the Robust Method menu, as discussed in Chapter 5. Using PCA, the user can view the correlation or covariance matrix directly on the screen. The PCA menu has five headings, as displayed in Figure 6.1.

[Figure 6.1: Scout main menu (File, Data, Classical Method, Robust Method, PCA, Graphics, System) with the PCA pull-down menu]
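The two matrices mentioned above can be computed directly. The sketch below is a minimal standard-library illustration assuming the usual n - 1 divisor; classical principal components are the eigenvectors of one of these matrices, but the sketch stops short of the eigen-decomposition and does not reproduce Scout's robust estimates. `covariance_and_correlation` is a hypothetical name.

```python
from math import sqrt
from statistics import fmean

def covariance_and_correlation(rows):
    """Return the sample covariance matrix (n - 1 divisor, assumed) and the correlation matrix."""
    n, p = len(rows), len(rows[0])
    mean = [fmean(r[j] for r in rows) for j in range(p)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    # Correlation rescales each covariance by the two standard deviations.
    corr = [[cov[a][b] / sqrt(cov[a][a] * cov[b][b]) for b in range(p)]
            for a in range(p)]
    return cov, corr

rows = [[1.0, 2.0, 5.0], [2.0, 4.0, 3.0], [3.0, 6.0, 4.0]]  # made-up data
cov, corr = covariance_and_correlation(rows)
print(corr)  # [[1.0, 1.0, -0.5], [1.0, 1.0, -0.5], [-0.5, -0.5, 1.0]]
```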
170. numerical options are called for, highlight the appropriate field and type in the correct value. When satisfied, move to the bottom of this window, select Accept New Settings, and press ENTER.

[Screen image: Robust Method menu with Robust Analysis selected and the Statistics Options window open (fields include Right Tail Cutoff, Tuning Constant, Control Chart Limits, Trimming Percent, and Ignore Population)]
171. suspends Scout and runs a secondary copy of COMMAND.COM. The user may then execute DOS commands, or type EXIT to return to Scout.

Exiting Scout
The user can exit Scout and return to DOS by selecting Yes with this option. WARNING: Make sure that all of the desired graphs, data, and changes to files have been saved before selecting this option. Unlike some software packages, Scout does not ask whether the current file should be saved, and it will not automatically save data sets, graphs, or changes made to a file. See the appropriate sections of this User's Guide for instructions on saving graphics and data in Scout.

Chapter 9 Tutorial: Scout Basics
9.1 Nomenclature
[Screen image: Scout main menu with the File pull-down menu (Read ASCII File, Write ASCII File, Load Scout File, Save Scout File, Merge Two Files, Append Two Files)]
172. variate kurtosis (Mardia 1970, 1974; Schwager and Margolin 1982) and the generalized distance (Wilks 1963; Barnett and Lewis 1994), both of which have desirable properties as outlier tests. The maximum generalized distance is a multivariate extension of a univariate test known as Grubbs' test (Grubbs 1950). This test is meant to identify a single outlier, and it suffers from masking in the presence of multiple outliers. Sequential application of this test is incorporated in Scout. Mardia's multivariate kurtosis is an extension of the univariate kurtosis. This test is more powerful than the generalized distance when multiple outliers are present (Schwager and Margolin 1982). Mardia's multivariate kurtosis can also be used to test for deviations from multivariate normality. However, this statistic is also not resistant to outliers and, as such, may suffer from masking by multiple outliers. The critical values used for the test statistics are the simulated values given in Stapanian et al. (1991). This module of Scout is based on sequential application of these tests: outliers are identified in the initial data set and removed from the data, the statistics are recomputed, and the identification, removal, and recomputation are repeated until no more outliers are found. Both tests assume the data are independent observations from a single multivariate normal distribution. If a large proportion of the data are i
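The sequential procedure described above can be sketched in standard-library Python. The squared generalized distance of observation i is D_i^2 = (x_i - xbar)' S^-1 (x_i - xbar), and Mardia's kurtosis is b_2,p = (1/n) * sum of D_i^4, computed with the divisor-n covariance matrix. Assumptions: all function names are hypothetical; the loop uses a single fixed `critical_value` in place of the simulated critical values of Stapanian et al. (1991), which depend on n and p and are not reproduced here; and only the maximum-distance test is applied sequentially, while the kurtosis is merely computed.

```python
from statistics import fmean

def mean_and_covariance(rows, ddof=1):
    """Mean vector and covariance matrix with divisor n - ddof."""
    n, p = len(rows), len(rows[0])
    mean = [fmean(r[j] for r in rows) for j in range(p)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - ddof)
            for b in range(p)] for a in range(p)]
    return mean, cov

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    p = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def generalized_distances(rows, ddof=1):
    """Squared generalized distance of every observation from the mean vector."""
    mean, cov = mean_and_covariance(rows, ddof)
    s_inv = invert(cov)
    p = len(mean)
    out = []
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        out.append(sum(d[a] * s_inv[a][b] * d[b] for a in range(p) for b in range(p)))
    return out

def mardia_kurtosis(rows):
    """Mardia's b_2,p: the average of D_i^4 using the divisor-n covariance."""
    return fmean(d2 * d2 for d2 in generalized_distances(rows, ddof=0))

def sequential_max_distance(rows, critical_value):
    """Repeatedly flag and remove the observation with the largest D^2.

    critical_value is a HYPOTHETICAL fixed cutoff standing in for Scout's
    simulated critical values. Returns 0-based row indices in removal order.
    """
    p = len(rows[0])
    remaining = list(range(len(rows)))
    flagged = []
    while len(remaining) > p + 1:  # keep enough rows for an invertible covariance
        d2 = generalized_distances([rows[i] for i in remaining])
        k = max(range(len(d2)), key=d2.__getitem__)
        if d2[k] <= critical_value:
            break  # no further outliers at this cutoff
        flagged.append(remaining.pop(k))
    return flagged

# A 3 x 3 grid of well-behaved points plus one gross outlier (made-up data).
data = [[1, 2], [2, 1], [2, 3], [3, 2], [2, 2], [1, 1], [3, 3], [1, 3], [3, 1], [20, 20]]
print(sequential_max_distance(data, critical_value=6.0))  # [9]
print(mardia_kurtosis(data))
```

After the outlier at index 9 is removed, the largest remaining distance is about 2.67, so the loop stops; with real data, a cutoff taken from the appropriate table for n and p should replace the fixed value.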
173. will return. It may appear as if nothing has changed; however, the name of the selected file appears in the lower right corner of the screen, and the path taken to get to this file appears in the lower left corner. Scout has read the file and is now ready to analyze it. If you experiment with other files in other directories, remember that the ASCII files accompanying Scout end with the .DAT extension and their format matches that defined in Chapter 2. Your own files may have any three-character extension.

9.3 Examine and Save Statistics
Assuming the file IRIS.DAT has been read, use the arrow keys to move to the Data heading. If you are in a level 2 menu or deeper, you may have to use the <ESC> key to get back to the level 1 menu before the left and right arrow keys will function. Pressing the ENTER key will give you the level 2 menu for the Data heading. Move the highlighted cell cursor to the Statistics choice and press the <ENTER> key. Your screen should now match Figure 9.2.

[Figure 9.2: Data pull-down menu showing the Edit Data, Statistics, and Transform choices]
174. with each of these options. An explanation window associated with each option provides a brief description of that heading or choice. This Robust Method module is independent of (cannot communicate with) the Classical Method, PCA, and Graphics headings in Scout. It can communicate with the File, Data, and System headings. For example, the robust principal components cannot be displayed in a 3-D graph without first saving them in a data file and then reading in the saved data file to plot the 3-D graph of the saved principal components.

[Screen image: Robust Method menu with the Robust Analysis window open]
175. x and y axes, and a scatter plot containing only the observations of interest will appear. Press the <Z> key and Scout will return to the original scatter plot, with the white rectangle still surrounding the observations of interest. Pressing the ENTER key from the zoomed scatter plot will cause Scout to return to the color-coded correlation matrix. CAUTION: You cannot use the zoom feature on a scatter plot generated by the zoom feature. If you wish to inspect an area of a zoomed scatter plot in detail, you must first redefine the white rectangle. To redefine the dimensions and location of the rectangle, return to the original scatter plot and press the … and ARROW keys until the rectangle is at the desired size and location. If you wish to exit the zoom mode and thus eliminate the white rectangle from the original scatter plot, press <ESC>. If you press the <Z> key again, Scout will restore the rectangle as it was just prior to exiting the zoom mode. To return to the color-coded correlation matrix from the original scatter plot, exit the zoom mode and press <ESC>.

7.6 3-Dimensional Graphs
The last heading in the Graphics menu, 3-Dimensional, is the 3-dimensional graphics system. The user first selects a variable for each of the three axes. All of the variables will be displayed on the screen with the first variable highlighted. The user may use the ARROW keys, HOME key, and END key to h
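The effect of the zoom rectangle can be pictured as a simple filter on the plotted observations. In this hypothetical sketch, `zoom` keeps only the points inside the rectangle and reports them with their observation numbers; the 1-based numbering is an assumption about how Scout labels observations, and the sketch does not model how Scout rescales the axes.

```python
def zoom(points, x_min, x_max, y_min, y_max):
    """Return (observation number, x, y) for points inside the zoom rectangle.

    Observation numbers start at 1 here (an assumption); the rectangle edges
    are treated as inside.
    """
    return [(k, x, y) for k, (x, y) in enumerate(points, start=1)
            if x_min <= x <= x_max and y_min <= y <= y_max]

pts = [(1.0, 1.0), (2.0, 5.0), (3.0, 3.0), (10.0, 10.0)]  # made-up scatter plot
print(zoom(pts, 0.0, 4.0, 0.0, 4.0))  # [(1, 1.0, 1.0), (3, 3.0, 3.0)]
```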
