
IBM SPSS Advanced Statistics 21

1. Display. You can choose Frequencies, Residuals, or both. In a saturated model, the observed and expected frequencies are equal, and the residuals are equal to 0.

Plot. For custom models, you can choose one or both types of plots, Residuals and Normal Probability. These will help determine how well a model fits the data.

Display for Saturated Model. For a saturated model, you can choose Parameter estimates. The parameter estimates may help determine which terms can be dropped from the model. An association table, which lists tests of partial association, is also available. This option is computationally expensive for tables with many factors.

Model Criteria. An iterative proportional-fitting algorithm is used to obtain parameter estimates. You can override one or more of the estimation criteria by specifying Maximum iterations, Convergence, or Delta (a value added to all cell frequencies for saturated models).

HILOGLINEAR Command Additional Features

The command syntax language also allows you to:
- Specify cell weights in matrix form (using the CWEIGHT subcommand).
- Generate analyses of several models with a single command (using the DESIGN subcommand).

See the Command Syntax Reference for complete syntax information.
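The iterative proportional fitting idea behind Model Criteria can be sketched in Python. This is a minimal illustration under stated assumptions and not the HILOGLINEAR implementation. The function names `margin` and `ipf`, the dict-of-cells table layout, the starting table of ones, and the stopping rule (the largest change in any fitted cell during a full cycle falls below `convergence`) are all choices made for this example. `max_iter` and `convergence` stand in for Maximum iterations and Convergence. Delta is left out because it applies only to saturated models.

```python
from collections import defaultdict


def margin(table, axes):
    """Sum the cells of `table` (a dict keyed by index tuples) over every axis not in `axes`."""
    totals = defaultdict(float)
    for cell, count in table.items():
        totals[tuple(cell[a] for a in axes)] += count
    return totals


def ipf(observed, generating_class, max_iter=20, convergence=1e-6):
    """Fit a hierarchical loglinear model by iterative proportional fitting (sketch).

    `generating_class` lists the highest-order terms as tuples of axis indexes,
    e.g. [(0, 1), (0, 2), (1, 2)] for the model AB, AC, BC in a three-way table.
    Returns the fitted (expected) frequencies and the number of cycles used.
    """
    fitted = {cell: 1.0 for cell in observed}  # assumed starting table of ones
    cycle = 0
    for cycle in range(1, max_iter + 1):
        largest_change = 0.0
        for axes in generating_class:
            observed_margin = margin(observed, axes)
            fitted_margin = margin(fitted, axes)
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                # Rescale so this term's fitted margin matches the observed margin.
                if fitted_margin[key] > 0:
                    updated = fitted[cell] * observed_margin[key] / fitted_margin[key]
                else:
                    updated = 0.0
                largest_change = max(largest_change, abs(updated - fitted[cell]))
                fitted[cell] = updated
        if largest_change < convergence:
            break
    return fitted, cycle


# Hypothetical example: 2 x 2 table, independence model (generating class A, B).
table = {(0, 0): 10, (0, 1): 20, (1, 0): 30, (1, 1): 40}
fitted, cycles = ipf(table, [(0,), (1,)])
# fitted[(0, 0)] == 12.0 (row total 30 * column total 40 / 100), cycles == 2
```

Each pass rescales the fitted table so that one term's margins agree with the observed ones. For a generating class such as AB, AC, BC there is no closed-form solution, which is why an iterative method with an iteration limit is needed.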
2. You can save some results of this procedure to a new IBM SPSS Statistics data file.

Variance component estimates. Saves estimates of the variance components and estimate labels to a data file or dataset. These can be used in calculating more statistics or in further analysis in the GLM procedures. For example, you can use them to calculate confidence intervals or test hypotheses.

Component covariation. Saves a variance-covariance matrix or a correlation matrix to a data file or dataset. Available only if Maximum likelihood or Restricted maximum likelihood has been specified.

Destination for created values. Allows you to specify a dataset name or external filename for the file containing the variance component estimates and/or the matrix. Datasets are available for subsequent use in the same session but are not saved as files unless explicitly saved prior to the end of the session. Dataset names must conform to variable naming rules.

You can use the MATRIX command to extract the data you need from the data file and then compute confidence intervals or perform tests.

VARCOMP Command Additional Features

The command syntax language also allows you to:
- Specify nested effects in the design (using the DESIGN subcommand).
- Include user-missing values (using the MISSING subcommand).
- Specify EPS criteria (using the CRITERIA subcommand).

See the Command Syntax Reference for complete syntax information.
3. [Figure 8-19: Fixed Coefficients view, table style. Target: Post test; probability distribution: Normal; link function: Identity. The coefficient for school_setting = 3 is set to zero because it is redundant.]

This view displays the value of each fixed coefficient in the model. Note that factors (categorical predictors) are indicator-coded within the model, so that effects containing factors will generally have multiple associated coefficients: one for each category except the category corresponding to the redundant coefficient.

Styles. There are different display styles, which are accessible from the Style dropdown list.
- Diagram. This is a chart which displays the intercept first, and then sorts effects from top to
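The indicator coding described in this view can be sketched in Python. The sketch assumes ascending category order, so the last category is the redundant one whose coefficient is fixed at zero. The helper name `indicator_columns` is hypothetical, and the procedure builds its design matrix internally.

```python
def indicator_columns(values):
    """Hypothetical helper: indicator-code a factor, dropping the last category.

    Returns (categories, rows): `categories` are the levels in ascending order,
    and each row has one 0/1 entry per non-redundant category.
    """
    categories = sorted(set(values))
    kept = categories[:-1]  # the last level gets no column (its coefficient is 0)
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return categories, rows


cats, rows = indicator_columns([1, 2, 3, 2])
# cats == [1, 2, 3]; rows == [[1, 0], [0, 1], [0, 0], [0, 1]]
```

A case in the last category gets all zeros. Each remaining coefficient therefore measures a difference from that last category.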
4. Method. Select the maximum likelihood or restricted maximum likelihood estimation.

Iterations:
- Maximum iterations. Specify a non-negative integer.
- Maximum step-halvings. At each iteration, the step size is reduced by a factor of 0.5 until the log-likelihood increases or maximum step-halving is reached. Specify a positive integer.
- Print iteration history for every n step(s). Displays a table containing the log-likelihood function value and parameter estimates at every n iteration, beginning with the 0th iteration (the initial estimates). If you choose to print the iteration history, the last iteration is always printed regardless of the value of n.

Log-likelihood Convergence. Convergence is assumed if the absolute change or relative change in the log-likelihood function is less than the value specified, which must be non-negative. The criterion is not used if the value specified equals 0.

Parameter Convergence. Convergence is assumed if the maximum absolute change or maximum relative change in the parameter estimates is less than the value specified, which must be non-negative. The criterion is not used if the value specified equals 0.

Hessian Convergence. For the Absolute specification
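The step-halving rule and the log-likelihood criterion can be illustrated with a one-parameter Newton-Raphson sketch in Python. It is not the procedure's algorithm. The function `maximize`, its arguments, and the use of plain Newton steps are assumptions for the example. The settings `max_iter`, `max_halvings`, `ll_tol` (0 disables the criterion), `relative` and `print_every` mirror Maximum iterations, Maximum step-halvings, Log-likelihood Convergence (Absolute or Relative) and the iteration-history option.

```python
import math


def maximize(loglik, grad, hess, theta, max_iter=100, max_halvings=5,
             ll_tol=1e-10, relative=False, print_every=1):
    """Hypothetical one-parameter Newton-Raphson with step halving.

    Returns (theta, loglik, history). `history` holds (iteration, loglik, theta)
    for iteration 0 (the initial estimate), for every `print_every`-th
    iteration, and always for the last iteration.
    """
    ll = loglik(theta)
    history = [(0, ll, theta)]
    iteration = 0
    for iteration in range(1, max_iter + 1):
        step = -grad(theta) / hess(theta)  # full Newton step
        new_theta = theta + step
        new_ll = loglik(new_theta)
        halvings = 0
        # Halve the step until the log-likelihood increases or the limit is reached.
        while new_ll <= ll and halvings < max_halvings:
            step *= 0.5
            new_theta = theta + step
            new_ll = loglik(new_theta)
            halvings += 1
        change = abs(new_ll - ll)
        if relative:
            change /= max(abs(ll), 1e-300)
        theta, ll = new_theta, new_ll
        if iteration % print_every == 0:
            history.append((iteration, ll, theta))
        if ll_tol > 0 and change < ll_tol:  # a value of 0 disables the criterion
            break
    if history[-1][0] != iteration:
        history.append((iteration, ll, theta))  # the last iteration is always reported
    return theta, ll, history


# Hypothetical example: Poisson counts with log-mean parameter theta.
counts = [2, 3, 4, 7]
s, n = sum(counts), len(counts)
theta_hat, ll_hat, hist = maximize(
    loglik=lambda t: s * t - n * math.exp(t),
    grad=lambda t: s - n * math.exp(t),
    hess=lambda t: -n * math.exp(t),
    theta=0.0,
)
# theta_hat is close to log(mean(counts)) = log(4)
```

In this example the first full Newton step overshoots and lowers the log-likelihood, so it is halved once before being accepted.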
5. Specify Model. A saturated model contains all factor main effects and all factor-by-factor interactions. Select Custom to specify a generating class for an unsaturated model.

Generating Class. A generating class is a list of the highest-order terms in which factors appear. A hierarchical model contains the terms that define the generating class and all lower-order relatives. Suppose you select variables A, B, and C in the Factors list and then select Interaction from the Build Terms drop-down list. The resulting model will contain the specified 3-way interaction A*B*C, the 2-way interactions A*B, A*C, and B*C, and main effects for A, B, and C. Do not specify the lower-order relatives in the generating class.

Build Terms. For the selected factors and covariates:
Interaction. Creates the highest-level interaction term of all selected variables. This is the default.
Main effects. Creates a main-effects term for each variable selected.
All 2-way. Creates all possible two-way interactions of the selected variables.
All 3-way. Creates all possible three-way interactions of the selected variables.
All 4-way. Creates all possible four-way interactions of the selected variables.
All 5-way. Creates all possible five-way interactions of the selected variables.

Model Selection Loglinear Analysis Options

[Figure 9-4: Loglinear Analysis Options dialog box]
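The rule that a hierarchical model contains every lower-order relative of its generating class can be sketched in Python. The function name `hierarchical_terms` and the representation of terms as tuples of factor names are assumptions for illustration.

```python
from itertools import combinations


def hierarchical_terms(generating_class):
    """Hypothetical helper: expand a generating class into all model terms.

    Each generator is a tuple of factor names, e.g. ("A", "B", "C").
    Every non-empty subset of every generator is included once.
    """
    terms = set()
    for generator in generating_class:
        for size in range(1, len(generator) + 1):
            for subset in combinations(sorted(generator), size):
                terms.add(subset)
    # Order by interaction order, then alphabetically.
    return sorted(terms, key=lambda t: (len(t), t))


labels = ["*".join(t) for t in hierarchical_terms([("A", "B", "C")])]
# labels == ["A", "B", "C", "A*B", "A*C", "B*C", "A*B*C"]
```

With two generators, such as A*B and B*C, the shared main effect B appears only once.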
6. Specify Model Effects. The default model is intercept-only, so you must explicitly specify other model effects. Alternatively, you can build nested or non-nested terms.

Non-Nested Terms. For the selected factors and covariates:
Main effects. Creates a main-effects term for each variable selected.
Interaction. Creates the highest-level interaction term for all selected variables.
Factorial. Creates all possible interactions and main effects of the selected variables.
All 2-way. Creates all possible two-way interactions of the selected variables.
All 3-way. Creates all possible three-way interactions of the selected variables.
All 4-way. Creates all possible four-way interactions of the selected variables.
All 5-way. Creates all possible five-way interactions of the selected variables.

Nested Terms. You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the Customer effect can be said to be nested within the Store location effect.

Additionally, you can include interaction effects or add multiple levels of nesting
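The non-nested Build Terms options amount to enumerating combinations of the selected variables. The Python sketch below assumes this reading. The function name `build_terms` and the option strings are hypothetical, and nested terms are not covered.

```python
from itertools import combinations


def build_terms(variables, how):
    """Hypothetical helper: model terms (tuples of names) for a Build Terms choice.

    `how` is "main", "interaction", "factorial", or "all 2-way" ... "all 5-way".
    """
    variables = list(variables)
    if how == "main":
        return [(v,) for v in variables]
    if how == "interaction":
        return [tuple(variables)]  # the single highest-level interaction
    if how == "factorial":
        return [t for k in range(1, len(variables) + 1) for t in combinations(variables, k)]
    if how.startswith("all ") and how.endswith("-way"):
        k = int(how[4:-4])  # "all 2-way" -> 2
        return list(combinations(variables, k))
    raise ValueError(f"unknown Build Terms option: {how}")


two_way = build_terms(["A", "B", "C"], "all 2-way")
# two_way == [("A", "B"), ("A", "C"), ("B", "C")]
```

Factorial on three variables yields seven terms: three main effects, three two-way interactions and the three-way interaction.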
7. Predicted by Observed

[Figure 8-14: Predicted by Observed view. Target: Post test. Binned scatterplot with Predicted Value on the vertical axis and Post test (observed) on the horizontal axis, shaded by Count.]

For continuous targets, including targets specified as events/trials, this displays a binned scatterplot of the predicted values on the vertical axis by the observed values on the horizontal axis. Ideally, the points should lie on a 45-degree line; this view can tell you whether any records are predicted particularly badly by the model.

Classification

[Figure 8-15: Classification view. Target: Service usage; Overall Percent Correct: 85.2%. Row percents:]

Observed \ Predicted     No service   Other provider   Service with company
No service                  60.6          13.0               26.4
Other provider               1.5          86.8               11.7
Service with company         0.8           6.7               92.5

For categorical targets, this displays the cross-classification of observed versus predicted values in a heat map, plus the overall percent correct.

Table styles. There are several different display styles, which are accessible from the Style dropdown list.

Row percents. This displays the row percentages (the cell counts expressed as a percent of the row totals) in the cells. This is the default.

Cell counts. This displays the cell counts in the cells. The shading for the heat map is still based on the row percentages.
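The row percentages and the overall percent correct can be computed from paired observed and predicted values, as in this Python sketch. The function name `classification_summary` is hypothetical. The sketch treats `None` as a missing target value and simply drops those records, in line with the rule that they do not contribute to the overall percent correct (it does not build the separate Missing row).

```python
from collections import Counter


def classification_summary(observed, predicted):
    """Hypothetical helper: row percentages and overall percent correct.

    Records whose observed value is None (missing) are skipped.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o is not None]
    counts = Counter(pairs)
    row_totals = Counter(o for o, _ in pairs)
    # Cell count as a percent of its observed-category (row) total.
    row_percent = {(o, p): 100.0 * n / row_totals[o] for (o, p), n in counts.items()}
    correct = sum(n for (o, p), n in counts.items() if o == p)
    overall = 100.0 * correct / len(pairs) if pairs else float("nan")
    return row_percent, overall


rp, overall = classification_summary(["a", "a", "b", "b", None], ["a", "b", "b", "b", "a"])
# overall == 75.0; rp[("a", "a")] == 50.0; rp[("b", "b")] == 100.0
```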
8. …are used as the starting point for the initial generalized linear model, not the generalized estimating equations, unless the Maximum iterations on the Estimation tab is set to 0.

[Figure 7-9: Generalized Estimating Equations Initial Values dialog box]

If initial values are specified, they must be supplied for all parameters (including redundant parameters) in the model. In the dataset, the ordering of variables from left to right must be: RowType_, VarName_, P1, P2, …, where RowType_ and VarName_ are string variables and P1, P2, … are numeric variables corresponding to an ordered list of the parameters.
- Initial values are supplied on a record with value EST for variable RowType_; the actual initial values are given under variables P1, P2, …. The procedure ignores all records for which RowType_ has a value other than EST, as well as any records beyond the first occurrence of RowType_ equal to EST.
- The intercept, if included in the model, or threshold parameters, if the response has a multinomial distribution, must be the first initial values listed.
- The scale parameter and, if the response has a negative binomial distribution, the negative binomial parameter must be the last initial values specified.
- If Split File is in
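The record-selection rule for initial values can be sketched in Python. The sketch assumes the dataset has been read into a list of dicts keyed by variable name. The helper `initial_values` is hypothetical, and stripping trailing blanks from RowType_ is an assumption about blank-padded string values.

```python
def initial_values(records, n_params):
    """Hypothetical helper: return [P1, ..., Pn] from the first EST record.

    `records` is a list of dicts with keys RowType_, VarName_, P1, P2, ...
    Records with another RowType_, and any records after the first EST
    record, are ignored.
    """
    for row in records:
        # Assumption: string values may carry trailing blanks, so strip them.
        if str(row.get("RowType_", "")).rstrip() == "EST":
            # A KeyError here means a value is missing for some parameter;
            # values are required for all parameters, including redundant ones.
            return [row[f"P{i}"] for i in range(1, n_params + 1)]
    raise ValueError("no record has RowType_ = EST")


rows = [
    {"RowType_": "COV", "VarName_": "", "P1": 9.0, "P2": 9.0},
    {"RowType_": "EST", "VarName_": "", "P1": 0.5, "P2": 1.0},
    {"RowType_": "EST", "VarName_": "", "P1": 7.0, "P2": 7.0},
]
start = initial_values(rows, 2)  # [0.5, 1.0]: the COV row and the second EST row are ignored
```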
9. Model Effects.
- Analysis type. Specify the type of analysis to produce for testing model effects. Type I analysis is generally appropriate when you have a priori reasons for ordering predictors in the model, while Type III is more generally applicable. Wald or generalized score statistics are computed based upon the selection in the Chi-Square Statistics group.
- Confidence intervals. Specify a confidence level greater than 50 and less than 100. Wald intervals are always produced regardless of the type of chi-square statistics selected, and are based on the assumption that parameters have an asymptotic normal distribution.
- Log quasi-likelihood function. This controls the display format of the log quasi-likelihood function. The full function includes an additional term that is constant with respect to the parameter estimates; it has no effect on parameter estimation and is left out of the display in some software products.

Print. The following output is available:
- Case processing summary. Displays the number and percentage of cases included and excluded from the analysis and the Correlated Data Summary table.
- Descriptive statistics. Displays descriptive statistics and summary information about the dependent variable
10. Heat map. This displays no values in the cells, just the shading.

Compressed. This displays no row or column headings, or values in the cells. It can be useful when the target has a lot of categories.

Missing. If any records have missing values on the target, they are displayed in a (Missing) row under all valid rows. Records with missing values do not contribute to the overall percent correct.

Multiple targets. If there are multiple categorical targets, then each target is displayed in a separate table, and there is a Target dropdown list that controls which target to display.

Large tables. If the displayed target has more than 100 categories, no table is displayed.

Fixed Effects

[Figure 8-16: Fixed Effects view, diagram style. Target: Post test.]

[Figure 8-17: Fixed Effects view, table style. Target: Post test; probability distribution: Normal; link function: Identity.]

Both styles include a "Display effects with sig. values up to" slider with positions 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, and 1.00.
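The significance slider acts as a filter on the displayed effects. This Python sketch assumes that "up to" is inclusive. The helper name `visible_effects`, its default of 1.00 (show everything) and the effect names are hypothetical.

```python
SLIDER_POSITIONS = (0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 1.00)


def visible_effects(sig_values, threshold=1.00):
    """Hypothetical helper: effects whose sig. value is at most `threshold`.

    `sig_values` maps an effect name to its sig. value; `threshold` must be
    one of the slider positions.
    """
    if threshold not in SLIDER_POSITIONS:
        raise ValueError(f"threshold must be one of {SLIDER_POSITIONS}")
    return {name: sig for name, sig in sig_values.items() if sig <= threshold}


# Hypothetical effect names and sig. values.
effects = {"effect_a": 0.00002, "effect_b": 0.03, "effect_c": 0.62}
shown = visible_effects(effects, 0.05)  # keeps effect_a and effect_b
```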
11. Note: Before using this information and the product it supports, read the general information under Notices on p. 166.

This edition applies to IBM® SPSS® Statistics 21 and to all subsequent releases and modifications until otherwise indicated in new editions.

Adobe product screenshot(s) reprinted with permission from Adobe Systems Incorporated.
Microsoft product screenshot(s) reprinted with permission from Microsoft Corporation.

Licensed Materials - Property of IBM
© Copyright IBM Corporation 1989, 2012.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Preface

IBM® SPSS® Statistics is a comprehensive system for analyzing data. The Advanced Statistics optional add-on module provides the additional analytic techniques described in this manual. The Advanced Statistics add-on module must be used with the SPSS Statistics Core system and is completely integrated into that system.

About IBM Business Analytics

IBM Business Analytics software delivers complete, consistent and accurate information that decision-makers trust to improve business performance. A comprehensive portfolio of business intelligence, predictive analytics, financial performance and strategy management, and analytic applications provides clear, immediate and actionable insights into current performance and the ability to predict future outcomes.
12. Checked items are saved with the specified name; conflicts with existing field names are not allowed.

Predicted values. Saves the predicted value of the target. The default field name is PredictedValue.

Predicted probability for categorical targets. If the target is categorical, this keyword saves the predicted probabilities of the first n categories, up to the value specified as the Maximum categories to save. The calculated values are cumulative probabilities for ordinal targets. The default root name is PredictedProbability. To save the predicted probability of the predicted category, save the confidence (see below).

Confidence intervals. Saves upper and lower bounds of the confidence interval for the predicted value or predicted probability. For all distributions except the multinomial, this creates two variables, and the default root name is CI, with _Lower and _Upper as the suffixes.

For the multinomial distribution and a nominal target, one field is created for each dependent variable category. This saves the lower and upper bounds of the predicted probability for
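For a single record of a categorical target, the saved values described here can be sketched in Python. The function `save_fields` and its dictionary keys are hypothetical: PredictedValue and PredictedProbability echo the default names above, while the "Confidence" key is an assumed label for the probability of the predicted category. The sketch assumes category probabilities are supplied in category order.

```python
from itertools import accumulate


def save_fields(probabilities, max_categories, ordinal=False):
    """Hypothetical helper: fields saved for one record of a categorical target.

    `probabilities` maps category -> predicted probability, in category order.
    """
    categories = list(probabilities)
    values = list(probabilities.values())
    if ordinal:
        values = list(accumulate(values))  # cumulative probabilities for ordinal targets
    saved = dict(zip(categories[:max_categories], values[:max_categories]))
    predicted = max(categories, key=probabilities.get)  # most probable category
    return {
        "PredictedValue": predicted,
        "Confidence": probabilities[predicted],  # probability of the predicted category
        "PredictedProbability": saved,
    }


out = save_fields({"low": 0.2, "mid": 0.5, "high": 0.3}, max_categories=2, ordinal=True)
# out["PredictedValue"] == "mid"; saved cumulative values are about 0.2 and 0.7
```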
13. …Hochberg's GT2, Gabriel's test, and Scheffé's test are both multiple comparison tests and range tests.

GLM Repeated Measures Save

[Figure 3-8: Repeated Measures Save dialog box]

You can save values predicted by the model, residuals, and related measures as new variables in the Data Editor. Many of these variables can be used for examining assumptions about the data. To save the values for use in another IBM SPSS Statistics session, you must save the current data file.

Predicted Values. The values that the model predicts for each case.
- Unstandardized. The value the model predicts for the dependent variable.
- Standard error. An estimate of the standard deviation of the average value of the dependent variable for cases that have the same values of the independent variables.

Diagnostics. Measures to identify cases with unusual combinations of values for the independent variables and cases that may have a large impact on the model. Available are Cook's distance and uncentered leverage values.
- Cook's distance. A measure of how
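For intuition, both diagnostics can be computed by hand for the simplest linear model, an intercept plus one covariate. The sketch uses the textbook formulas h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)² for the (uncentered) leverage, and D_i = e_i² h_i / (p · MSE · (1 − h_i)²) with p = 2 parameters for Cook's distance. It is a hypothetical helper, not the GLM procedure's code.

```python
def leverage_and_cooks(x, y):
    """Hypothetical helper: hat-matrix diagonal and Cook's distance for y = b0 + b1 * x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 cases")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2  # number of parameters: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks = [e ** 2 * h / (p * mse * (1.0 - h) ** 2) for e, h in zip(residuals, leverage)]
    return leverage, cooks


lev, cd = leverage_and_cooks([1, 2, 3, 4, 10], [1.1, 1.9, 3.2, 3.9, 7.0])
# The case with x = 10 has the largest leverage; the leverages sum to p = 2.
```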
14. Select two or more numeric categorical factors.
- Select one or more factor variables in the Factor(s) list, and click Define Range.
- Define the range of values for each factor variable.
- Select an option in the Model Building group.

Optionally, you can select a cell weight variable to specify structural zeros.

Loglinear Analysis Define Range

[Figure 9-2: Loglinear Analysis Define Range dialog box]

You must indicate the range of categories for each factor variable. Values for Minimum and Maximum correspond to the lowest and highest categories of the factor variable. Both values must be integers, and the minimum value must be less than the maximum value. Cases with values outside of the bounds are excluded. For example, if you specify a minimum value of 1 and a maximum value of 3, only the values 1, 2, and 3 are used. Repeat this process for each factor variable.

Loglinear Analysis Model

[Figure 9-3: Loglinear Analysis Model dialog box]
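The Define Range rules (integer bounds, minimum less than maximum, out-of-range cases excluded) can be sketched in Python. The helper `apply_ranges` and the factor names A and B are hypothetical. The sketch drops a case if any of its factor values falls outside that factor's range.

```python
def apply_ranges(cases, ranges):
    """Hypothetical helper: keep cases whose value on every factor is in range.

    `cases` is a list of dicts (factor name -> value); `ranges` maps each
    factor to an inclusive (minimum, maximum) pair of integers.
    """
    for factor, (minimum, maximum) in ranges.items():
        if not (isinstance(minimum, int) and isinstance(maximum, int)):
            raise TypeError(f"{factor}: Minimum and Maximum must be integers")
        if minimum >= maximum:
            raise ValueError(f"{factor}: Minimum must be less than Maximum")
    return [
        case
        for case in cases
        if all(lo <= case[f] <= hi for f, (lo, hi) in ranges.items())
    ]


# Hypothetical factors A and B, each with range 1 to 3.
kept = apply_ranges(
    [{"A": 1, "B": 3}, {"A": 4, "B": 2}, {"A": 2, "B": 0}],
    {"A": (1, 3), "B": (1, 3)},
)
# kept == [{"A": 1, "B": 3}]
```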
15. Computing Time-Dependent Covariates

There are certain situations in which you would want to compute a Cox Regression model but the proportional hazards assumption does not hold. That is, hazard ratios change across time; the values of one (or more) of your covariates are different at different time points. In such cases, you need to use an extended Cox Regression model, which allows you to specify time-dependent covariates.

In order to analyze such a model, you must first define your time-dependent covariate. (Multiple time-dependent covariates can be specified using command syntax.) To facilitate this, a system variable representing time is available. This variable is called T_. You can use this variable to define time-dependent covariates in two general ways:
- If you want to test the proportional hazards assumption with respect to a particular covariate or estimate an extended Cox regression model that allows nonproportional hazards, you can do so by defining your time-dependent covariate as a function of the time variable T_ and the covariate in question. A common example would be the simple product of the time variable and the covariate, but more complex functions can be specified as well. Testing the significance of the coefficient of the time-dependent covariate will tell you whether the proportional hazards assumption is reasonable.
- Some variables may have different values at different time periods but aren't systematically related to time.
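The first approach can be made concrete. Suppose the model contains a covariate x and the time-dependent covariate T_ * x, with coefficients b1 and b2. The hazard is then h(t) = h0(t) exp(b1 x + b2 x t), so the hazard ratio for a one-unit increase in x is exp(b1 + b2 t). That ratio is constant over time only when b2 = 0, which is why testing that coefficient checks the proportional hazards assumption. The Python sketch below uses hypothetical function names and coefficient values.

```python
import math


def t_cov(t, x):
    """Time-dependent covariate defined as the product T_ * x (hypothetical helper)."""
    return t * x


def hazard_ratio(b1, b2, t):
    """Hazard ratio for a one-unit increase in x at time t, in the model
    h(t) = h0(t) * exp(b1 * x + b2 * t_cov(t, x))."""
    return math.exp(b1 + b2 * t)


# Hypothetical coefficients: with b2 = 0 the ratio is the same at every time.
constant = [hazard_ratio(0.5, 0.0, t) for t in (1.0, 5.0, 10.0)]
growing = [hazard_ratio(0.5, 0.1, t) for t in (1.0, 5.0, 10.0)]
```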
16. [Figure: Covariance Parameters view. Target: Post test; covariance structure: Scaled Identity; subject specification: None.]

Covariance Parameters: Residual Effect 1; Random Effects 2
Design Matrix Columns: Fixed Effects 14; Random Effects 7 (a)
Common Subjects: 23
(a) Common subjects are based on the subject specifications for the residual and random effects and are used to chunk the data for better performance. This is the number of columns per common subject.

Effect: Residual   Estimate   Std. Error   95% CI Lower   95% CI Upper
Variance              7.833        0.248          7.362          8.334

This view displays the covariance parameter estimates and related statistics for residual and random effects. These are advanced, but fundamental, results that provide information on whether the covariance structure is suitable.

Summary table. This is a quick reference for the number of parameters in the residual (R) and random effect (G) covariance matrices, the rank (number of columns) in the fixed effect (X) and random effect (Z) design matrices, and the number of subjects defined by the subject fields that define the data structure.

Covariance parameter table. For the selected effect, the estimate, standard error, and confidence interval are displayed for each covariance parameter. The number of parameters shown depends upon the covariance structure for the effect and, for random effect blocks, the number of effects in the block. If you
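The view does not state how the interval for the variance is computed. However, the residual Variance row (estimate 7.833, standard error 0.248, interval 7.362 to 8.334) is reproduced by a Wald interval built on the log scale. That interval uses the delta-method standard error SE/estimate and transforms back with exp, so it is asymmetric and always positive. The Python sketch below assumes this construction. The function name `log_scale_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def log_scale_interval(estimate, std_error, level=0.95):
    """Hypothetical helper: interval for a positive variance built on the log scale.

    The delta method gives SE(log v) ~= SE(v) / v; the bounds are
    exp-transformed back, so they are always positive.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * std_error / estimate
    return estimate * math.exp(-half_width), estimate * math.exp(half_width)


lower, upper = log_scale_interval(7.833, 0.248)
# lower is about 7.362 and upper is about 8.334
```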
17. These options are applied to all factors specified on the Predictors tab.

User-Missing Values. Factors must have valid values for a case to be included in the analysis. These controls allow you to decide whether user-missing values are treated as valid among factor variables.

Category Order. This is relevant for determining a factor's last level, which may be associated with a redundant parameter in the estimation algorithm. Changing the category order can change the values of factor-level effects, since these parameter estimates are calculated relative to the "last" level. Factors can be sorted in ascending order from lowest to highest value, in descending order from highest to lowest value, or in "data order." This means that the first value encountered in the data defines the first category, and the last unique value encountered defines the last category.

Generalized Estimating Equations Model

[Figure 7-7: Generalized Estimating Equations dialog box, Model tab]
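The three category orders, and the level each makes "last", can be sketched in Python. The helper `category_order` and its option strings are hypothetical. Data order keeps levels in the order of their first appearance.

```python
def category_order(values, order="ascending"):
    """Hypothetical helper: factor levels in the chosen order.

    The last level returned is the one that may be associated with the
    redundant parameter.
    """
    in_data_order = list(dict.fromkeys(values))  # first occurrence defines position
    if order == "ascending":
        return sorted(in_data_order)
    if order == "descending":
        return sorted(in_data_order, reverse=True)
    if order == "data":
        return in_data_order
    raise ValueError("order must be 'ascending', 'descending', or 'data'")


levels = [2, 3, 1, 3, 2]
# ascending -> [1, 2, 3] (last = 3); descending -> [3, 2, 1] (last = 1); data -> [2, 3, 1] (last = 1)
```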
18. Post hoc multiple comparison tests. Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple comparisons can determine which means differ. Comparisons are made on unadjusted values. The post hoc tests are performed for each dependent variable separately.

The Bonferroni and Tukey's honestly significant difference tests are commonly used multiple comparison tests. The Bonferroni test, based on Student's t statistic, adjusts the observed significance level for the fact that multiple comparisons are made. Sidak's t test also adjusts the significance level and provides tighter bounds than the Bonferroni test. Tukey's honestly significant difference test uses the Studentized range statistic to make all pairwise comparisons between groups and sets the experimentwise error rate to the error rate for the collection for all pairwise comparisons. When testing a large number of pairs of means, Tukey's honestly significant difference test is more powerful than the Bonferroni test. For a small number of pairs, Bonferroni is more powerful.

Hochberg's GT2 is similar to Tukey's honestly significant difference test, but the Studentized
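The claim that Sidak's adjustment gives tighter bounds than Bonferroni can be checked numerically. The sketch uses the standard adjusted-significance formulas min(1, m·p) and 1 − (1 − p)^m for one of m comparisons. By Bernoulli's inequality the Sidak value never exceeds the Bonferroni value. The function names are hypothetical.

```python
def pairwise_comparisons(k):
    """Number of pairwise comparisons among k group means."""
    return k * (k - 1) // 2


def bonferroni(p, m):
    """Bonferroni-adjusted significance for one of m comparisons."""
    return min(1.0, m * p)


def sidak(p, m):
    """Sidak-adjusted significance for one of m comparisons."""
    return 1.0 - (1.0 - p) ** m


m = pairwise_comparisons(4)  # 6 comparisons among 4 groups
# For p = 0.01: bonferroni(0.01, 6) is about 0.06, sidak(0.01, 6) is about 0.0585
```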
19. The General Loglinear Analysis procedure displays model information and goodness-of-fit statistics. In addition, you can choose one or more of the following:

Display. Several statistics are available for display: observed and expected cell frequencies; raw, adjusted, and deviance residuals; a design matrix of the model; and parameter estimates for the model.

Plot. Plots, available for custom models only, include two scatterplot matrices (adjusted residuals or deviance residuals against observed and expected cell counts). You can also display normal probability and detrended normal plots of adjusted residuals or deviance residuals.

Confidence Interval. The confidence interval for parameter estimates can be adjusted.

Criteria. The Newton-Raphson method is used to obtain maximum likelihood parameter estimates. You can enter new values for the maximum number of iterations, the convergence criterion, and delta (a constant added to all cells for initial approximations). Delta remains in the cells for saturated models.

General Loglinear Analysis Save

[Figure 10-4: General Loglinear Analysis Save dialog box]
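For a single cell with observed count n and expected count m, the raw residual and the deviance residual can be computed as in this Python sketch, together with the Pearson form (n − m)/√m for comparison. These are the standard Poisson-model formulas, not taken from this manual, and the helper name is hypothetical. The adjusted residual also needs each cell's leverage from the fitted model, so it is omitted.

```python
import math


def cell_residuals(observed, expected):
    """Hypothetical helper: (raw, Pearson, deviance) residuals for one cell."""
    raw = observed - expected
    pearson = raw / math.sqrt(expected)
    # Deviance contribution 2 * [n * log(n / m) - (n - m)]; n * log(n / m) is 0 when n = 0.
    term = observed * math.log(observed / expected) if observed > 0 else 0.0
    deviance = math.copysign(math.sqrt(max(0.0, 2.0 * (term - raw))), raw)
    return raw, pearson, deviance


raw, pearson, deviance = cell_residuals(10.0, 12.0)  # all three are negative
```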
20. In such cases, you need to define a segmented time-dependent covariate, which can be done using logical expressions. Logical expressions take the value 1 if true and 0 if false. Using a series of logical expressions, you can create your time-dependent covariate from a set of measurements. For example, if you have blood pressure measured once a week for the four weeks of your study (identified as BP1 to BP4), you can define your time-dependent covariate as

(T_ < 1) * BP1 + (T_ >= 1 & T_ < 2) * BP2 + (T_ >= 2 & T_ < 3) * BP3 + (T_ >= 3 & T_ < 4) * BP4

Notice that exactly one of the terms in parentheses will be equal to 1 for any given case and the rest will all equal 0. In other words, this function means that if time is less than one week, use BP1; if it is more than one week but less than two weeks, use BP2, and so on.

In the Compute Time-Dependent Covariate dialog box, you can use the function-building controls to build the expression for the time-dependent covariate, or you can enter it directly in the Expression for T_COV_ text area. Note that string constants must be enclosed in quotation marks or apostrophes, and numeric constants must be typed in American format, with the dot as the decimal delimiter. The resulting variable is called T_COV_ and should be included as a covariate in your Cox Regression model.

Computing a Time-Dependent Covariate

From the menus choose:
Analyze > Survival > Cox w/ Time-Dep Cov...
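The segmented blood-pressure covariate can be mirrored in Python to check that exactly one term is active at any time. The sketch reads each lower boundary comparison as >=, so that the "exactly one term equals 1" property holds at the boundaries too. The function `segmented_bp` and the sample readings are hypothetical.

```python
def segmented_bp(t, bp):
    """Hypothetical helper: evaluate
    (T_ < 1)*BP1 + (T_ >= 1 & T_ < 2)*BP2 + (T_ >= 2 & T_ < 3)*BP3 + (T_ >= 3 & T_ < 4)*BP4.

    `bp` holds the weekly measurements [BP1, BP2, BP3, BP4]. Each comparison
    counts as 1 if true and 0 if false.
    """
    indicators = [t < 1, 1 <= t < 2, 2 <= t < 3, 3 <= t < 4]
    return sum(int(flag) * value for flag, value in zip(indicators, bp))


readings = [120, 125, 130, 128]  # hypothetical BP1..BP4
week_two_value = segmented_bp(1.0, readings)  # T_ = 1 falls in the second segment -> 125
```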
21. This is appropriate for a continuous target whose values take a symmetric, bell-shaped distribution about a central (mean) value.

Poisson. This distribution can be thought of as the number of occurrences of an event of interest in a fixed period of time and is appropriate for variables with non-negative integer values. If a data value is non-integer, less than 0, or missing, then the corresponding case is not used in the analysis.

Link Functions

The link function is a transformation of the target that allows estimation of the model. The following functions are available:
- Identity. $f(x)=x$. The target is not transformed. This link can be used with any distribution except the multinomial.
- Complementary log-log. $f(x)=\log(-\log(1-x))$. This is appropriate only with the binomial or multinomial distribution.
- Cauchit. $f(x)=\tan(\pi(x-0.5))$. This is appropriate only with the binomial or multinomial distribution.
- Log. $f(x)=\log(x)$. This link can be used with any distribution except the multinomial.
- Log complement. $f(x)=\log(1-x)$. This is appropriate only with the binomial distribution.
- Logit. $f(x)=\log(x/(1-x))$. This is appropriate only with the binomial or multinomial distribution.
- Negative log-log. $f(x)=-\log(-\log(x))$. This is appropriate only with the binomial or multinomial distribution.
- Probit. $f(x)=\Phi^{-1}(x)$, where $\Phi^{-1}$ is the inverse standard normal cumulative distribution function. This is appropriate only with the binomial or multinomial distribution.
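The link functions listed here translate directly into Python. The sketch below is a lookup table keyed by hypothetical names. It covers only the forward transformation of a value x in (0, 1) (or any positive x for Log), with no domain checks.

```python
import math
from statistics import NormalDist

# Hypothetical lookup table of the link functions f(x) listed above.
LINKS = {
    "identity": lambda x: x,
    "complementary log-log": lambda x: math.log(-math.log(1.0 - x)),
    "cauchit": lambda x: math.tan(math.pi * (x - 0.5)),
    "log": math.log,
    "log complement": lambda x: math.log(1.0 - x),
    "logit": lambda x: math.log(x / (1.0 - x)),
    "negative log-log": lambda x: -math.log(-math.log(x)),
    "probit": NormalDist().inv_cdf,  # inverse standard normal CDF
}

midpoint_values = {name: LINKS[name](0.5) for name in ("logit", "probit", "cauchit")}
# logit, probit, and cauchit all map 0.5 to 0
```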
22. In the Logit Loglinear Analysis dialog box, select one or more dependent variables.
- Select one or more factor variables.

The total number of dependent and factor variables must be less than or equal to 10.

Optionally, you can:
- Select cell covariates.
- Select a cell structure variable to define structural zeros or include an offset term.
- Select one or more contrast variables.

Logit Loglinear Analysis Model

[Figure 11-2: Logit Loglinear Analysis Model dialog box]

Specify Model. A saturated model contains all main effects and interactions involving factor variables. It does not contain covariate terms. Select Custom to specify only a subset of interactions or to specify factor-by-covariate interactions.

Factors & Covariates. The factors and covariates are listed.

Terms in Model. The model depends on the nature of your data. After selecting Custom, you can select the main effects and interactions that are of interest in your analysis. You must indicate all of the terms to be included in the model.

Terms are added to the design by taking all possible combinations of the dependent terms and matching each combination with each term in the model
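One reading of the last rule, sketched in Python, takes "all possible combinations of the dependent terms" to mean every non-empty product of the dependent variables, each paired with each term in the model. That interpretation, the helper name `logit_design` and the variable names are assumptions for illustration.

```python
from itertools import combinations


def logit_design(dependents, model_terms):
    """Hypothetical helper: pair every product of dependents with every model term.

    `dependents` is a sequence of names; `model_terms` is a list of tuples
    of factor/covariate names, e.g. [("C",), ("C", "D")].
    """
    dependent_combos = [
        combo
        for size in range(1, len(dependents) + 1)
        for combo in combinations(dependents, size)
    ]
    return [combo + tuple(term) for combo in dependent_combos for term in model_terms]


design = logit_design(["A", "B"], [("C",)])
# design == [("A", "C"), ("B", "C"), ("A", "B", "C")]
```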
23. …bottom in the order in which they were specified on the Fixed Effects settings. Within effects containing factors, coefficients are sorted by ascending order of data values. Connecting lines in the diagram are colored and weighted based on coefficient significance, with greater line width corresponding to more significant coefficients (smaller p-values). This is the default style.
- Table. This shows the values, significance tests, and confidence intervals for the individual model coefficients. After the intercept, the effects are sorted from top to bottom in the order in which they were specified on the Fixed Effects settings. Within effects containing factors, coefficients are sorted by ascending order of data values.

Multinomial. If the multinomial distribution is in effect, then the Multinomial drop-down list controls which target category to display. The sort order of the values in the list is determined by the specification on the Build Options settings.

Exponential. This displays exponential coefficient estimates and confidence intervals for certain model types, including Binary logistic regression (binomial distribution and logit link), Nominal logistic regression (multinomial distribution and logit link), Negative binomial regression (negative binomial distribution and log link), and Log-linear model (Poisson distribution and log link).

Significance. There is a Significance slider that controls which coefficients are shown in the view.
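Why exponential estimates are useful for these model types can be shown in a few lines of Python. Under the logit link, exp(b) is the factor by which the odds change for a one-unit increase in the predictor. Because exp is increasing, the exponentiated interval bounds stay in order. The function names and coefficient values are hypothetical.

```python
import math


def exponentiate(coefficient, lower, upper):
    """Hypothetical helper: exp of an estimate and of its interval bounds."""
    return math.exp(coefficient), math.exp(lower), math.exp(upper)


def odds(eta):
    """Odds p / (1 - p) for a linear predictor eta under the logit link."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return p / (1.0 - p)


# Hypothetical binary logistic model: intercept -1.0, coefficient 0.7 for x.
b0, b1 = -1.0, 0.7
ratio = odds(b0 + b1 * 3.0) / odds(b0 + b1 * 2.0)  # equals exp(b1) up to rounding
```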
24. …This is appropriate only with the binomial distribution.
- Cumulative Cauchit. $f(x)=\tan(\pi(x-0.5))$, applied to the cumulative probability of each category of the response. This is appropriate only with the multinomial distribution.
- Cumulative complementary log-log. $f(x)=\ln(-\ln(1-x))$, applied to the cumulative probability of each category of the response. This is appropriate only with the multinomial distribution.
- Cumulative logit. $f(x)=\ln(x/(1-x))$, applied to the cumulative probability of each category of the response. This is appropriate only with the multinomial distribution.
- Cumulative negative log-log. $f(x)=-\ln(-\ln(x))$, applied to the cumulative probability of each category of the response. This is appropriate only with the multinomial distribution.
- Cumulative probit. $f(x)=\Phi^{-1}(x)$, applied to the cumulative probability of each category of the response, where $\Phi^{-1}$ is the inverse standard normal cumulative distribution function. This is appropriate only with the multinomial distribution.
- Log. $f(x)=\log(x)$. This link can be used with any distribution.
- Log complement. $f(x)=\log(1-x)$. This is appropriate only with the binomial distribution.
- Logit. $f(x)=\log(x/(1-x))$. This is appropriate only with the binomial distribution.
- Negative binomial. $f(x)=\log(x/(x+k^{-1}))$, where $k$ is the ancillary parameter of the negative binomial distribution. This is appropriate only with the negative binomial distribution.
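The phrase "applied to the cumulative probability of each category" can be illustrated with the cumulative logit. This Python sketch takes category probabilities in order, accumulates them, and transforms every cumulative probability except the last (which is always 1). It also shows the negative binomial link. Both function names are hypothetical.

```python
import math
from itertools import accumulate


def cumulative_logits(probabilities):
    """Hypothetical helper: cumulative logit ln(x / (1 - x)) of each cumulative
    probability, excluding the last category (whose cumulative probability is 1)."""
    cumulative = list(accumulate(probabilities))[:-1]
    return [math.log(c / (1.0 - c)) for c in cumulative]


def negative_binomial_link(x, k):
    """Hypothetical helper: f(x) = log(x / (x + 1/k)), k being the ancillary parameter."""
    return math.log(x / (x + 1.0 / k))


links = cumulative_logits([0.25, 0.25, 0.5])
# cumulative probabilities 0.25 and 0.5 -> [-ln 3, 0.0]
```

Because cumulative probabilities never decrease, the transformed values never decrease either.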
25. 113; estimated marginal means, 108; estimated means, 122; fixed coefficients, 118, 171; fixed effects, 100, 116; link function, 97; model export, 110; model summary, 112; model view, 111; offset, 106; predicted by observed, 114; random effect block, 104; random effect covariances, 120; random effects, 103; save fields, 110; target distribution, 97
generalized linear model: in generalized linear mixed models, 93
Generalized Linear Models, 46: distribution, 46; estimated marginal means, 61; estimation criteria, 56; initial values, 58; link function, 46; model export, 65; model specification, 54; model types, 46; options for categorical factors, 53; predictors, 52; reference category for binary response, 51; response, 50; save variables to active dataset, 63; statistics, 59
generalized log odds ratio: in General Loglinear Analysis, 128
generating class: in Model Selection Loglinear Analysis, 126
GLM: saving matrices, 10; saving variables, 10
GLM Multivariate, 2, 12: covariates, 2; dependent variable, 2; diagnostics, 11; display, 11; estimated marginal means, 11; factors, 2; options, 11; post hoc tests, 8; profile plots, 7
GLM Repeated Measures, 14: command additional features, 26; define factors, 17; diagnostics, 25; display, 25; estimated marginal means, 25; model, 18; options, 25; post hoc tests, 22; profile plots, 21; saving variables, 24
GLOR: in General Loglinear Analysis, 128
goodness of fit: in Generalized Estimating Equations, 84; in Generaliz
26. 60
least significant difference: in GLM Multivariate, 8; in GLM Repeated Measures, 22
legal notices, 166
Levene test: in GLM Multivariate, 11; in GLM Repeated Measures, 25
leverage values: in Generalized Linear Models, 64; in GLM, 10; in GLM Repeated Measures, 24
Life Tables, 139: command additional features, 142; comparing factor levels, 141; example, 139; factor variables, 141; hazard rate, 139; plots, 141; statistics, 139; suppressing table display, 141; survival function, 139; survival status variables, 141; Wilcoxon (Gehan) test, 141
likelihood residuals: in Generalized Linear Models, 64
Linear Mixed Models, 34, 162: build terms, 37, 38; command additional features, 45; covariance structure, 162; estimated marginal means, 43; estimation criteria, 41; fixed effects, 37; interaction terms, 37; model, 42; random effects, 39; save variables, 44
link function: generalized linear mixed models, 97
log complement link function: in generalized estimating equations, 74; in generalized linear models, 49
log link function: in generalized estimating equations, 74; in generalized linear models, 49
log rank test: in Kaplan-Meier, 145
log-likelihood convergence: in Generalized Estimating Equations, 81; in Generalized Linear Models, 56; in Linear Mixed Models, 41
logistic regression: generalized linear mixed models, 93
logit link function: in generalized estimating equations, 74; in generalized linear models, 49
Logit Loglinear
27. 89; in Generalized Linear Models, 64; in Linear Mixed Models, 44; in Logit Loglinear Analysis, 137; in Model Selection Loglinear Analysis, 127
restricted maximum likelihood estimation: in Variance Components, 31
Ryan-Einot-Gabriel-Welsch multiple F: in GLM Multivariate, 8; in GLM Repeated Measures, 22
Ryan-Einot-Gabriel-Welsch multiple range: in GLM Multivariate, 8; in GLM Repeated Measures, 22
saturated models: in Model Selection Loglinear Analysis, 126
scale parameter: in Generalized Estimating Equations, 81; in Generalized Linear Models, 56
Scheffé test: in GLM Multivariate, 8; in GLM Repeated Measures, 22
scoring: in Linear Mixed Models, 41
segmented time-dependent covariates: in Cox Regression, 155
separation: in Generalized Estimating Equations, 81; in Generalized Linear Models, 56
Sidak's test: in GLM Multivariate, 8; in GLM Repeated Measures, 22
singularity tolerance: in Linear Mixed Models, 41
spread-versus-level plots: in GLM Multivariate, 11; in GLM Repeated Measures, 25
SSCP: in GLM Multivariate, 11; in GLM Repeated Measures, 25
standard deviation: in GLM Multivariate, 11; in GLM Repeated Measures, 25
standard error: in GLM, 10; in GLM Multivariate, 11; in GLM Repeated Measures, 24, 25
standardized residuals: in GLM, 10; in GLM Repeated Measures, 24
step-halving: in Generalized Estimating Equations, 81; in Generalized Linear Models, 56; in Linear Mixed Models, 41
string covariates: in Cox
28. Appendix A: Categorical Variable Coding Schemes
In many procedures, you can request automatic replacement of a categorical independent variable with a set of contrast variables, which will then be entered or removed from an equation as a block. You can specify how the set of contrast variables is to be coded, usually on the CONTRAST subcommand. This appendix explains and illustrates how different contrast types requested on CONTRAST actually work.
Deviation. Deviation from the grand mean. In matrix terms, these contrasts have the form:
$$
\begin{array}{l|ccccc}
\text{mean} & 1/k & 1/k & \cdots & 1/k & 1/k \\
\text{df}(1) & 1-1/k & -1/k & \cdots & -1/k & -1/k \\
\text{df}(2) & -1/k & 1-1/k & \cdots & -1/k & -1/k \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\text{df}(k-1) & -1/k & -1/k & \cdots & 1-1/k & -1/k
\end{array}
$$
where k is the number of categories for the independent variable and the last category is omitted by default. For example, the deviation contrasts for an independent variable with three categories are as follows:
$$
\begin{pmatrix}
1/3 & 1/3 & 1/3 \\
2/3 & -1/3 & -1/3 \\
-1/3 & 2/3 & -1/3
\end{pmatrix}
$$
To omit a category other than the last, specify the number of the omitted category in parentheses after the DEVIATION keyword. For example, the following subcommand obtains the deviations for the first and third categories and omits the second:
/CONTRAST(FACTOR)=DEVIATION(2)
Suppose that factor has three categories. The resulting contrast matrix will be
$$
\begin{pmatrix}
1/3 & 1/3 & 1/3 \\
2/3 & -1/3 & -1/3 \\
-1/3 & -1/3 & 2/3
\end{pmatrix}
$$
© Copyright IBM Corporation 1989, 2012.
Simple. Simple contrasts. Compares each level of
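The deviation matrices above follow a simple rule. Each non-omitted category's row is its indicator vector minus 1/k, and a "mean" row of 1/k is placed on top. The following is a minimal sketch under that reading. The function name is hypothetical and is not part of SPSS. Fractions are used so the output matches the printed matrices exactly.

```python
from fractions import Fraction

def deviation_contrasts(k, omit=None):
    """Deviation-from-the-grand-mean contrast matrix for k categories.

    Row 0 is the "mean" row (1/k in every column). Each remaining row
    compares one non-omitted category with the grand mean. `omit` is the
    1-based category to leave out; it defaults to the last, as in DEVIATION.
    """
    omit = k if omit is None else omit
    rows = [[Fraction(1, k)] * k]
    for i in range(1, k + 1):
        if i == omit:
            continue
        rows.append([(1 if j == i else 0) - Fraction(1, k) for j in range(1, k + 1)])
    return rows

# Equivalent of DEVIATION(2) for a three-category factor.
for row in deviation_contrasts(3, omit=2):
    print([str(x) for x in row])
```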
29. Analysis, 133: cell covariates, 133; cell structures, 133; confidence intervals, 136; contrasts, 133; criteria, 136; display options, 136; distribution of cell counts, 133; factors, 133; model specification, 135; plots, 136; predicted values, 137; residuals, 137; saving variables, 137
loglinear analysis, 124: General Loglinear Analysis, 128; in generalized linear mixed models, 93; Logit Loglinear Analysis, 133
longitudinal models: generalized linear mixed models, 93
Mauchly's test of sphericity: in GLM Repeated Measures, 25
maximum likelihood estimation: in Variance Components, 31
MINQUE: in Variance Components, 31
mixed models: generalized linear mixed models, 93; linear, 34
model information: in Generalized Estimating Equations, 84; in Generalized Linear Models, 60
Model Selection Loglinear Analysis, 124: command additional features, 127; defining factor ranges, 125; models, 126; options, 127
model view: in generalized linear mixed models, 111
multilevel models: generalized linear mixed models, 93
multinomial distribution: in generalized estimating equations, 73; in generalized linear models, 48
multinomial logistic regression: generalized linear mixed models, 93
multinomial logit models, 133
multivariate ANOVA, 2
multivariate GLM, 2
multivariate regression, 2
negative binomial distribution: in generalized estimating equations, 73; in generalized linear models, 48
negative binomial link function: in generalized estimati
30. Figure: Generalized Estimating Equations Save tab. Items to save, with default variable names or root names: predicted value of mean of response (MeanPredicted); lower bound of confidence interval for mean of response (CIMeanPredictedLower); upper bound of confidence interval for mean of response (CIMeanPredictedUpper); predicted category; predicted value of linear predictor; estimated standard error of predicted value of linear predictor; residual; Pearson residual. Existing Variable with Same Name: Add suffix to name of new variable (applies only to default names) or Replace existing variable (applies to both default and user-provided names).
Checked items are saved with the specified name. You can choose to overwrite existing variables with the same name as the new variables, or you can avoid name conflicts by appending suffixes to make the new variable names unique.
• Predicted value of mean of response. Saves model-predicted values for each case in the original response metric. When the response distribution is binomial and the dependent variable is binary, the procedure
31. G squared), where the sign is the sign of the residual (observed count minus expected count). Deviance residuals have an asymptotic standard normal distribution.
GENLOG Command Additional Features
The command syntax language also allows you to:
• Calculate linear combinations of observed cell frequencies and expected cell frequencies, and print residuals, standardized residuals, and adjusted residuals of that combination (using the GERESID subcommand).
• Change the default threshold value for redundancy checking (using the CRITERIA subcommand).
• Display the standardized residuals (using the PRINT subcommand).
See the Command Syntax Reference for complete syntax information.
Chapter 12: Life Tables
There are many situations in which you would want to examine the distribution of times between two events, such as length of employment (time between being hired and leaving the company). However, this kind of data usually includes some cases for which the second event isn't recorded (for example, people still working for the company at the end of the study). This can happen for several reasons. For some cases, the event simply doesn't occur before the end of the study. For other cases, we lose track of their status sometime before the end of the study. Still other cases may be unable to continue for reasons unrelated to the study (such as an employee becoming ill and taking a leave of absence). Collectively, such cases
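The deviance residual described at the start of this excerpt is a signed square root of a cell's share of G squared. The following is a minimal sketch. It assumes the per-cell contribution is the Poisson deviance term 2[o·ln(o/e) − (o − e)]. That term is never negative, and the terms add up to G² = 2·Σ o·ln(o/e) whenever the observed and expected totals agree. The function name and the counts are hypothetical, and GENLOG's exact internal formula is not given in this text.

```python
import math

def deviance_residuals(observed, expected):
    """Signed square root of each cell's contribution to G squared.

    Assumption: the per-cell contribution is 2*[o*ln(o/e) - (o - e)]; the
    sign is the sign of (observed - expected), as the text describes.
    A cell with o == 0 contributes 2*e (the o*ln(o/e) term is taken as 0).
    """
    out = []
    for o, e in zip(observed, expected):
        term = 2 * ((o * math.log(o / e) if o > 0 else 0.0) - (o - e))
        sign = 1 if o > e else -1 if o < e else 0
        out.append(sign * math.sqrt(max(term, 0.0)))
    return out

# Two hypothetical cells: observed 10 and 20, both expected 15.
r = deviance_residuals([10, 20], [15, 15])
print(r)
```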
32. If your covariates can have different values at different points in time for the same case, use Cox Regression with Time-Dependent Covariates.
Obtaining a Kaplan-Meier Survival Analysis
► From the menus choose:
Analyze > Survival > Kaplan-Meier...
Figure 13-1 Kaplan-Meier dialog box
► Select a time variable.
► Select a status variable to identify cases for which the terminal event has occurred. This variable can be numeric or short string. Then click Define Event.
Optionally, you can select a factor variable to examine group differences. You can also select a strata variable, which will produce separate analyses for each level (stratum) of the variable.
Kaplan-Meier Define Event for Status Variable
Figure 13-2 Kaplan-Meier Define Event for Status Variable dialog box
Enter the value or values indicating that the terminal event has occurred. You can enter a single value, a range of values, or a list of values. The Range of Values option is available only if your status variable is numeric.
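The time and status variables selected above are enough to compute the product-limit survival estimate. The following is a minimal sketch of the textbook Kaplan-Meier estimator. It is not SPSS's implementation, and the function name is hypothetical. The sketch follows the usual convention that a case censored at an event time still counts as at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimates at each event time.

    times  -- observed time for each case (event or censoring time)
    events -- True where the terminal event occurred, False where censored
    Returns a list of (time, estimated cumulative survival) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every case that shares this time value.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]  # True counts as 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical status codes: 1 = event occurred (the "Define Event" value), 0 = censored.
status = [1, 1, 0, 1, 0]
events = [s == 1 for s in status]
print(kaplan_meier([1, 2, 2, 3, 4], events))
```

The list comprehension that builds `events` plays the role of the Define Event dialog box. A range or list of event values would become a membership test instead.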
33. You can save various results of your analysis as new variables. These variables can then be used in subsequent analyses to test hypotheses or to check assumptions.
Save Model Variables. Allows you to save the survival function and its standard error, log-minus-log estimates, hazard function, partial residuals, DfBeta(s) for the regression, and the linear predictor X*Beta as new variables.
• Survival function. The value of the cumulative survival function for a given time. It equals the probability of survival to that time period.
• Log minus log survival function. The cumulative survival estimate after the ln(−ln) transformation is applied to the estimate.
• Hazard function. Saves the cumulative hazard function estimate (also called the Cox-Snell residual).
• Partial residuals. You can plot partial residuals against survival time to test the proportional hazards assumption. One variable is saved for each covariate in the final model. Partial residuals are available only for models containing at least one covariate.
• DfBeta(s). Estimated change in a coefficient if a case is removed. One variable is saved for each covariate in the final model. DfBetas are available only for models containing at least one covariate.
• X*Beta. Linear predictor score. The sum of the product of mean-centered covariate values and their corresponding parameter estimates for each case.
If you are running Cox with
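Two of the saved variables are simple transformations that can be checked by hand: the X*Beta score and the log-minus-log survival. The following is a minimal sketch of both. The function names are hypothetical. The covariate values and estimates in any call are illustrative inputs, not output from the procedure.

```python
import math

def x_beta(covariates, coefficients):
    """Linear predictor score per case, using mean-centered covariates.

    covariates   -- one row of covariate values per case
    coefficients -- one parameter estimate per covariate (same order)
    """
    n = len(covariates)
    means = [sum(row[j] for row in covariates) / n for j in range(len(coefficients))]
    return [
        sum((row[j] - means[j]) * b for j, b in enumerate(coefficients))
        for row in covariates
    ]

def log_minus_log(survival):
    """ln(-ln(S)) transformation of a cumulative survival estimate, 0 < S < 1."""
    return math.log(-math.log(survival))

print(x_beta([[1, 2], [3, 4]], [0.5, 1.0]))
```

The covariates are mean-centered, so the X*Beta scores sum to zero across the cases used to compute the means. A case whose covariates all sit at their means scores exactly 0.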
34. Each row except the means row sums to 0. Products of each pair of disjoint rows sum to 0 as well:
Rows 2 and 3: (3)(0) + (−1)(2) + (−1)(−1) + (−1)(−1) = 0
Rows 2 and 4: (3)(0) + (−1)(0) + (−1)(1) + (−1)(−1) = 0
Rows 3 and 4: (0)(0) + (2)(0) + (−1)(1) + (−1)(−1) = 0
The special contrasts need not be orthogonal. However, they must not be linear combinations of each other. If they are, the procedure reports the linear dependency and ceases processing. Helmert, difference, and polynomial contrasts are all orthogonal contrasts.
Indicator. Indicator variable coding. Also known as dummy coding, this is not available in LOGLINEAR or MANOVA. The number of new variables coded is k − 1. Cases in the reference category are coded 0 for all k − 1 variables. A case in the ith category is coded 0 for all indicator variables except the ith, which is coded 1.
Appendix B: Covariance Structures
This section provides additional information on covariance structures.
Ante-Dependence: First-Order. This covariance structure has heterogeneous variances and heterogeneous correlations between adjacent elements. The correlation between two nonadjacent elements is the product of the correlations between the elements that lie between the elements of interest.
$$
\begin{pmatrix}
\sigma_1^2 & \sigma_2\sigma_1\rho_1 & \sigma_3\sigma_1\rho_1\rho_2 & \sigma_4\sigma_1\rho_1\rho_2\rho_3 \\
\sigma_2\sigma_1\rho_1 & \sigma_2^2 & \sigma_3\sigma_2\rho_2 & \sigma_4\sigma_2\rho_2\rho_3 \\
\sigma_3\sigma_1\rho_1\rho_2 & \sigma_3\sigma_2\rho_2 & \sigma_3^2 & \sigma_4\sigma_3\rho_3 \\
\sigma_4\sigma_1\rho_1\rho_2\rho_3 & \sigma_4\sigma_2\rho_2\rho_3 & \sigma_4\sigma_3\rho_3 & \sigma_4^2
\end{pmatrix}
$$
AR(1). This is a first-order autoregressive structure with homogeneous variances. The corr
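The first-order ante-dependence matrix can be built from its two ingredients. The first is the standard deviations σ_i. The second is the adjacent correlations ρ_i, where ρ_i links elements i and i + 1. The following is a minimal sketch. The function name is hypothetical, and the sketch only constructs the structure; it does not estimate it.

```python
def ante_dependence_cov(sigmas, rhos):
    """First-order ante-dependence covariance matrix.

    sigmas -- standard deviations sigma_1..sigma_n (heterogeneous variances)
    rhos   -- adjacent correlations rho_1..rho_{n-1}; rhos[m] links
              elements m and m + 1 (0-based)
    Entry (i, j) is sigma_i * sigma_j times the product of the adjacent
    correlations lying between positions i and j (an empty product is 1).
    """
    n = len(sigmas)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = 1.0
            for m in range(min(i, j), max(i, j)):
                r *= rhos[m]
            cov[i][j] = sigmas[i] * sigmas[j] * r
    return cov

for row in ante_dependence_cov([1.0, 2.0, 3.0], [0.5, 0.4]):
    print(row)
```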
35. The dependent variable must be numeric.
Poisson. This distribution can be thought of as the number of occurrences of an event of interest in a fixed period of time and is appropriate for variables with non-negative integer values. If a data value is non-integer, less than 0, or missing, then the corresponding case is not used in the analysis.
Tweedie. This distribution is appropriate for variables that can be represented by Poisson mixtures of gamma distributions. The distribution is "mixed" in the sense that it combines properties of continuous distributions (takes non-negative real values) and discrete distributions (positive probability mass at a single value, 0). The dependent variable must be numeric, with data values greater than or equal to zero. If a data value is less than zero or missing, then the corresponding case is not used in the analysis. The fixed value of the Tweedie distribution's parameter can be any number greater than one and less than two.
Multinomial. This distribution is appropriate for variables that represent an ordinal response. The dependent variable can be numeric or string, and it must have at least two distinct valid data values.
Link Functions
The link function is a transformation of the dependent variable that allows estimation of the model. The following functions are available:
Identity. $f(x) = x$. The dependent variable is not transformed. This link can be used with any distribution.
Complementary log-log. f(x) = log(−log
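The Poisson and Tweedie paragraphs each state which dependent-variable values cause a case to be excluded. The following is a minimal sketch of those two rules as a filter. It is an illustration, not the procedure's code. The function name is hypothetical, and representing a missing value as `None` or NaN is an assumption of the sketch.

```python
import math

def usable_case(value, distribution):
    """Return True if a dependent-variable value is used for the given distribution.

    Encodes only the Poisson and Tweedie rules stated above. None (or NaN)
    stands in for a missing value, which is an assumption of this sketch.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return False
    if distribution == "poisson":
        # Non-negative integers only.
        return value >= 0 and float(value).is_integer()
    if distribution == "tweedie":
        # Any non-negative real value, including exactly 0.
        return value >= 0
    raise ValueError(f"no rule sketched for {distribution!r}")

data = [0, 2, 2.5, -1, None]
print([usable_case(v, "poisson") for v in data])
```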
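36. a factor to the last. The general matrix form is
$$
\begin{array}{l|ccccc}
\text{mean} & 1/k & 1/k & \cdots & 1/k & 1/k \\
\text{df}(1) & 1 & 0 & \cdots & 0 & -1 \\
\text{df}(2) & 0 & 1 & \cdots & 0 & -1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\text{df}(k-1) & 0 & 0 & \cdots & 1 & -1
\end{array}
$$
where k is the number of categories for the independent variable. For example, the simple contrasts for an independent variable with four categories are as follows:
$$
\begin{pmatrix}
1/4 & 1/4 & 1/4 & 1/4 \\
1 & 0 & 0 & -1 \\
0 & 1 & 0 & -1 \\
0 & 0 & 1 & -1
\end{pmatrix}
$$
To use another category instead of the last as a reference category, specify in parentheses after the SIMPLE keyword the sequence number of the reference category, which is not necessarily the value associated with that category. For example, the following CONTRAST subcommand obtains a contrast matrix that omits the second category:
/CONTRAST(FACTOR)=SIMPLE(2)
Suppose that factor has four categories. The resulting contrast matrix will be
$$
\begin{pmatrix}
1/4 & 1/4 & 1/4 & 1/4 \\
1 & -1 & 0 & 0 \\
0 & -1 & 1 & 0 \\
0 & -1 & 0 & 1
\end{pmatrix}
$$
Helmert. Helmert contrasts. Compares categories of an independent variable with the mean of the subsequent categories. The general matrix form is
$$
\begin{array}{l|cccccc}
\text{mean} & 1/k & 1/k & \cdots & 1/k & 1/k & 1/k \\
\text{df}(1) & 1 & -1/(k-1) & \cdots & -1/(k-1) & -1/(k-1) & -1/(k-1) \\
\text{df}(2) & 0 & 1 & \cdots & -1/(k-2) & -1/(k-2) & -1/(k-2) \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
\text{df}(k-2) & 0 & 0 & \cdots & 1 & -1/2 & -1/2 \\
\text{df}(k-1) & 0 & 0 & \cdots & 0 & 1 & -1
\end{array}
$$
where k is the number of categories of the independent variable. For example, an independent variable with four categories has a Helmert contrast matrix of the following form:
$$
\begin{pmatrix}
1/4 & 1/4 & 1/4 & 1/4 \\
1 & -1/3 & -1/3 & -1/3 \\
0 & 1 & -1/2 & -1/2 \\
0 & 0 & 1 & -1
\end{pmatrix}
$$
Difference. Difference or reverse Helmert contra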
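The simple and Helmert matrices above can be generated from their row rules. In a simple contrast, each non-reference category gets +1 and the reference category gets −1. In a Helmert contrast, row i gets +1 in column i and −1/(k − i) in each later column. The following is a minimal sketch. The function names are hypothetical, and `ref` is a 1-based sequence number, mirroring SIMPLE(2).

```python
from fractions import Fraction

def simple_contrasts(k, ref=None):
    """Simple contrasts: each non-reference category versus the reference (default: last)."""
    ref = k if ref is None else ref
    rows = [[Fraction(1, k)] * k]
    for i in range(1, k + 1):
        if i == ref:
            continue
        rows.append([Fraction(1 if j == i else -1 if j == ref else 0) for j in range(1, k + 1)])
    return rows

def helmert_contrasts(k):
    """Helmert contrasts: each category versus the mean of the subsequent categories."""
    rows = [[Fraction(1, k)] * k]
    for i in range(1, k):
        rest = k - i  # number of subsequent categories
        rows.append([
            Fraction(0) if j < i else Fraction(1) if j == i else Fraction(-1, rest)
            for j in range(1, k + 1)
        ])
    return rows

for row in simple_contrasts(4, ref=2):
    print([str(x) for x in row])
```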
37. a time-dependent covariate, DfBeta(s) and the linear predictor variable X*Beta are the only variables you can save.
Export Model Information to XML File. Parameter estimates are exported to the specified file in XML format. You can use this model file to apply the model information to other data files for scoring purposes.
Cox Regression Options
Figure 14-5 Cox Regression Options dialog box
You can control various aspects of your analysis and output.
Model Statistics. You can obtain statistics for your model parameters, including confidence intervals for exp(B) and correlation of estimates. You can request these statistics either at each step or at the last step only.
Probability for Stepwise. If you have selected a stepwise method, you can specify the probability for either entry or removal from the model. A variable is entered if the significance level of its F-to-enter is less than the Entry value, and a variable is removed if the significance level is greater than the Removal value. The Entry value must be less than the Removal value.
Maximum Iterations. Allows you to specify the maximum iterations fo
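The entry and removal rule under Probability for Stepwise reduces to two comparisons plus one validity check. The following is a minimal sketch for a single candidate variable. The function name is hypothetical. The default thresholds 0.05 and 0.10 are illustrative assumptions, not a statement of the dialog's defaults.

```python
def stepwise_decision(p_value, in_model, entry=0.05, removal=0.10):
    """Apply the entry/removal rule described above to one variable.

    p_value  -- significance level of the variable's test at this step
    in_model -- whether the variable is currently in the model
    """
    if entry >= removal:
        raise ValueError("the Entry value must be less than the Removal value")
    if not in_model and p_value < entry:
        return "enter"
    if in_model and p_value > removal:
        return "remove"
    return "no change"

print(stepwise_decision(0.01, in_model=False))
```

Requiring Entry < Removal leaves a gap between the two thresholds. Without that gap, a variable whose significance sits near a single cutoff could be entered and removed on alternate steps indefinitely.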
38. Estimated Means 108
Model View 111
Model Summary 112
Data Structure 113
Predicted by Observed 114
Classification 115
Fixed Effects 116
Fixed Coefficients 118
Random Effect Covariances 120
Covariance Parameters 121
Estimated Means: Significant Effects 122
Estimated Means: Custom Effects 122
9 Model Selection Loglinear Analysis 124
Loglinear Analysis Define Range 125
Loglinear Analysis Model 126
Build Terms 126
Model Selection Loglinear Analysis Options 127
HILOGLINEAR Command Additional Features 127
10 General Loglinear Analysis 128
General Loglinear Analysis Model 130
Build Terms 130
General Loglinear Analysis Options 131
General Loglinear Analysis Sa
39. as the distribution and Log as the link function.
• Tweedie with identity link. Specifies Tweedie as the distribution and Identity as the link function.
Custom. Specify your own combination of distribution and link function.
Distribution
This selection specifies the distribution of the dependent variable. The ability to specify a non-normal distribution and non-identity link function is the essential improvement of the generalized linear model over the general linear model. There are many possible distribution-link function combinations, and several may be appropriate for any given dataset, so your choice can be guided by a priori theoretical considerations or by which combination seems to fit best.
• Binomial. This distribution is appropriate only for variables that represent a binary response or number of events.
• Gamma. This distribution is appropriate for variables with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
• Inverse Gaussian. This distribution is appropriate for variables with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis.
• Negative binomial. This distribution can be thought of as the number of trials required to observe k successes and is appropriate for var
40. correlation matrix that represents the within-subject dependencies is estimated as part of the model.
Obtaining Generalized Estimating Equations
From the menus choose:
Analyze > Generalized Linear Models > Generalized Estimating Equations...
Figure 7-1 Generalized Estimating Equations: Repeated tab
► Select one or more subject variables (see below for further options).
The combination of values of the specified variables should uniquely define subjects within the dataset. For example, a single Patient ID variable should be sufficient to define subjects in a single hospi
41. effect, then the variables must begin with the split-file variable or variables in the order specified when creating the Split File, followed by RowType_, VarName_, P1, P2, ... as above. Splits must occur in the specified dataset in the same order as in the original dataset.
Note: The variable names P1, P2, ... are not required. The procedure will accept any valid variable names for the parameters because the mapping of variables to parameters is based on variable position, not variable name. Any variables beyond the last parameter are ignored.
The file structure for the initial values is the same as that used when exporting the model as data. Thus, you can use the final values from one run of the procedure as input in a subsequent run.
Generalized Estimating Equations Statistics
Figure 7-10 Generalized Estimating Equations: Statistics tab
42. factor interactions. It does not contain covariate interactions. Select Custom to specify only a subset of interactions or to specify factor-by-covariate interactions. You must indicate all of the terms to be included in the model.
Between-Subjects. The between-subjects factors and covariates are listed.
Model. The model depends on the nature of your data. After selecting Custom, you can select the within-subjects effects and interactions and the between-subjects effects and interactions that are of interest in your analysis.
Sum of squares. The method of calculating the sums of squares for the between-subjects model. For balanced or unbalanced between-subjects models with no missing cells, the Type III sum-of-squares method is the most commonly used.
Build Terms
For the selected factors and covariates:
Interaction. Creates the highest-level interaction term of all selected variables. This is the default.
Main effects. Creates a main-effects term for each variable selected.
All 2-way. Creates all possible two-way interactions of the selected variables.
All 3-way. Creates all possible three-way interactions of the selected variables.
All 4-way. Creates all possible four-way interactions of the selected variables.
All 5-way. Creates all possible five-way interactions of the selected variables.
Sum of Squares
For the model, you can choose a type of sums of squares. Type III is the most commonly used and i
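The Build Terms choices are combinatorial. "All N-way" means every N-element subset of the selected variables. The following is a minimal sketch of which terms each choice produces. It is not the dialog's code; the function name and the "A*B" term format are illustrative assumptions.

```python
from itertools import combinations

def build_terms(variables, choice):
    """Model terms generated by a Build Terms choice (illustrative sketch).

    choice is "interaction", "main effects", or "all N-way" (for example
    "all 2-way"); terms are returned as "A*B" strings.
    """
    if choice == "interaction":
        return ["*".join(variables)]
    if choice == "main effects":
        return list(variables)
    if choice.startswith("all ") and choice.endswith("-way"):
        order = int(choice[len("all "):-len("-way")])
        return ["*".join(combo) for combo in combinations(variables, order)]
    raise ValueError(f"unknown Build Terms choice: {choice!r}")

print(build_terms(["A", "B", "C"], "all 2-way"))
```

With three selected variables, "all 3-way" and "interaction" yield the same single term. Asking for more ways than there are variables yields no terms.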
43. given to the first group, the treatment categories are equally spaced, and an appropriate metric for this situation consists of consecutive integers:
/CONTRAST(DRUG)=POLYNOMIAL(1,2,3)
If, however, the dosage administered to the second group is four times that given to the first group, and the dosage administered to the third group is seven times that given to the first group, an appropriate metric is:
/CONTRAST(DRUG)=POLYNOMIAL(1,4,7)
In either case, the result of the contrast specification is that the first degree of freedom for drug contains the linear effect of the dosage levels and the second degree of freedom contains the quadratic effect.
Polynomial contrasts are especially useful in tests of trends and for investigating the nature of response surfaces. You can also use polynomial contrasts to perform nonlinear curve fitting, such as curvilinear regression.
Repeated. Compares adjacent levels of an independent variable. The general matrix form is
$$
\begin{array}{l|cccccc}
\text{mean} & 1/k & 1/k & 1/k & \cdots & 1/k & 1/k \\
\text{df}(1) & 1 & -1 & 0 & \cdots & 0 & 0 \\
\text{df}(2) & 0 & 1 & -1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\text{df}(k-1) & 0 & 0 & 0 & \cdots & 1 & -1
\end{array}
$$
where k is the number of categories for the independent variable. For example, the repeated contrasts for an independent variable with four categories are as follows:
$$
\begin{pmatrix}
1/4 & 1/4 & 1/4 & 1/4 \\
1 & -1 & 0 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & 1 & -1
\end{pmatrix}
$$
These contrasts are useful in profile analysis and wherever difference scores are needed.
Special. A user-defined contrast. Allows entry of special contrasts in the form
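A POLYNOMIAL metric fixes the spacing of the levels, and the contrasts are orthogonal polynomials evaluated at those values. The following is a minimal sketch that builds such contrasts by Gram-Schmidt orthogonalization of 1, x, x², ... at the metric values. The function name is hypothetical. The rows are normalized to unit length, which may differ from the procedure's own scaling or sign convention.

```python
def polynomial_contrasts(metric):
    """Orthonormal polynomial contrasts (linear, quadratic, ...) for a metric.

    Gram-Schmidt orthogonalization of 1, x, x^2, ... evaluated at the metric
    values. The constant row is dropped; row d-1 of the result is the
    degree-d contrast, scaled to unit length.
    """
    basis = []
    for degree in range(len(metric)):
        v = [float(x) ** degree for x in metric]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis[1:]

linear, quadratic = polynomial_contrasts([1, 2, 3])
print(linear, quadratic)
```

The metric values 1, 4, 7 are themselves equally spaced, so this sketch gives the same contrasts for POLYNOMIAL(1,4,7) as for POLYNOMIAL(1,2,3). An unequally spaced metric such as 1, 2, 4 would change them.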
44. if any, that contain it. The Type III sums of squares have one major advantage in that they are invariant with respect to the cell frequencies as long as the general form of estimability remains constant. Hence, this type of sums of squares is often considered useful for an unbalanced model with no missing cells. In a factorial design with no missing cells, this method is equivalent to the Yates' weighted-squares-of-means technique. The Type III sum-of-squares method is commonly used for:
• Any models listed in Type I and Type II.
• Any balanced or unbalanced model with no empty cells.
Type IV. This method is designed for a situation in which there are missing cells. For any effect F in the design, if F is not contained in any other effect, then Type IV = Type III = Type II. When F is contained in other effects, Type IV distributes the contrasts being made among the parameters in F to all higher-level effects equitably. The Type IV sum-of-squares method is commonly used for:
• Any models listed in Type I and Type II.
• Any balanced model or unbalanced model with empty cells.
GLM Repeated Measures Contrasts
Figure 3-4 Repeated Measures Contrasts dialog box
Contrasts are used to test for differences among the levels of a between-subjects factor. You can specify a contrast for each between-subjects f
45. interaction of the two factors is not significant.
Methods. Type I, Type II, Type III, and Type IV sums of squares can be used to evaluate different hypotheses. Type III is the default.
Statistics. Post hoc range tests and multiple comparisons: least significant difference, Bonferroni, Sidak, Scheffé, Ryan-Einot-Gabriel-Welsch multiple F, Ryan-Einot-Gabriel-Welsch multiple range, Student-Newman-Keuls, Tukey's honestly significant difference, Tukey's b, Duncan, Hochberg's GT2, Gabriel, Waller-Duncan test, Dunnett (one-sided and two-sided), Tamhane's T2, Dunnett's T3, Games-Howell, and Dunnett's C. Descriptive statistics: observed means, standard deviations, and counts for all of the dependent variables in all cells; the Levene test for homogeneity of variance; Box's M test of the homogeneity of the covariance matrices of the dependent variables; and Bartlett's test of sphericity.
Plots. Spread-versus-level, residual, and profile (interaction).
Data. The dependent variables should be quantitative. Factors are categorical and can have numeric values or string values. Covariates are quantitative variables that are related to the dependent variable.
Assumptions. For dependent variables, the data are a random sample of vectors from a multivariate normal population; in the population, the variance-covariance matrices for all cells are the same. Analysis of var
46. is a control group. You can choose the first or last category as the reference.
Difference. Compares the mean of each level (except the first) to the mean of previous levels. (Sometimes called reverse Helmert contrasts.)
Helmert. Compares the mean of each level of the factor (except the last) to the mean of subsequent levels.
Repeated. Compares the mean of each level (except the last) to the mean of the subsequent level.
Polynomial. Compares the linear effect, quadratic effect, cubic effect, and so on. The first degree of freedom contains the linear effect across all categories; the second degree of freedom, the quadratic effect; and so on. These contrasts are often used to estimate polynomial trends.
GLM Repeated Measures Profile Plots
Figure 3-5 Repeated Measures Profile Plots dialog box
Profile plots (interaction plots) are useful for comparing marginal means in your model. A profile plot is a line plot in which each point indicates the estimated marginal mean of a dependent variable (adjusted for any covariates) at one level of a factor. The levels of a second factor can be used to make separate lines. Each level in a third factor can be used to create a separate plot. All factors are available for plots. Profile plots are created for each dependent variable. Both between-su
47. much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients. A large Cook's D indicates that excluding a case from computation of the regression statistics changes the coefficients substantially.
• Leverage values. Uncentered leverage values. The relative influence of each observation on the model's fit.
Residuals. An unstandardized residual is the actual value of the dependent variable minus the value predicted by the model. Standardized, Studentized, and deleted residuals are also available.
• Unstandardized. The difference between an observed value and the value predicted by the model.
• Standardized. The residual divided by an estimate of its standard deviation. Standardized residuals, which are also known as Pearson residuals, have a mean of 0 and a standard deviation of 1.
• Studentized. The residual divided by an estimate of its standard deviation that varies from case to case, depending on the distance of each case's values on the independent variables from the means of the independent variables.
• Deleted. The residual for a case when that case is excluded from the calculation of the regression coefficients. It is the difference between the value of the dependent variable and the adjusted predicted value.
Coefficient Statistics. Saves a variance-covariance matrix of the parameter estimates to a dataset or a data file. Also, for
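The following is a minimal sketch of the saved diagnostics for an ordinary least-squares fit, using NumPy and the standard textbook formulas. The formulas are as follows:

- Leverage: h is the diagonal of the hat matrix.
- Standardized residual: e / s.
- Studentized residual: e / (s·√(1 − h)).
- Deleted residual: e / (1 − h).
- Cook's D: e²·h / (p·s²·(1 − h)²).

This is an illustration under those assumptions, not the GLM procedure's code. The function name and the example data are hypothetical.

```python
import numpy as np

def residual_diagnostics(X, y):
    """OLS residual diagnostics; X must already include an intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta                        # unstandardized residuals
    s = np.sqrt(resid @ resid / (n - p))        # root mean squared error
    h = np.diag(X @ np.linalg.pinv(X.T @ X) @ X.T)  # uncentered leverage
    return {
        "unstandardized": resid,
        "leverage": h,
        "standardized": resid / s,
        "studentized": resid / (s * np.sqrt(1 - h)),
        "deleted": resid / (1 - h),             # y_i minus prediction without case i
        "cooks_d": resid**2 * h / (p * s**2 * (1 - h) ** 2),
    }

X = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]    # intercept plus one predictor
y = [1.1, 1.9, 3.2, 3.8, 5.1]
for name, values in residual_diagnostics(X, y).items():
    print(name, np.round(values, 3))
```

The deleted residual uses the identity e/(1 − h) instead of refitting the model once per case. That identity is exact for least squares.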
48. multiple comparison tests. Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple comparisons can determine which means differ. Comparisons are made on unadjusted values. These tests are not available if there are no between-subjects factors, and the post hoc multiple comparison tests are performed for the average across the levels of the within-subjects factors.
The Bonferroni and Tukey's honestly significant difference tests are commonly used multiple comparison tests. The Bonferroni test, based on Student's t statistic, adjusts the observed significance level for the fact that multiple comparisons are made. Sidak's t test also adjusts the significance level and provides tighter bounds than the Bonferroni test. Tukey's honestly significant difference test uses the Studentized range statistic to make all pairwise comparisons between groups and sets the experimentwise error rate to the error rate for the collection for all pairwise comparisons. When testing a large number of pairs of means, Tukey's honestly significant difference test is more powerful than the Bonferroni test. For a small number of pairs, Bonferroni is more powerful.
Hochberg's GT2 is similar to Tukey's honestly significant difference test, but the Studentized maximum modulus is used. Usually, Tukey's test is more powerful. Gabriel's pairwise comparisons test also uses the Studentized ma
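"Tighter bounds" can be made concrete. The following is a minimal sketch of one common way to adjust an observed significance level p for m comparisons:

- Bonferroni multiplies p by m, capped at 1.
- Sidak uses 1 − (1 − p)^m.

The function names are hypothetical, and the sketch does not claim to reproduce the procedure's internal computations.

```python
def bonferroni_adjust(p, m):
    """Bonferroni: observed significance times the number of comparisons, capped at 1."""
    return min(1.0, p * m)

def sidak_adjust(p, m):
    """Sidak: 1 - (1 - p)**m, exact when the m comparisons are independent."""
    return 1.0 - (1.0 - p) ** m

# Five comparisons, observed significance 0.01 for one of them.
print(bonferroni_adjust(0.01, 5), sidak_adjust(0.01, 5))
```

By Bernoulli's inequality, 1 − (1 − p)^m ≤ m·p, so the Sidak-adjusted value never exceeds the Bonferroni one. This is the sense in which Sidak's bounds are tighter.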
49. predicted values for each case in the metric of the linear predictor (transformed response via the specified link function). When the response distribution is multinomial, the procedure saves the predicted value for each category of the response, except the last, up to the number of specified categories to save. Estimated standard error of predicted value of linear predictor. When the response distribution is multinomial, the procedure saves the estimated standard error for each category of the response, except the last, up to the number of specified categories to save. The following items are not available when the response distribution is multinomial. Raw residual. The difference between an observed value and the value predicted by the model. Pearson residual. The square root of the contribution of a case to the Pearson chi-square statistic, with the sign of the raw residual. Generalized Estimating Equations Export [Figure 7-13: Generalized Estimating Equations Export tab]
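As a concrete illustration of the raw and Pearson residual definitions, here is a minimal sketch that assumes a Poisson response (variance function equal to the mean) with no scale weights; the function name is hypothetical. Squaring and summing the Pearson residuals recovers the Pearson chi-square statistic.

```python
from math import copysign, sqrt


def poisson_raw_and_pearson(y, mu):
    """Raw residuals, Pearson residuals, and the Pearson chi-square for a Poisson fit."""
    raw = [yi - mi for yi, mi in zip(y, mu)]
    # Each case's contribution to the Pearson chi-square; Poisson variance function V(mu) = mu
    contributions = [(yi - mi) ** 2 / mi for yi, mi in zip(y, mu)]
    # Square root of the contribution, carrying the sign of the raw residual
    pearson = [copysign(sqrt(c), r) for c, r in zip(contributions, raw)]
    return raw, pearson, sum(contributions)


if __name__ == "__main__":
    raw, pearson, chi_square = poisson_raw_and_pearson([2, 0, 5], [1.5, 0.8, 4.0])
    print(raw, pearson, chi_square)
```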
50. residual. The Pearson residual multiplied by the square root of the inverse of the product of the scale parameter and (1 − leverage) for the case. Deviance residual. The square root of the contribution of a case to the Deviance statistic, with the sign of the raw residual. Standardized deviance residual. The Deviance residual multiplied by the square root of the inverse of the product of the scale parameter and (1 − leverage) for the case. Likelihood residual. The square root of a weighted average (based on the leverage of the case) of the squares of the standardized Pearson and standardized Deviance residuals, with the sign of the raw residual. Generalized Linear Models Export [Figure 6-12: Generalized Linear Models Export tab] Export model as data. Writes a dataset in IBM SPSS Statistics format containing the parameter correlation or covariance matrix with parameter estimates, standard errors, significance values, and degrees of freedom. The order of variables in t
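The standardized and likelihood residuals can be sketched for a Poisson model as follows. This assumes the Poisson unit deviance, leverage values supplied by the caller, and, for the likelihood residual, a weighting of h on the standardized Pearson part and 1 − h on the standardized deviance part. That weighting is our assumption, since the text above says only that the average is based on leverage. Function names are hypothetical.

```python
from math import copysign, log, sqrt


def poisson_unit_deviance(y, mu):
    """Contribution of one case to the Poisson deviance statistic."""
    term = y * log(y / mu) if y > 0 else 0.0  # y*log(y/mu) is taken as 0 when y == 0
    return max(0.0, 2.0 * (term - (y - mu)))  # clip tiny negative rounding error


def poisson_glm_residuals(y, mu, h, phi=1.0):
    """Per-case residuals for a Poisson GLM.

    y: observed counts; mu: fitted means; h: leverage values; phi: scale parameter.
    """
    rows = []
    for yi, mi, hi in zip(y, mu, h):
        raw = yi - mi
        pearson = raw / sqrt(mi)
        deviance = copysign(sqrt(poisson_unit_deviance(yi, mi)), raw)
        factor = 1.0 / sqrt(phi * (1.0 - hi))  # sqrt of inverse of phi * (1 - leverage)
        std_pearson, std_deviance = pearson * factor, deviance * factor
        # Assumed weighting: h on the Pearson part, (1 - h) on the deviance part
        likelihood = copysign(
            sqrt(hi * std_pearson ** 2 + (1.0 - hi) * std_deviance ** 2), raw)
        rows.append({"pearson": pearson, "deviance": deviance,
                     "std_pearson": std_pearson, "std_deviance": std_deviance,
                     "likelihood": likelihood})
    return rows
```

Because the squared likelihood residual is a weighted average, it always lies between the squared standardized Pearson and squared standardized deviance residuals.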
51. saves predicted probabilities. When the response distribution is multinomial, the item label becomes Cumulative predicted probability, and the procedure saves the cumulative predicted probability for each category of the response, except the last, up to the number of specified categories to save. Lower bound of confidence interval for mean of response. Saves the lower bound of the confidence interval for the mean of the response. When the response distribution is multinomial, the item label becomes Lower bound of confidence interval for cumulative predicted probability, and the procedure saves the lower bound for each category of the response, except the last, up to the number of specified categories to save. Upper bound of confidence interval for mean of response. Saves the upper bound of the confidence interval for the mean of the response. When the response distribution is multinomial, the item label becomes Upper bound of confidence interval for cumulative predicted probability, and the procedure saves the upper bound for each category of the response, except the last, up to the number of specified categories to save. Predicted category. For models with binomial distribution and binary dependent variable, or multinomial distribution, this saves the predicted response category for each case. This option is not available for other response distributions. Predicted value of linear predictor. Saves model
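One common way to obtain bounds for the mean of the response is to form a Wald interval for the linear predictor and map its endpoints through the inverse link. The sketch below assumes that approach. We have not verified that it matches the procedure's exact computation, and the function names are hypothetical.

```python
from math import exp
from statistics import NormalDist


def mean_ci_from_linear_predictor(eta, se, inverse_link, level=0.95):
    """Wald interval on the linear-predictor scale, mapped to the mean via the inverse link."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    a, b = inverse_link(eta - z * se), inverse_link(eta + z * se)
    return min(a, b), max(a, b)  # min/max also handles decreasing inverse links


def inverse_logit(eta):
    return 1.0 / (1.0 + exp(-eta))


if __name__ == "__main__":
    print(mean_ci_from_linear_predictor(0.4, 0.25, inverse_logit))  # binomial, logit link
    print(mean_ci_from_linear_predictor(1.1, 0.10, exp))            # Poisson, log link
```

Transforming the endpoints keeps the bounds inside the valid range of the mean (for example, between 0 and 1 for a probability), which an interval built directly on the mean scale does not guarantee.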
52. the first n categories, up to the value specified as the Maximum categories to save. The default root name is CI, and the default field names are CI_Lower_1, CI_Upper_1, CI_Lower_2, CI_Upper_2, and so on, corresponding to the order of the target categories. For the multinomial distribution and an ordinal target, one field is created for each dependent variable category except the last. For more information, see the topic Build Options on p. 107. This saves the lower and upper bounds of the cumulative predicted probability for the first n categories, up to but not including the last, and up to the value specified as the Maximum categories to save. The default root name is CI, and the default field names are CI_Lower_1, CI_Upper_1, CI_Lower_2, CI_Upper_2, and so on, corresponding to the order of the target categories. Pearson residuals. Saves the Pearson residual for each record, which can be used in post-estimation diagnostics of the model fit. The default field name is PearsonResidual. Confidences. Saves the confidence in the predicted value for the categorical target. The computed confidence can be based on the probability of the predicted value (the highest predicted probability) or the difference between the highest predicted probability and the second highest predicted probability. The default field name is Confidence. Export model. This writes the model to an external .zip file. You can use this model file to apply the model informatio
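The two confidence measures described for Confidences can be sketched as follows, assuming that the predicted category is the one with the highest predicted probability. The function name and the `method` values are hypothetical.

```python
def predicted_category_and_confidence(probs, method="highest"):
    """Return (index of predicted category, confidence) from predicted probabilities."""
    if len(probs) < 2:
        raise ValueError("need at least two categories")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best, second = probs[order[0]], probs[order[1]]
    if method == "highest":
        confidence = best  # probability of the predicted value
    elif method == "difference":
        confidence = best - second  # margin over the runner-up category
    else:
        raise ValueError(f"unknown method: {method}")
    return order[0], confidence
```

The difference-based measure is small when two categories are nearly tied, even if the top probability is moderately large.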
53. to: Specify initial values for parameter estimates as a list of numbers using the CRITERIA subcommand. Fix covariates at values other than their means when computing estimated marginal means using the EMMEANS subcommand. Specify custom polynomial contrasts for estimated marginal means using the EMMEANS subcommand. Specify a subset of the factors for which estimated marginal means are displayed to be compared using the specified contrast type using the TABLES and COMPARE keywords of the EMMEANS subcommand. See the Command Syntax Reference for complete syntax information. Chapter 7 Generalized Estimating Equations. The Generalized Estimating Equations procedure extends the generalized linear model to allow for analysis of repeated measurements or other correlated observations, such as clustered data. Example. Public health officials can use generalized estimating equations to fit a repeated measures logistic regression to study effects of air pollution on children. Data. The response can be scale, counts, binary, or events-in-trials. Factors are assumed to be categorical. The covariates, scale weight, and offset are assumed to be scale. Variables used to define subjects or within-subject repeated measurements cannot be used to define the response but can serve other roles in the model. Assumptions. Cases are assumed to be dependent within subjects and independent between subjects. The
54. to the cumulative probability of each category of the response, where $\Phi^{-1}$ is the inverse standard normal cumulative distribution function. This is appropriate only with the multinomial distribution. Log. $f(x)=\log(x)$. This link can be used with any distribution. Log complement. $f(x)=\log(1-x)$. This is appropriate only with the binomial distribution. Logit. $f(x)=\log\left(\frac{x}{1-x}\right)$. This is appropriate only with the binomial distribution. Negative binomial. $f(x)=\log\left(\frac{x}{x+k^{-1}}\right)$, where $k$ is the ancillary parameter of the negative binomial distribution. This is appropriate only with the negative binomial distribution. Negative log-log. $f(x)=-\log(-\log(x))$. This is appropriate only with the binomial distribution. Odds power. $f(x)=\frac{(x/(1-x))^{\alpha}-1}{\alpha}$ if $\alpha \neq 0$; $f(x)=\log\left(\frac{x}{1-x}\right)$ if $\alpha = 0$. $\alpha$ is the required number specification and must be a real number. This is appropriate only with the binomial distribution. Probit. $f(x)=\Phi^{-1}(x)$, where $\Phi^{-1}$ is the inverse standard normal cumulative distribution function. This is appropriate only with the binomial distribution. Power. $f(x)=x^{\alpha}$ if $\alpha \neq 0$; $f(x)=\log(x)$ if $\alpha = 0$. $\alpha$ is the required number specification and must be a real number. This link can be used with any distribution. Generalized Estimating Equations Response [Figure 7-3: Generalized Estimating Equations Response tab]
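The link functions above can be checked numerically by pairing each with its inverse. This is a sketch, not the procedure's code. The names are hypothetical, the cumulative and power links are omitted for brevity, and the odds power link at α = 0 uses the logit, which is the limit of the α ≠ 0 formula as α → 0.

```python
from math import exp, log
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def odds_power(x, alpha):
    """Odds power link; alpha == 0 uses the logit, the limit as alpha -> 0."""
    odds = x / (1.0 - x)
    return (odds ** alpha - 1.0) / alpha if alpha != 0 else log(odds)


def negative_binomial_link(x, k):
    """log(x / (x + 1/k)) with ancillary parameter k > 0."""
    return log(x / (x + 1.0 / k))


def negative_binomial_inverse(eta, k):
    """Inverse of negative_binomial_link; valid for eta < 0."""
    return (1.0 / k) * exp(eta) / (1.0 - exp(eta))


# (link, inverse link) pairs for the fixed-form links
LINKS = {
    "log": (log, exp),
    "log complement": (lambda x: log(1.0 - x), lambda eta: 1.0 - exp(eta)),
    "logit": (lambda x: log(x / (1.0 - x)), lambda eta: 1.0 / (1.0 + exp(-eta))),
    "negative log-log": (lambda x: -log(-log(x)), lambda eta: exp(-exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
}
```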
55. ... 72
Generalized Estimating Equations Response 75
Generalized Estimating Equations Reference Category 76
Generalized Estimating Equations Predictors 77
Generalized Estimating Equations Options 78
Generalized Estimating Equations Model 79
Generalized Estimating Equations Estimation 81
Generalized Estimating Equations Initial Values 82
Generalized Estimating Equations Statistics 84
Generalized Estimating Equations EM Means 86
Generalized Estimating Equations Save 88
Generalized Estimating Equations Export 90
GENLIN Command Additional Features 91
8 Generalized linear mixed models 93
Obtaining a generalized linear mixed model 95
Target 97
Fixed Effects 100
Add a Custom Term 101
Random Effects 103
Random Effect Block 104
Weight and Offset 106
Build Options 107
56. 1 to 10 dependent and factor variables combined. A cell structure variable allows you to define structural zeros for incomplete tables, include an offset term in the model, fit a log-rate model, or implement the method of adjustment of marginal tables. Contrast variables allow computation of generalized log-odds ratios (GLOR). The values of the contrast variable are the coefficients for the linear combination of the logs of the expected cell counts. Model information and goodness-of-fit statistics are automatically displayed. You can also display a variety of statistics and plots or save residuals and predicted values in the active dataset. Example. A study in Florida included 219 alligators. How does the alligators' food type vary with their size and the four lakes in which they live? The study found that the odds of a smaller alligator preferring reptiles to fish are 0.70 times the odds for larger alligators; also, the odds of selecting primarily reptiles instead of fish were highest in lake 3. Statistics. Observed and expected frequencies; raw, adjusted, and deviance residuals; design matrix; parameter estimates; generalized log-odds ratio; Wald statistic; and confidence intervals. Plots: adjusted residuals, deviance residuals, and normal probability plots. Data. The dependent variables are categorical. Factors are categorical. Cell covariates can be continuous, but when a covariate is in the model, the mean covariate value for cases in a cell is
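As an illustration of a generalized log-odds ratio, the sketch below forms the linear combination of log expected cell counts using contrast coefficients. With coefficients (1, −1, −1, 1) on a 2×2 table, it reduces to the ordinary log odds ratio. The function name and the cell ordering are hypothetical.

```python
from math import log


def generalized_log_odds_ratio(expected_counts, contrast):
    """Sum of contrast coefficients times the logs of the expected cell counts."""
    if len(expected_counts) != len(contrast):
        raise ValueError("one contrast coefficient is needed per cell")
    # Cells with a zero coefficient are skipped, so they never need a log
    return sum(c * log(m) for m, c in zip(expected_counts, contrast) if c != 0)


if __name__ == "__main__":
    # 2x2 table with cells ordered (1,1), (1,2), (2,1), (2,2)
    cells = [10.0, 20.0, 5.0, 40.0]
    print(generalized_log_odds_ratio(cells, [1, -1, -1, 1]))  # log((10*40)/(20*5)), i.e. log 4
```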
57. 2, as above. Splits must occur in the specified dataset in the same order as in the original dataset. Note: The variable names P1, P2, and so on are not required; the procedure will accept any valid variable names for the parameters because the mapping of variables to parameters is based on variable position, not variable name. Any variables beyond the last parameter are ignored. The file structure for the initial values is the same as that used when exporting the model as data; thus, you can use the final values from one run of the procedure as input in a subsequent run. Generalized Linear Models Statistics [Figure 6-9: Generalized Linear Models Statistics tab]
58. observations such that there would be only a small number of observations in each survival time interval. If you have variables that you suspect are related to survival time or variables that you want to control for (covariates), use the Cox Regression procedure. If your covariates can have different values at different points in time for the same case, use Cox Regression with Time-Dependent Covariates. Creating Life Tables. From the menus choose: Analyze > Survival > Life Tables... [Figure 12-1: Life Tables dialog box] Select one numeric survival variable. Specify the time intervals to be examined. Select a status variable to define cases for which the terminal event has occurred. Click Define Event to specify the value of the status variable that indicates that an event occurred. Optionally, you can select a first-order factor
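The actuarial calculation behind a life table can be sketched as follows. This assumes the standard actuarial convention that cases withdrawing during an interval are at risk for half of it, and presents that textbook method rather than a transcription of the procedure's algorithm. The function name and the example counts are hypothetical.

```python
def actuarial_life_table(n_entering, events, withdrawals):
    """Actuarial life table rows: (exposed, proportion terminating, cumulative surviving).

    events[j] and withdrawals[j] are counts in the j-th time interval.
    """
    rows, at_risk, surviving = [], n_entering, 1.0
    for d, w in zip(events, withdrawals):
        exposed = at_risk - w / 2.0  # withdrawals count as at risk for half the interval
        q = d / exposed if exposed > 0 else 0.0
        surviving *= 1.0 - q  # cumulative proportion surviving at end of interval
        rows.append((exposed, q, surviving))
        at_risk -= d + w
    return rows


if __name__ == "__main__":
    for row in actuarial_life_table(100, events=[10, 5, 3], withdrawals=[4, 6, 2]):
        print(row)
```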
59. 2-way. Creates all possible two-way interactions of the selected variables. All 3-way. Creates all possible three-way interactions of the selected variables. All 4-way. Creates all possible four-way interactions of the selected variables. All 5-way. Creates all possible five-way interactions of the selected variables. Sum of Squares. For the model, you can choose a type of sums of squares. Type III is the most commonly used and is the default. Type I. This method is also known as the hierarchical decomposition of the sum-of-squares method. Each term is adjusted only for the terms that precede it in the model. Type I sums of squares are commonly used for: A balanced ANOVA model in which any main effects are specified before any first-order interaction effects, any first-order interaction effects are specified before any second-order interaction effects, and so on. A polynomial regression model in which any lower-order terms are specified before any higher-order terms. A purely nested model in which the first-specified effect is nested within the second-specified effect, the second-specified effect is nested within the third, and so on. (This form of nesting can be specified only by using syntax.) Type II. This method calculates the sums of squares of an effect in the model adjusted for all other "appropriate" effects. An appropriate effect is one that corresponds to all effects that do not cont
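The All 2-way through All 5-way options enumerate combinations of the selected variables. A minimal sketch, writing interaction terms in A*B notation; the function name is hypothetical.

```python
from itertools import combinations


def all_k_way(variables, k):
    """All k-way interaction terms among the selected variables, in A*B notation."""
    return ["*".join(combo) for combo in combinations(variables, k)]


if __name__ == "__main__":
    print(all_k_way(["a", "b", "c"], 2))  # ['a*b', 'a*c', 'b*c']
```

With v selected variables, All k-way produces "v choose k" terms, so five variables yield ten three-way interactions.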
60. This view displays the size of each fixed effect in the model. Styles. There are different display styles, which are accessible from the Style dropdown list. Diagram. This is a chart in which effects are sorted from top to bottom in the order in which they were specified on the Fixed Effects settings. Connecting lines in the diagram are weighted based on effect significance, with greater line width corresponding to more significant effects (smaller p-values). This is the default. Table. This is an ANOVA table for the overall model and the individual model effects. The individual effects are sorted from top to bottom in the order in which they were specified on the Fixed Effects settings. Significance. There is a Significance slider that controls which effects are shown in the view. Effects with significance values greater than the slider value are hidden. This does not change the model, but simply allows you to focus on the most important effects. By default the value is 1.00, so that no effects are filtered based on significance. Fixed Coefficients [Figure 8-18: Fixed Coefficients view, diagram style]
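The Significance slider acts as a display filter that leaves the model itself unchanged. The sketch below assumes that effects are given as (name, significance) pairs; the effect names and values are made up for illustration, and the function name is hypothetical.

```python
def visible_effects(effects, slider=1.00):
    """Names of effects whose significance value does not exceed the slider value."""
    return [name for name, sig in effects if sig <= slider]


if __name__ == "__main__":
    example = [("effect_a", 0.001), ("effect_b", 0.40), ("effect_c", 0.03)]  # made-up values
    print(visible_effects(example))        # default 1.00: nothing is hidden
    print(visible_effects(example, 0.05))  # only effects with sig <= 0.05
```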
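61. Scaled Identity. This structure has constant variance. There is assumed to be no correlation between any elements.
$$\sigma^2\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}$$
Toeplitz. This covariance structure has homogeneous variances and heterogeneous correlations between elements. The correlation between adjacent elements is homogeneous across pairs of adjacent elements. The correlation between elements separated by a third is again homogeneous, and so on.
$$\sigma^2\begin{pmatrix}1&\rho_1&\rho_2&\rho_3\\\rho_1&1&\rho_1&\rho_2\\\rho_2&\rho_1&1&\rho_1\\\rho_3&\rho_2&\rho_1&1\end{pmatrix}$$
Toeplitz Heterogeneous. This covariance structure has heterogeneous variances and heterogeneous correlations between elements. The correlation between adjacent elements is homogeneous across pairs of adjacent elements. The correlation between elements separated by a third is again homogeneous, and so on.
$$\begin{pmatrix}\sigma_1^2&\sigma_2\sigma_1\rho_1&\sigma_3\sigma_1\rho_2&\sigma_4\sigma_1\rho_3\\\sigma_2\sigma_1\rho_1&\sigma_2^2&\sigma_3\sigma_2\rho_1&\sigma_4\sigma_2\rho_2\\\sigma_3\sigma_1\rho_2&\sigma_3\sigma_2\rho_1&\sigma_3^2&\sigma_4\sigma_3\rho_1\\\sigma_4\sigma_1\rho_3&\sigma_4\sigma_2\rho_2&\sigma_4\sigma_3\rho_1&\sigma_4^2\end{pmatrix}$$
Unstructured. This is a completely general covariance matrix.
$$\begin{pmatrix}\sigma_1^2&\sigma_{21}&\sigma_{31}&\sigma_{41}\\\sigma_{21}&\sigma_2^2&\sigma_{32}&\sigma_{42}\\\sigma_{31}&\sigma_{32}&\sigma_3^2&\sigma_{43}\\\sigma_{41}&\sigma_{42}&\sigma_{43}&\sigma_4^2\end{pmatrix}$$
Unstructured Correlation Metric. This covariance structure has heterogeneous variances and heterogeneous correlations.
$$\begin{pmatrix}\sigma_1^2&\sigma_2\sigma_1\rho_{21}&\sigma_3\sigma_1\rho_{31}&\sigma_4\sigma_1\rho_{41}\\\sigma_2\sigma_1\rho_{21}&\sigma_2^2&\sigma_3\sigma_2\rho_{32}&\sigma_4\sigma_2\rho_{42}\\\sigma_3\sigma_1\rho_{31}&\sigma_3\sigma_2\rho_{32}&\sigma_3^2&\sigma_4\sigma_3\rho_{43}\\\sigma_4\sigma_1\rho_{41}&\sigma_4\sigma_2\rho_{42}&\sigma_4\sigma_3\rho_{43}&\sigma_4^2\end{pmatrix}$$
Variance Components. This structure assigns a scaled identity (ID) structure to each of the specified random effects. A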
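The Toeplitz structures above follow a single rule: element (i, j) is σ_i σ_j ρ_|i−j|, with ρ_0 = 1. The sketch below builds such a matrix. Setting all σ_i equal gives the homogeneous Toeplitz form, and also setting every ρ to 0 gives the scaled identity. Function names are hypothetical.

```python
def toeplitz_heterogeneous(sigmas, rhos):
    """Covariance matrix with (i, j) element sigma_i * sigma_j * rho_|i-j| and rho_0 = 1."""
    k = len(sigmas)
    if len(rhos) != k - 1:
        raise ValueError("need one correlation per nonzero lag (k - 1 values)")
    r = [1.0] + list(rhos)  # correlation depends only on the lag |i - j|
    return [[sigmas[i] * sigmas[j] * r[abs(i - j)] for j in range(k)] for i in range(k)]


def toeplitz(sigma, rhos):
    """Homogeneous-variance Toeplitz: every sigma_i equals sigma."""
    return toeplitz_heterogeneous([sigma] * (len(rhos) + 1), rhos)


if __name__ == "__main__":
    for row in toeplitz_heterogeneous([1.0, 2.0, 3.0, 4.0], [0.5, 0.25, 0.125]):
        print(row)
```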
62. [Figure: Cox Regression Define Categorical Covariates dialog box] You can specify details of how the Cox Regression procedure will handle categorical variables. Covariates. Lists all of the covariates specified in the main dialog box, either by themselves or as part of an interaction, in any layer. If some of these are string variables or are categorical, you can use them only as categorical covariates. Categorical Covariates. Lists variables identified as categorical. Each variable includes a notation in parentheses indicating the contrast coding to be used. String variables (denoted by the symbol < following their names) are already present in the Categorical Covariates list. Select any other categorical covariates from the Covariates list and move them into the Categorical Covariates list. Change Contrast. Allows you to change the contrast method. Available contrast methods are: Indicator. Contrasts indicate the presence or absence of category membership. The reference category is represented in the contrast matrix as a row of zeros. Simple. Each category of the predictor variable (except the reference category) is compared to the reference category. Difference. Each category of the predictor variable except the first category is compared to the average effect of previou
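Indicator coding can be sketched as a mapping from each category to its row in the contrast matrix, with the reference category (Last or First) mapped to a row of zeros. The function name and category labels are hypothetical.

```python
def indicator_coding(categories, reference="last"):
    """Indicator contrast rows for each category; the reference category is all zeros."""
    if reference not in ("last", "first"):
        raise ValueError("reference must be 'last' or 'first'")
    ref = categories[-1] if reference == "last" else categories[0]
    others = [c for c in categories if c != ref]  # one column per non-reference category
    return {c: [1 if c == o else 0 for o in others] for c in categories}


if __name__ == "__main__":
    print(indicator_coding(["low", "mid", "high"]))  # "high" is the reference row
```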
63. M Multivariate and GLM Repeated Measures. Copyright IBM Corporation 1989, 2012. Variance Components Analysis. Obtaining Variance Components Tables. From the menus choose: Analyze > General Linear Model > Variance Components... [Figure 4-1: Variance Components dialog box] Select a dependent variable. Select variables for Fixed Factor(s), Random Factor(s), and Covariate(s), as appropriate for your data. For specifying a weight variable, use WLS Weight. Variance Components Model [Figure 4-2: Variance Components Model dialog box] Specify Model. A full factorial model contains all factor main effects, all covariate main effe
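To show what a variance component estimate is, the sketch below uses the ANOVA (expected mean squares) approach for a balanced one-way random-effects model. The procedure also offers other estimation methods, such as maximum likelihood and restricted maximum likelihood, which this sketch does not reproduce. The function name is hypothetical.

```python
def one_way_anova_components(groups):
    """ANOVA-method estimates for y_ij = mu + a_i + e_ij with balanced groups."""
    k, n = len(groups), len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 balanced groups of size >= 2")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k  # equals the overall mean for balanced data
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    # Equate mean squares to their expectations:
    # E[MS_within] = sigma_e^2,  E[MS_between] = sigma_e^2 + n * sigma_a^2
    return {"random_factor": (ms_between - ms_within) / n, "error": ms_within}
```

A drawback of this method is visible in the last line: when MS_between is smaller than MS_within, the estimate for the random factor is negative.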
64. Models Save [Figure 6-11: Generalized Linear Models Save tab] Checked items are saved with the specified name; you can choose to overwrite existing variables with the same name as the new variables or avoid name conflicts by appending suffixes to make the new variable names unique. Predicted value of mean of response. Saves model-predicted values for each case in t
65. [Figure: Life Tables Options dialog box] You can control various aspects of your Life Tables analysis. Life table(s). To suppress the display of life tables in the output, deselect Life table(s). Plot. Allows you to request plots of the survival functions. If you have defined factor variable(s), plots are generated for each subgroup defined by the factor variable(s). Available plots are survival, log survival, hazard, density, and one minus survival. Survival. Displays the cumulative survival function on a linear scale. Log survival. Displays the cumulative survival function on a logarithmic scale. Hazard. Displays the cumulative hazard function on a linear scale. Density. Displays the density function. One minus survival. Plots one minus the survival function on a linear scale. Compare Levels of First Factor. If you have a first-order control variable, you can select one of the alternatives in this group to perform the Wilcoxon (Gehan) test, which compares the survival of subgroups. Tests are performed on the first-order factor. If you have defined a second-order factor, tests are performed for each level of the second-order variable. SURVIVAL Command Additional Features. The command syntax language also allows you to: Specify more than one dependent variable. Specify unequally spaced intervals. Specify more than one status variable. Specify comparisons that do not include all the factor and all t
66. OUTFILE subcommand. See the Command Syntax Reference for complete syntax information. Chapter 3 GLM Repeated Measures. The GLM Repeated Measures procedure provides analysis of variance when the same measurement is made several times on each subject or case. If between-subjects factors are specified, they divide the population into groups. Using this general linear model procedure, you can test null hypotheses about the effects of both the between-subjects factors and the within-subjects factors. You can investigate interactions between factors as well as the effects of individual factors. In addition, the effects of constant covariates and covariate interactions with the between-subjects factors can be included. In a doubly multivariate repeated measures design, the dependent variables represent measurements of more than one variable for the different levels of the within-subjects factors. For example, you could have measured both pulse and respiration at three different times on each subject. The GLM Repeated Measures procedure provides both univariate and multivariate analyses for the repeated measures data. Both balanced and unbalanced models can be tested. A design is balanced if each cell in the model contains the same number of cases. In a multivariate model, the sums of squares due to the effects in the model and error sums of squares are in matrix form rather than the scalar form found in univariate analysis. These matrices are calle
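In the multivariate case, each sum of squares becomes a matrix of sums of squares and cross-products. The sketch below computes the within-group (error) matrix for two dependent variables, such as pulse and respiration. The function name and the data are hypothetical. The diagonal of the result holds the univariate error sums of squares for each variable.

```python
def within_sscp(groups):
    """Within-group (error) sums-of-squares and cross-products matrix.

    groups: list of groups; each group is a list of equal-length observation vectors.
    """
    p = len(groups[0][0])  # number of dependent variables
    sscp = [[0.0] * p for _ in range(p)]
    for group in groups:
        means = [sum(obs[j] for obs in group) / len(group) for j in range(p)]
        for obs in group:
            dev = [obs[j] - means[j] for j in range(p)]  # deviations from the group mean
            for a in range(p):
                for b in range(p):
                    sscp[a][b] += dev[a] * dev[b]
    return sscp


if __name__ == "__main__":
    # (pulse, respiration) observations for two hypothetical groups
    print(within_sscp([[(70, 16), (74, 18)], [(80, 20), (84, 19)]]))
```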
67. Order, Heterogeneous; Huynh-Feldt; Scaled Identity; Toeplitz; Toeplitz Heterogeneous; Unstructured; Unstructured Correlation Metric; Variance Components. For more information, see the topic Covariance Structures in Appendix B on p. 162. Random Effects. There is no default model, so you must explicitly specify the random effects. Alternatively, you can build nested or non-nested terms. You can also choose to include an intercept term in the random-effects model. You can specify multiple random-effects models. After building the first model, click Next to build the next model. Click Previous to scroll back through existing models. Each random-effect model is assumed to be independent of every other random-effect model; that is, separate covariance matrices will be computed for each. Terms specified in the same random-effect model can be correlated. Subject Groupings. The variables listed are those that you selected as subject variables in the Select Subjects/Repeated Variables dialog box. Choose some or all of these in order to define the subjects for the random-effects model. Linear Mixed Models Estimation [Figure 5-5: Linear Mixed Models Estimation dialog box]
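Because separate random-effect models are independent, their covariance matrices combine into one block-diagonal matrix with zeros between blocks, while terms within the same model can covary inside their block. A minimal sketch with a hypothetical function name and made-up values:

```python
def block_diagonal(blocks):
    """Place square covariance blocks on the diagonal; cross-block entries are 0."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            if len(row) != len(block):
                raise ValueError("each block must be square")
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(block)
    return out


if __name__ == "__main__":
    # Two correlated terms in one random-effect model, one term in a second model
    for row in block_diagonal([[[2.0, 0.5], [0.5, 1.0]], [[3.0]]]):
        print(row)
```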
68. Regression, 149
Student-Newman-Keuls: in GLM Multivariate, 8; in GLM Repeated Measures, 22
subjects variables: in Linear Mixed Models, 36
sum of squares, 5, 19
sums of squares: hypothesis and error matrices, 11; in Linear Mixed Models, 38; in Variance Components, 32
survival analysis: in Cox Regression, 148; in Kaplan-Meier, 143; in Life Tables, 139; Time-Dependent Cox Regression, 155
survival function: in Life Tables, 139
t test: in GLM Multivariate, 11; in GLM Repeated Measures, 25
Tamhane's T2: in GLM Multivariate, 8; in GLM Repeated Measures, 22
Tarone-Ware test: in Kaplan-Meier, 145
trademarks, 167
Tukey's b test: in GLM Multivariate, 8; in GLM Repeated Measures, 22
Tukey's honestly significant difference: in GLM Multivariate, 8; in GLM Repeated Measures, 22
Tweedie distribution: in generalized estimating equations, 73; in generalized linear models, 48
unstandardized residuals: in GLM, 10; in GLM Repeated Measures, 24
Variance Components, 28: command additional features, 33; model, 30; options, 31; saving results, 33
Wald statistic: in General Loglinear Analysis, 128; in Logit Loglinear Analysis, 133
Waller-Duncan test: in GLM Multivariate, 8; in GLM Repeated Measures, 22
weighted predicted values: in GLM, 10; in GLM Repeated Measures, 24
Wilcoxon test: in Life Tables, 141
69. a consistent number of subpopulations. Offset. The offset term is a "structural" predictor. Its coefficient is not estimated by the model but is assumed to have the value 1; thus, the values of the offset are simply added to the linear predictor of the target. This is especially useful in Poisson regression models, where each case may have different levels of exposure to the event of interest. For example, when modeling accident rates for individual drivers, there is an important difference between a driver who has been at fault in one accident in three years of experience and a driver who has been at fault in one accident in 25 years. The number of accidents can be modeled as a Poisson or negative binomial response with a log link if the natural log of the experience of the driver is included as an offset term. Other combinations of distribution and link types would require other transformations of the offset variable. Generalized Estimating Equations Options [Figure 7-6: Generalized Estimating Equations Options dialog box]
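The effect of an offset with a log link can be seen directly: adding log(exposure) to the linear predictor multiplies the mean by the exposure, so drivers with the same underlying rate get expected counts proportional to their years of experience. A minimal sketch, with a hypothetical function name and a made-up rate:

```python
from math import exp, log


def expected_count(linear_part, exposure):
    """Log-link mean with log(exposure) as an offset whose coefficient is fixed at 1."""
    eta = linear_part + log(exposure)  # the offset is simply added to the linear predictor
    return exp(eta)  # equals exposure * exp(linear_part)


if __name__ == "__main__":
    rate = log(0.04)  # hypothetical linear predictor: 0.04 accidents per year
    print(expected_count(rate, 3), expected_count(rate, 25))
```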
70. a priori theoretical considerations or which combination seems to fit best. Binomial. This distribution is appropriate only for variables that represent a binary response or number of events. Gamma. This distribution is appropriate for variables with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis. Inverse Gaussian. This distribution is appropriate for variables with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis. Negative binomial. This distribution can be thought of as the number of trials required to observe k successes and is appropriate for variables with non-negative integer values. If a data value is non-integer, less than 0, or missing, then the corresponding case is not used in the analysis. The value of the negative binomial distribution's ancillary parameter can be any number greater than or equal to 0; you can set it to a fixed value or allow it to be estimated by the procedure. When the ancillary parameter is set to 0, using this distribution is equivalent to using the Poisson distribution. Normal. This is appropriate for scale variables whose values take a symmetric, bell-shaped distribution about a central (mean) value.
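The case exclusion rules above, and the role of the negative binomial ancillary parameter, can be sketched as follows. The variance form μ + kμ² is our assumed parameterization; it reduces to the Poisson variance μ when k = 0, consistent with the equivalence stated above. Function names are hypothetical.

```python
import math


def case_is_used(y, distribution):
    """Whether a response value is kept, following the exclusion rules described above."""
    if y is None or (isinstance(y, float) and math.isnan(y)):
        return False  # missing values are never used
    if distribution in ("gamma", "inverse gaussian"):
        return y > 0
    if distribution == "negative binomial":
        return y >= 0 and float(y).is_integer()
    return True  # rules for other distributions are not modeled in this sketch


def negative_binomial_variance(mu, k):
    """Assumed parameterization: Var(Y) = mu + k * mu**2, with ancillary parameter k >= 0."""
    return mu + k * mu * mu
```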
71. In many cases, you can simply specify a dependent variable; however, variables that take only two values and responses that record events in trials require extra attention. Binary response. When the dependent variable takes only two values, you can specify the reference category for parameter estimation. A binary response variable can be string or numeric. Number of events occurring in a set of trials. When the response is a number of events occurring in a set of trials, the dependent variable contains the number of events and you can select an additional variable containing the number of trials. Alternatively, if the number of trials is the same across all subjects, then trials may be specified using a fixed value. The number of trials should be greater than or equal to the number of events for each case. Events should be non-negative integers, and trials should be positive integers. For ordinal multinomial models, you can specify the category order of the response: ascending, descending, or data (data order means that the first value encountered in the data defines the first category, and the last value encountered defines the last category). Scale Weight. The scale parameter is an estimated model paramet
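The three category orders can be sketched as follows. Here data order is taken to mean the order in which distinct values first appear; the function name is hypothetical.

```python
def category_order(values, order="ascending"):
    """Response categories in ascending, descending, or data (first-encountered) order."""
    distinct = list(dict.fromkeys(values))  # unique values in first-appearance order
    if order == "data":
        return distinct
    if order in ("ascending", "descending"):
        return sorted(distinct, reverse=(order == "descending"))
    raise ValueError(f"unknown order: {order}")


if __name__ == "__main__":
    print(category_order([3, 1, 2, 1, 3], "data"))  # [3, 1, 2]
```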
72. actor in the model. Contrasts represent linear combinations of the parameters. Hypothesis testing is based on the null hypothesis LBM = 0, where L is the contrast coefficients matrix, B is the parameter vector, and M is the average matrix that corresponds to the average transformation for the dependent variable. You can display this transformation matrix by selecting Transformation matrix in the Repeated Measures Options dialog box. For example, if there are four dependent variables, a within-subjects factor of four levels, and polynomial contrasts (the default) are used for within-subjects factors, the M matrix will be (0.5 0.5 0.5 0.5). When a contrast is specified, an L matrix is created such that the columns corresponding to the between-subjects factor match the contrast. The remaining columns are adjusted so that the L matrix is estimable. Available contrasts are deviation, simple, difference, Helmert, repeated, and polynomial. For deviation contrasts and simple contrasts, you can choose whether the reference category is the last or first category. A contrast other than None must be selected for within-subjects factors. Contrast Types. Deviation. Compares the mean of each level (except a reference category) to the mean of all of the levels (grand mean). The levels of the factor can be in any order. Simple. Compares the mean of each level to the mean of a specified level. This type of contrast is useful when there
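The 0.5 entries in the average matrix M come from the normalized constant term of an orthonormal polynomial contrast: with four levels, each entry is 1/√4. The sketch below builds orthonormal polynomial contrasts for k equally spaced levels by Gram-Schmidt orthogonalization of 1, x, x², and so on. The function name is hypothetical, and equal spacing of the levels is an assumption.

```python
from math import sqrt


def orthonormal_polynomial_contrasts(k):
    """Rows: degree 0..k-1 orthonormal polynomials on equally spaced levels 1..k."""
    levels = list(range(1, k + 1))
    basis = [[float(x ** d) for x in levels] for d in range(k)]  # 1, x, x^2, ...
    rows = []
    for v in basis:
        w = v[:]
        for u in rows:  # remove components along the rows already built
            dot = sum(a * b for a, b in zip(w, u))
            w = [a - dot * b for a, b in zip(w, u)]
        norm = sqrt(sum(a * a for a in w))
        rows.append([a / norm for a in w])
    return rows


if __name__ == "__main__":
    for row in orthonormal_polynomial_contrasts(4):
        print([round(a, 4) for a in row])  # first row: the 0.5 averaging weights
```

The remaining rows are the linear, quadratic, and cubic contrasts; for four levels the linear row is proportional to (−3, −1, 1, 3).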
73. ain the effect being examined. The Type II sum of squares method is commonly used for: a balanced ANOVA model; any model that has main factor effects only; any regression model; a purely nested design (this form of nesting can be specified by using syntax). Type III. This is the default. This method calculates the sums of squares of an effect in the design as the sums of squares adjusted for any other effects that do not contain it and orthogonal to any effects (if any) that contain it. The Type III sums of squares have one major advantage in that they are invariant with respect to the cell frequencies as long as the general form of estimability remains constant. Hence, this type of sums of squares is often considered useful for an unbalanced model with no missing cells. In a factorial design with no missing cells, this method is equivalent to the Yates' weighted-squares-of-means technique. The Type III sum of squares method is commonly used for: any models listed in Type I and Type II; any balanced or unbalanced model with no empty cells. Type IV. This method is designed for a situation in which there are missing cells. For any effect F in the design, if F is not contained in any other effect, then Type IV = Type III = Type II. When F is contained in other effects, Type IV distributes the contrasts being made among the parameters in F to all higher-level effects equitably. The Type IV sum of squares method is commonly
74. The Logit Loglinear Analysis procedure displays model information and goodness-of-fit statistics. In addition, you can choose one or more of the following options. Display: several statistics are available for display: observed and expected cell frequencies; raw, adjusted, and deviance residuals; a design matrix of the model; and parameter estimates for the model. Plot: plots available for custom models include two scatterplot matrices (adjusted residuals or deviance residuals against observed and expected cell counts). You can also display normal probability and detrended normal plots of adjusted residuals or deviance residuals. Confidence Interval: the confidence interval for parameter estimates can be adjusted. Criteria: the Newton-Raphson method is used to obtain maximum likelihood parameter estimates. You can enter new values for the maximum number of iterations, the convergence criterion, and delta (a constant added to all cells for initial approximations). Delta remains in the cells for saturated models. Logit Loglinear Analysis Save. Figure 11-4 Logit Loglinear Analysis Save dialog box.
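To show what a Newton-Raphson iteration for a loglinear model looks like, here is a deliberately small Python sketch: a Poisson loglinear model with an intercept and one covariate, fitted to hypothetical counts. It is not the Logit Loglinear procedure's algorithm. In particular, using delta only to form the starting value is an assumption for illustration, and the `max_iter` and `tol` defaults are arbitrary. Each step solves $I(b)\,s = U(b)$, where $U = X^\top(y - \mu)$ is the score and $I = X^\top \operatorname{diag}(\mu) X$ is the information matrix.

```python
import math


def poisson_loglinear_nr(x, y, delta=0.5, max_iter=20, tol=1e-8):
    """Fit log(mu_i) = b0 + b1 * x_i to counts y by Newton-Raphson."""
    n = len(y)
    # Starting values (assumed): log of the mean count plus delta, zero slope.
    b0, b1 = math.log(sum(y) / n + delta), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X'(y - mu).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix I = X' diag(mu) X (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve I * step = U using the explicit 2 x 2 inverse.
        s0 = (i11 * u0 - i01 * u1) / det
        s1 = (-i01 * u0 + i00 * u1) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1
```

With a 0/1 covariate the maximum likelihood fit reproduces the two group means, so `poisson_loglinear_nr([0, 0, 1, 1], [2, 4, 6, 10])` should return values close to $(\ln 3, \ln(8/3))$.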
75. alysis. The fixed value of the Tweedie distribution's parameter can be any number greater than one and less than two. Multinomial: this distribution is appropriate for variables that represent an ordinal response. The dependent variable can be numeric or string, and it must have at least two distinct valid data values. Link Function. The link function is a transformation of the dependent variable that allows estimation of the model. The following functions are available. Identity, $f(x) = x$: the dependent variable is not transformed; this link can be used with any distribution. Complementary log-log, $f(x) = \log(-\log(1-x))$: this is appropriate only with the binomial distribution. Cumulative Cauchit, $f(x) = \tan(\pi(x - 0.5))$, applied to the cumulative probability of each category of the response: this is appropriate only with the multinomial distribution. Cumulative complementary log-log, $f(x) = \ln(-\ln(1-x))$, applied to the cumulative probability of each category of the response: this is appropriate only with the multinomial distribution. Cumulative logit, $f(x) = \ln(x/(1-x))$, applied to the cumulative probability of each category of the response: this is appropriate only with the multinomial distribution. Cumulative negative log-log, $f(x) = -\ln(-\ln(x))$, applied to the cumulative probability of each category of the response: this is appropriate only with the multinomial distribution. Cumulative probit, $f(x) = \Phi^{-1}(x)$, applied
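The link formulas above translate directly into code. This Python sketch implements them with the standard library and shows how a cumulative link is applied to the running (cumulative) probabilities of an ordinal response. The dictionary keys and `cumulative_link` are hypothetical names; the probit uses `statistics.NormalDist().inv_cdf` for $\Phi^{-1}$. The last cumulative probability is always 1 and would map to infinity, so only the first $K-1$ thresholds are transformed.

```python
import math
from statistics import NormalDist

# Each entry maps a probability x in (0, 1) to the link scale.
LINKS = {
    "identity": lambda x: x,
    "complementary_log_log": lambda x: math.log(-math.log(1 - x)),
    "cauchit": lambda x: math.tan(math.pi * (x - 0.5)),
    "logit": lambda x: math.log(x / (1 - x)),
    "negative_log_log": lambda x: -math.log(-math.log(x)),
    "probit": lambda x: NormalDist().inv_cdf(x),
}


def cumulative_link(probs, link):
    """Apply a link to the cumulative probabilities of K ordered categories."""
    f = LINKS[link]
    cum, out = 0.0, []
    for p in probs[:-1]:  # skip the final cumulative probability (always 1)
        cum += p
        out.append(f(cum))
    return out
```

The same `complementary_log_log` function serves both the binomial link and the cumulative multinomial version; only what it is applied to differs.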
76. and Additional Features. 15 Computing Time-Dependent Covariates: Computing a Time-Dependent Covariate; Cox Regression with Time-Dependent Covariates Additional Features. Appendices: A Categorical Variable Coding Schemes (Deviation; Simple; Helmert; Difference; Polynomial; Repeated; Special; Indicator); B Covariance Structures; C Notices; Index. Chapter: Introduction to Advanced Statistics. The Advanced Statistics option provides procedures that offer more advanced modeling options than are available through the Statistics Base option. GLM Multivariate extends the general linear model provided by GLM Univariate to allow multiple dependent variables. A further extension, GLM Repeated Measures, allows repeated measurements of multiple dependent variables. Variance Components Analysis is a specific tool for decomposing the variability in a dependent variable into fixed and random components. Linear Mixed Models expands the general linear model so that the data are permitted to exhibit corr
77. any that contain it. The Type III sums of squares have one major advantage in that they are invariant with respect to the cell frequencies as long as the general form of estimability remains constant. Hence, this type of sums of squares is often considered useful for an unbalanced model with no missing cells. In a factorial design with no missing cells, this method is equivalent to the Yates' weighted-squares-of-means technique. The Type III sum of squares method is commonly used for: any models listed in Type I; any balanced or unbalanced models with no empty cells. Linear Mixed Models Random Effects. Figure 5-4 Linear Mixed Models Random Effects dialog box. Covariance type: this allows you to specify the covariance structure for the random-effects model. A separate covariance matrix is estimated for each random effect. The available structures are as follows: AR(1); AR(1): Heterogeneous; ARMA(1,1); Compound Symmetry; Ante-Dependence: First Order; Compound Symmetry: Heterogeneous; Compound Symmetry: Correlation Metric; Diagonal; Factor Analytic: First-Order; Factor Analytic: First
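Two of the listed structures can be sketched as explicit matrices. This Python example assumes the common parameterizations: compound symmetry with a shared covariance on every off-diagonal cell and that covariance plus an extra variance on the diagonal, and ARMA(1,1) with $\sigma^2$ on the diagonal and $\sigma^2 \phi \rho^{|i-j|-1}$ elsewhere. The function and argument names are hypothetical; check Appendix B (Covariance Structures) for the exact parameterization SPSS uses.

```python
def compound_symmetry(n, common_cov, extra_var):
    """Diagonal: common_cov + extra_var; every off-diagonal element: common_cov."""
    return [[common_cov + extra_var if i == j else common_cov for j in range(n)]
            for i in range(n)]


def arma11(n, sigma2, phi, rho):
    """Diagonal: sigma2; off-diagonal: sigma2 * phi * rho**(|i - j| - 1)."""
    return [[sigma2 if i == j else sigma2 * phi * rho ** (abs(i - j) - 1)
             for j in range(n)]
            for i in range(n)]
```

Both functions return symmetric matrices; the ARMA(1,1) off-diagonal values decay geometrically with the distance between measurements, while compound symmetry keeps them constant.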
78. applied to that cell. Contrast variables are continuous. They are used to compute generalized log-odds ratios (GLOR). The values of the contrast variable are the coefficients for the linear combination of the logs of the expected cell counts. A cell structure variable assigns weights. For example, if some of the cells are structural zeros, the cell structure variable has a value of either 0 or 1. Do not use a cell structure variable to weight aggregate data; instead, use Weight Cases on the Data menu. Assumptions. The counts within each combination of categories of explanatory variables are assumed to have a multinomial distribution. Under the multinomial distribution assumption: the total sample size is fixed, or the analysis is conditional on the total sample size; and the cell counts are not statistically independent. Related procedures. Use the Crosstabs procedure to display the contingency tables. Use the General Loglinear Analysis procedure when you want to analyze the relationship between an observed count and a set of explanatory variables. Copyright IBM Corporation 1989, 2012. Obtaining a Logit Loglinear Analysis: from the menus choose Analyze > Loglinear > Logit. Figure 11-1 Logit Loglinear Analysis dialog box.
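Since a GLOR is the contrast coefficients applied to the logs of the expected counts, it can be computed in one line once the expected counts are known. A minimal Python sketch with hypothetical numbers; `generalized_log_odds_ratio` is an illustrative name, not an SPSS function. With cells ordered (m11, m12, m21, m22) and coefficients (1, -1, -1, 1), the result is the ordinary log odds ratio $\ln(m_{11} m_{22} / (m_{12} m_{21}))$.

```python
import math


def generalized_log_odds_ratio(expected, contrast):
    """Sum of contrast coefficient * log(expected count) over the cells."""
    if len(expected) != len(contrast):
        raise ValueError("one contrast coefficient per cell is required")
    # Cells with a zero coefficient do not contribute (and are never logged).
    return sum(c * math.log(m) for c, m in zip(contrast, expected) if c != 0)
```

For expected counts (20, 10, 5, 10) and the 2 x 2 contrast above, the value is $\ln(20 \cdot 10 / (10 \cdot 5)) = \ln 4$.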
79. are known as censored cases, and they make this kind of study inappropriate for traditional techniques such as t tests or linear regression. A statistical technique useful for this type of data is called a follow-up life table. The basic idea of the life table is to subdivide the period of observation into smaller time intervals. For each interval, all people who have been observed at least that long are used to calculate the probability of a terminal event occurring in that interval. The probabilities estimated from each of the intervals are then used to estimate the overall probability of the event occurring at different time points. Example. Is a new nicotine patch therapy better than traditional patch therapy in helping people to quit smoking? You could conduct a study using two groups of smokers, one of which received the traditional therapy and the other of which received the experimental therapy. Constructing life tables from the data would allow you to compare overall abstinence rates between the two groups to determine if the experimental treatment is an improvement over the traditional therapy. You can also plot the survival or hazard functions and compare them visually for more detailed information. Statistics. Number entering, number leaving, number exposed to risk, number of terminal events, proportion terminating, proportion surviving, cumulative proportion surviving (and standard error), probability density (and standard error), and hazard
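The interval-by-interval logic above can be sketched as an actuarial life table. This Python example assumes the usual actuarial convention that cases withdrawing (censored) during an interval count as exposed for half of it. That convention, the tuple layout, and the name `life_table` are assumptions for illustration, and the standard errors, density, and hazard columns are not reproduced.

```python
def life_table(intervals):
    """Actuarial life table.

    intervals: list of (entering, withdrawn, terminal) tuples, one per
    interval, where `withdrawn` counts censored cases leaving the interval.
    """
    rows, cum_surv = [], 1.0
    for entering, withdrawn, terminal in intervals:
        exposed = entering - withdrawn / 2  # censored cases count as half exposed
        p_term = terminal / exposed if exposed > 0 else 0.0
        p_surv = 1 - p_term
        cum_surv *= p_surv  # chain the per-interval survival probabilities
        rows.append({
            "entering": entering,
            "exposed": exposed,
            "terminal": terminal,
            "prop_terminating": p_term,
            "prop_surviving": p_surv,
            "cum_surviving": cum_surv,
        })
    return rows
```

With 100 people entering, 10 withdrawing and 19 quitting the therapy in the first interval, 95 are exposed and the proportion terminating is 0.2. The 71 who remain start the next interval.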
80. Model Effects. Analysis type: specify the type of analysis to produce. Type I analysis is generally appropriate when you have a priori reasons for ordering predictors in the model, while Type III is more generally applicable. Wald or likelihood-ratio statistics are computed based upon the selection in the Chi-Square Statistics group. Confidence intervals: specify a confidence level greater than 50 and less than 100. Wald intervals are based on the assumption that parameters have an asymptotic normal distribution; profile likelihood intervals are more accurate but can be computationally expensive. The tolerance level for profile likelihood intervals is the criterion used to stop the iterative algorithm used to compute the intervals. Log-likelihood function: this controls the display format of the log-likelihood function. The full function includes an additional term that is constant with respect to the parameter estimates; it has no effect on parameter estimation and is left out of the display in some software products. Print. The following output is available. Case processing summary: displays the number and percentage of cases included and excluded from the analysis and the Correlated Data Summary table. Descriptive statistics: displays descriptive statistics and summary information about the depend
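A Wald interval follows directly from the asymptotic-normality assumption described above: estimate $\pm z \times$ standard error, where $z$ is the normal quantile for the chosen level. A minimal Python sketch using `statistics.NormalDist`; `wald_interval` is a hypothetical name, and the level check mirrors the 50 to 100 (exclusive) range stated in the text. Profile likelihood intervals need the likelihood itself and are not shown.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=95.0):
    """Two-sided Wald confidence interval for one parameter."""
    if not 50 < level < 100:
        raise ValueError("level must be greater than 50 and less than 100")
    # For 95, this is the 0.975 quantile of the standard normal (about 1.96).
    z = NormalDist().inv_cdf(0.5 + level / 200)
    return estimate - z * std_error, estimate + z * std_error
```

Because the interval is symmetric by construction, it can be noticeably off for parameters whose likelihood is skewed; that is the situation in which the more expensive profile likelihood intervals earn their cost.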
81. ases or maximum step-halving is reached. Specify a positive integer. Check for separation of data points: when selected, the algorithm performs tests to ensure that the parameter estimates have unique values. Separation occurs when the procedure can produce a model that correctly classifies every case. This option is available for multinomial responses and binomial responses with binary format. Convergence Criteria. Parameter convergence: when selected, the algorithm stops after an iteration in which the absolute or relative change in the parameter estimates is less than the value specified, which must be positive. Log-likelihood convergence: when selected, the algorithm stops after an iteration in which the absolute or relative change in the log-likelihood function is less than the value specified, which must be positive. Hessian convergence: for the Absolute specification, convergence is assumed if a statistic based on the Hessian convergence is less than the positive value specified. For the Relative specification, convergence is assumed if the statistic is less than the product of the positive value specified and the absolute value of the log-likelihood. Singularity tolerance. Singular (or non-invertible) matrices have linearly dependent columns, which can cause serious problems for the estimation algorithm. Even near-singular matrices can lead to poor results, so the procedure will treat a matrix whose determinant is less than the tolera
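The absolute and relative stopping rules can be sketched as simple comparisons. In this Python example, the relative change is assumed to be the change divided by the magnitude of the previous value (with a small guard against division by zero); SPSS's exact definition may differ. The Hessian statistic is taken as an input because the text does not define it. Both function names are hypothetical.

```python
def has_converged(old, new, tol, mode="absolute"):
    """Parameter (or log-likelihood) convergence on successive values."""
    if tol <= 0:
        raise ValueError("tolerance must be positive")
    for a, b in zip(old, new):
        change = abs(b - a)
        if mode == "relative":
            change /= max(abs(a), 1e-12)  # assumed guard against division by zero
        if change >= tol:
            return False
    return True


def hessian_converged(statistic, tol, mode="absolute", loglik=None):
    """Absolute: statistic < tol. Relative: statistic < tol * |log-likelihood|."""
    if mode == "absolute":
        return statistic < tol
    return statistic < tol * abs(loglik)
```

Passing one-element lists, such as `has_converged([ll_old], [ll_new], tol)`, applies the same rule to the log-likelihood.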
82. ast evaluation of the gradient vector and the Hessian matrix. The iteration history table displays parameter estimates for every nth iteration, beginning with the 0th iteration (the initial estimates), where n is the value of the print interval. If the iteration history is requested, the last iteration is always displayed regardless of n. Lagrange multiplier test: displays Lagrange multiplier test statistics for assessing the validity of a scale parameter that is computed using the deviance or Pearson chi-square, or set at a fixed number, for the normal, gamma, inverse Gaussian, and Tweedie distributions. For the negative binomial distribution, this tests the fixed ancillary parameter. Generalized Linear Models EM Means. Figure 6-10 Generalized Linear Models EM Means tab. This tab allows you to display the estimated marginal means for levels of factors and factor interactions. You can also request that the overall estimated mean be displayed. Estimated marginal means are not available for ordinal multinomial models. Factors and Inte
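The print-interval rule above (every nth iteration starting at 0, plus the last iteration) is easy to state precisely in code. A minimal Python sketch; `displayed_iterations` is a hypothetical name.

```python
def displayed_iterations(last_iteration, print_interval):
    """Iteration numbers shown in the iteration history table."""
    if print_interval < 1:
        raise ValueError("print interval must be a positive integer")
    shown = list(range(0, last_iteration + 1, print_interval))
    if shown[-1] != last_iteration:
        shown.append(last_iteration)  # the last iteration is always displayed
    return shown
```

For a fit that stops at iteration 7 with a print interval of 3, the table would show iterations 0, 3, 6 and 7.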
83. asts are often used to estimate polynomial trends. Scale. Estimated marginal means can be computed for the response, based on the original scale of the dependent variable, or for the linear predictor, based on the dependent variable as transformed by the link function. Adjustment for Multiple Comparisons. When performing hypothesis tests with multiple contrasts, the overall significance level can be adjusted from the significance levels for the included contrasts. This group allows you to choose the adjustment method. Least significant difference: this method does not control the overall probability of rejecting the hypotheses that some linear contrasts are different from the null hypothesis values. Bonferroni: this method adjusts the observed significance level for the fact that multiple contrasts are being tested. Sequential Bonferroni: this is a sequentially step-down rejective Bonferroni procedure that is much less conservative in terms of rejecting individual hypotheses but maintains the same overall significance level. Sidak: this method provides tighter bounds than the Bonferroni approach. Sequential Sidak: this is a sequentially step-down rejective Sidak procedure that is much less conservative in terms of rejecting individual hypotheses but maintains the same overall significance level. Generalized Estimating Equations Save. Figure 7-12 Generalized Estimating Equations Save tab.
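The adjustment methods can be compared on a few significance values. This Python sketch assumes the sequential variants are Holm-type step-down procedures: sort the p-values, adjust the smallest by the full number of tests, the next by one fewer, and so on, while enforcing monotonicity. SPSS's exact computations may differ in detail. LSD corresponds to leaving the p-values unadjusted. All function names are hypothetical.

```python
def bonferroni(p):
    m = len(p)
    return [min(1.0, m * pi) for pi in p]


def sidak(p):
    m = len(p)
    return [1 - (1 - pi) ** m for pi in p]


def _step_down(p, adjust):
    """Holm-type step-down: adjust(p_i, k) with k = tests still in play."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Never let a larger p-value get a smaller adjusted value.
        running_max = max(running_max, adjust(p[i], m - rank))
        adjusted[i] = min(1.0, running_max)
    return adjusted


def sequential_bonferroni(p):
    return _step_down(p, lambda pi, k: k * pi)


def sequential_sidak(p):
    return _step_down(p, lambda pi, k: 1 - (1 - pi) ** k)
```

For p-values (0.01, 0.04, 0.03), Bonferroni gives (0.03, 0.12, 0.09) while the sequential version gives (0.03, 0.06, 0.06), which illustrates why the step-down methods are less conservative.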
84. ate Lines For. Plots can help you to evaluate your estimated model and interpret the results. You can plot the survival, hazard, log-minus-log, and one-minus-survival functions. Survival: displays the cumulative survival function on a linear scale. Hazard: displays the cumulative hazard function on a linear scale. Log minus log: the cumulative survival estimate after the ln(-ln) transformation is applied to the estimate. One minus survival: plots one-minus-survival function on a linear scale. Because these functions depend on values of the covariates, you must use constant values for the covariates to plot the functions versus time. The default is to use the mean of each covariate as a constant value, but you can enter your own values for the plot using the Change Value control group. You can plot a separate line for each value of a categorical covariate by moving that covariate into the Separate Lines For text box. This option is available only for categorical covariates, which are denoted by (Cat) after their names in the Covariate Values Plotted At list. Cox Regression Save New Variables. Figure 14-4 Cox Regression Save New Variables dialog box.
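All four plotted functions can be derived from a survival estimate $S(t)$ using the standard identity $H(t) = -\ln S(t)$ for the cumulative hazard. This Python sketch only performs those transformations on survival values you already have; how SPSS estimates $S(t)$ at the chosen covariate values is not reproduced, and `survival_transforms` is a hypothetical name.

```python
import math


def survival_transforms(surv):
    """Given survival estimates S(t), return the other plotted functions."""
    out = []
    for s in surv:
        if not 0 < s <= 1:
            raise ValueError("survival estimates must be in (0, 1]")
        cum_hazard = -math.log(s)  # H(t) = -ln S(t)
        # ln(-ln S(t)); undefined (minus infinity) where S(t) = 1.
        lml = math.log(cum_hazard) if cum_hazard > 0 else float("-inf")
        out.append({
            "survival": s,
            "cum_hazard": cum_hazard,
            "log_minus_log": lml,
            "one_minus_survival": 1 - s,
        })
    return out
```

Under proportional hazards, log-minus-log curves for different covariate groups should differ by a roughly constant vertical shift, which is why this plot is often used as an informal check.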
85. ate(s) and WLS Weight. GLM Multivariate Model. Figure 2-2 Multivariate Model dialog box. Specify Model. A full factorial model contains all factor main effects, all covariate main effects, and all factor-by-factor interactions. It does not contain covariate interactions. Select Custom to specify only a subset of interactions or to specify factor-by-covariate interactions. You must indicate all of the terms to be included in the model. Factors and Covariates. The factors and covariates are listed. Model. The model depends on the nature of your data. After selecting Custom, you can select the main effects and interactions that are of interest in your analysis. Sum of squares. The method of calculating the sums of squares. For balanced or unbalanced models with no missing cells, the Type III sum-of-squares method is most commonly used. Include intercept in model. The intercept is usually included in the model. If you can assume that the data pass through the origin, you can exclude the intercept. Build Terms. For the selected factors and covariates: Interaction creates the highest-level interaction term of all selected variables (this is the default); Main effects creates a main-effects term for each variable selected; All
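The Build Terms choices amount to simple combinatorics over the selected variables. This Python sketch, using `itertools.combinations`, lists the terms each choice would add; `build_terms` and its option strings are hypothetical, and a term is represented as a tuple of variable names.

```python
from itertools import combinations


def build_terms(variables, how):
    """Model terms produced by a Build Terms choice."""
    if how == "interaction":
        return [tuple(variables)]  # the single highest-level interaction
    if how == "main effects":
        return [(v,) for v in variables]
    if how.startswith("all ") and how.endswith("-way"):
        k = int(how[4:-4])  # "all 2-way" -> 2
        return list(combinations(variables, k))
    raise ValueError(f"unknown option: {how}")
```

With three selected factors A, B and C, "all 2-way" yields A*B, A*C and B*C, while "interaction" yields only A*B*C.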
86. ation Settings. (Dialog options shown: Confidence level; Fixed for all tests (Residual method), useful if the sample size is larger, the data are balanced, or the model uses a simpler covariance type, for example, scaled identity or diagonal; Varied across tests (Satterthwaite approximation), useful if the sample size is smaller, the data are unbalanced, or the model uses complicated covariance types, for example, unstructured; Assume model assumptions are correct; Use robust estimation to handle violations of model assumptions (robust covariances).) These selections specify some more advanced criteria used to build the model. Sorting Order. These controls determine the order of the categories for the target and factors (categorical inputs) for purposes of determining the "last" category. The target sort order setting is ignored if the target is not categorical or if a custom reference category is specified on the Target settings. Stopping Rules. You can specify the maximum number of iterations the algorithm will execute. Specify a non-negative integer. The default is 100. Post-Estimation Settings. These settings determine how some of the model output is computed for viewing. Confidence level: this is the level of confidence used to compute interval estimates of the model coefficients. Specify a value greater than 0 and less than 100. The default is 95. Degrees of freedom: this specifies how degrees of freedom are computed for significance tests. Choose Fixed for all t
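To see why the Satterthwaite option lets degrees of freedom vary from test to test, consider the classic Welch-Satterthwaite formula for a linear combination of independent mean squares: $\nu \approx (\sum_i a_i MS_i)^2 / \sum_i (a_i MS_i)^2/\nu_i$. The Python sketch below implements only that textbook formula as an illustration of the idea. It is not the mixed-model Satterthwaite computation the procedure performs, and `satterthwaite_df` is a hypothetical name.

```python
def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate df for sum(c_i * MS_i) of independent mean squares."""
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    numerator = sum(terms) ** 2
    denominator = sum(t ** 2 / d for t, d in zip(terms, dfs))
    return numerator / denominator
```

Two equal mean squares with 10 degrees of freedom each combine to about 20 degrees of freedom; unequal contributions pull the result toward the degrees of freedom of the dominant term.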
87. atistics. Observed and expected frequencies; raw, adjusted, and deviance residuals; design matrix; parameter estimates; odds ratio; log odds ratio; GLOR; Wald statistic; and confidence intervals. Plots. Adjusted residuals, deviance residuals, and normal probability. Data. Factors are categorical, and cell covariates are continuous. When a covariate is in the model, the mean covariate value for cases in a cell is applied to that cell. Contrast variables are continuous. They are used to compute generalized log-odds ratios. The values of the contrast variable are the coefficients for the linear combination of the logs of the expected cell counts. A cell structure variable assigns weights. For example, if some of the cells are structural zeros, the cell structure variable has a value of either 0 or 1. Do not use a cell structure variable to weight aggregated data; instead, choose Weight Cases from the Data menu. Assumptions. Two distributions are available in General Loglinear Analysis: Poisson and multinomial. Under the Poisson distribution assumption: the total sample size is not fixed before the study, or the analysis is not conditional on the total sample size; and the event of an observation being in a cell is statistically independent of the cell counts of other cells. Under the multinomial distribution assumption: the total sample size is fixed, or the analysis is conditional on the total sample size; and the cell counts are not statistica
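For a Poisson cell with observed count $o$ and expected count $e$, the raw residual is $o - e$, the Pearson residual is $(o - e)/\sqrt{e}$, and the deviance residual is $\operatorname{sign}(o - e)\sqrt{2[o\ln(o/e) - (o - e)]}$. The Python sketch below computes these three. The adjusted residuals listed above also need cell leverages from the fitted design and are not shown. `poisson_residuals` is a hypothetical name, and the text does not state whether SPSS labels the Pearson form "standardized", so that mapping is left open.

```python
import math


def poisson_residuals(observed, expected):
    """(raw, Pearson, deviance) residuals for each cell."""
    out = []
    for o, e in zip(observed, expected):
        raw = o - e
        pearson = raw / math.sqrt(e)
        term = o * math.log(o / e) if o > 0 else 0.0  # o*ln(o/e) -> 0 as o -> 0
        dev = math.copysign(math.sqrt(max(0.0, 2 * (term - raw))), raw)
        out.append((raw, pearson, dev))
    return out
```

An empty cell with expected count 4 has a Pearson residual of -2 and a deviance residual of $-\sqrt{8}$, so the two residual types can rank poorly fitted cells differently.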
88. aves the parameter estimates and the parameter covariance matrix, if selected, in XML (PMML) format. You can use this model file to apply the model information to other data files for scoring purposes. GENLIN Command Additional Features. The command syntax language also allows you to: specify initial values for parameter estimates as a list of numbers (using the CRITERIA subcommand); specify a fixed working correlation matrix (using the REPEATED subcommand); fix covariates at values other than their means when computing estimated marginal means (using the EMMEANS subcommand); specify custom polynomial contrasts for estimated marginal means (using the EMMEANS subcommand); specify a subset of the factors for which estimated marginal means are displayed to be compared using the specified contrast type (using the TABLES and COMPARE keywords of the EMMEANS subcommand). See the Command Syntax Reference for complete syntax information. Chapter: Generalized linear mixed models. Generalized linear mixed models extend the linear model so that: the target is linearly related to the factors and covariates via a specified link function; the target can have a non-normal distribution; and the observations can be correlated. Generalized linear mixed models cover a wide variety of models, from simple linear regression to complex multilevel models for non-normal longitudinal data. Examples. The district schoo
89. bject variables to ensure proper ordering of measurements. Subject and within-subject variables cannot be used to define the response, but they can perform other functions in the model. For example, Hospital ID could be used as a factor in the model. Covariance Matrix. The model-based estimator is the negative of the generalized inverse of the Hessian matrix. The robust estimator (also called the Huber/White/sandwich estimator) is a "corrected" model-based estimator that provides a consistent estimate of the covariance, even when the working correlation matrix is misspecified. This specification applies to the parameters in the linear model part of the generalized estimating equations, while the specification on the Estimation tab applies only to the initial generalized linear model. Working Correlation Matrix. This correlation matrix represents the within-subject dependencies. Its size is determined by the number of measurements and thus the combination of values of within-subject variables. You can specify one of the following structures. Independent: repeated measurements are uncorrelated. AR(1): repeated measurements have a first-order autoregressive relationship. The correlation between any two elements is equal to $\rho$ for adjacent elements, $\rho^2$ for elements that are separated by a third, and so on; $\rho$ is constrained so that $-1 < \rho < 1$. Exchangeable: this structure has homogeneous correlations between elements. It is also known as a
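The three structures described so far can be written out as explicit matrices. A minimal Python sketch; `working_correlation` and its structure names are hypothetical. The AR(1) entry for measurements $|i-j|$ apart is $\rho^{|i-j|}$, and the exchangeable structure uses one common $\rho$ everywhere off the diagonal.

```python
def working_correlation(n, structure, rho=0.0):
    """Working correlation matrix for n repeated measurements."""
    if structure == "independent":
        r = lambda i, j: 0.0
    elif structure == "ar1":
        if not -1 < rho < 1:
            raise ValueError("AR(1) requires -1 < rho < 1")
        r = lambda i, j: rho ** abs(i - j)  # rho, rho**2, ... with distance
    elif structure == "exchangeable":
        r = lambda i, j: rho  # homogeneous correlation between all pairs
    else:
        raise ValueError(f"unknown structure: {structure}")
    return [[1.0 if i == j else r(i, j) for j in range(n)] for i in range(n)]
```

For three measurements with $\rho = 0.5$, the AR(1) first row is (1, 0.5, 0.25), while the exchangeable first row would be (1, 0.5, 0.5).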
90. bjects factors and within-subjects factors can be used in profile plots. A profile plot of one factor shows whether the estimated marginal means are increasing or decreasing across levels. For two or more factors, parallel lines indicate that there is no interaction between factors, which means that you can investigate the levels of only one factor. Nonparallel lines indicate an interaction. Figure 3-6 Nonparallel plot (left) and parallel plot (right). After a plot is specified by selecting factors for the horizontal axis and, optionally, factors for separate lines and separate plots, the plot must be added to the Plots list. GLM Repeated Measures Post Hoc Comparisons. Figure 3-7 Repeated Measures Post Hoc Multiple Comparisons for Observed Means dialog box. Post hoc
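"Parallel lines" has a simple numeric meaning: the vertical gap between two profiles stays the same at every level on the horizontal axis. This Python sketch checks that on a table of cell means; the layout and function names are hypothetical, and it is only a descriptive check, not a significance test for interaction.

```python
def interaction_contrasts(means):
    """means[line][x]: cell means, one row per level of the separate-lines factor.

    For each later line, returns how the gap to the first line changes
    between successive horizontal-axis levels; all zeros means parallel.
    """
    base = means[0]
    out = []
    for line in means[1:]:
        gaps = [b - a for a, b in zip(base, line)]
        out.append([g2 - g1 for g1, g2 in zip(gaps, gaps[1:])])
    return out


def is_parallel(means, tol=1e-9):
    return all(abs(c) <= tol for row in interaction_contrasts(means) for c in row)
```

Profiles (5, 6, 7) and (6, 7, 8) are parallel, while (5, 6, 7) and (7, 6, 5) cross and produce nonzero interaction contrasts.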
91. bution of each random effect to the variance of the dependent variable. This procedure is particularly interesting for analysis of mixed models such as split plot, univariate repeated measures, and random block designs. By calculating variance components, you can determine where to focus attention in order to reduce the variance. Four different methods are available for estimating the variance components: minimum norm quadratic unbiased estimator (MINQUE), analysis of variance (ANOVA), maximum likelihood (ML), and restricted maximum likelihood (REML). Various specifications are available for the different methods. Default output for all methods includes variance component estimates. If the ML method or the REML method is used, an asymptotic covariance matrix table is also displayed. Other available output includes an ANOVA table and expected mean squares for the ANOVA method and an iteration history for the ML and REML methods. The Variance Components procedure is fully compatible with the GLM Univariate procedure. WLS Weight allows you to specify a variable used to give observations different weights for a weighted analysis, perhaps to compensate for variations in precision of measurement. Example. At an agriculture school, weight gains for pigs in six different litters are measured after one month. The litter variable is a random factor with six levels. (The six litters studied are a random sample from a large population of pig litters.) The investiga
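For the simplest case, a balanced one-way random-effects design like the litter example, the ANOVA method equates mean squares to their expectations: $E[MS_{between}] = \sigma_e^2 + n\sigma_a^2$ and $E[MS_{within}] = \sigma_e^2$. The Python sketch below solves those two equations. It assumes a balanced design, handles only one random factor, and uses a hypothetical function name. MINQUE, ML and REML are not shown.

```python
def anova_variance_components(groups):
    """One-way random-effects ANOVA estimates for balanced groups.

    Returns (between-group component, within-group component).
    """
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    grand = sum(sum(g) for g in groups) / (a * n)
    means = [sum(g) / n for g in groups]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (n - 1))
    # Solve E[MS_between] = s2_e + n * s2_a and E[MS_within] = s2_e.
    return (ms_between - ms_within) / n, ms_within
```

A known property of this method is that the between-group estimate can come out negative when the groups differ less than chance would predict; the likelihood-based methods avoid reporting negative variances.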
92. cases with unusual combinations of values for the independent variables and cases that may have a large impact on the model. Cook's distance: a measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients. A large Cook's D indicates that excluding a case from computation of the regression statistics changes the coefficients substantially. Leverage values: uncentered leverage values; the relative influence of each observation on the model's fit. Residuals. An unstandardized residual is the actual value of the dependent variable minus the value predicted by the model. Standardized, Studentized, and deleted residuals are also available. If a WLS variable was chosen, weighted unstandardized residuals are available. Unstandardized: the difference between an observed value and the value predicted by the model. Weighted: weighted unstandardized residuals; available only if a WLS variable was previously selected. Standardized: the residual divided by an estimate of its standard deviation. Standardized residuals, which are also known as Pearson residuals, have a mean of 0 and a standard deviation of 1. Studentized: the residual divided by an estimate of its standard deviation that varies from case to case, depending on the distance of each case's values on the independent variables from the means of the independent variabl
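For a simple linear regression these diagnostics have closed forms: leverage $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$, Studentized residual $e_i / \sqrt{MSE(1 - h_i)}$, and Cook's distance $\frac{e_i^2}{p \cdot MSE} \frac{h_i}{(1 - h_i)^2}$. The Python sketch below assumes one predictor with an intercept ($p = 2$) and takes the standardized residual as $e_i/\sqrt{MSE}$. It illustrates the definitions above rather than reproducing the GLM procedure; deleted residuals and WLS weights are omitted, and `regression_diagnostics` is a hypothetical name.

```python
def regression_diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares and return per-case diagnostics."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2  # parameters: intercept and slope
    mse = sum(e * e for e in resid) / (n - p)
    rows = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx  # uncentered leverage (hat diagonal)
        rows.append({
            "residual": e,
            "standardized": e / mse ** 0.5,
            "studentized": e / (mse * (1 - h)) ** 0.5,
            "leverage": h,
            "cooks_d": e * e / (p * mse) * h / (1 - h) ** 2,
        })
    return rows
```

The leverages always sum to the number of parameters (2 here), and a case that is both far from $\bar{x}$ and poorly fitted, like the last point in `regression_diagnostics([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 15.0])`, receives the largest Cook's distance.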
93. ce matrices. Type I and Type III sums of squares can be used to evaluate different hypotheses. Type III is the default. Data. The dependent variable should be quantitative. Factors should be categorical and can have numeric values or string values. Covariates and the weight variable should be quantitative. Subjects and repeated variables may be of any type. Assumptions. The dependent variable is assumed to be linearly related to the fixed factors, random factors, and covariates. The fixed effects model the mean of the dependent variable. The random effects model the covariance structure of the dependent variable. Multiple random effects are considered independent of each other, and separate covariance matrices will be computed for each; however, model terms specified on the same random effect can be correlated. The repeated measures model the covariance structure of the residuals. The dependent variable is also assumed to come from a normal distribution. Related procedures. Use the Explore procedure to examine the data before running an analysis. If you do not suspect there to be correlated or nonconstant variability, you can use the GLM Univariate or GLM Repeated Measures procedure. You can alternatively use the Variance Components Analysis procedure if the random effects have a variance components covariance structure and there are no repeated measures. Obtaining a Linear Mixed Models Analysis: from the menus choose Analyze > Mixed Models > Li
94. ciated level of the confidence intervals is displayed in the dialog box. GLM Command Additional Features. These features may apply to univariate, multivariate, or repeated measures analysis. The command syntax language also allows you to: specify nested effects in the design (using the DESIGN subcommand); specify tests of effects versus a linear combination of effects or a value (using the TEST subcommand); specify multiple contrasts (using the CONTRAST subcommand); include user-missing values (using the MISSING subcommand); specify EPS criteria (using the CRITERIA subcommand); construct a custom L matrix, M matrix, or K matrix (using the LMATRIX, MMATRIX, or KMATRIX subcommands); for deviation or simple contrasts, specify an intermediate reference category (using the CONTRAST subcommand); specify metrics for polynomial contrasts (using the CONTRAST subcommand); specify error terms for post hoc comparisons (using the POSTHOC subcommand); compute estimated marginal means for any factor or factor interaction among the factors in the factor list (using the EMMEANS subcommand); specify names for temporary variables (using the SAVE subcommand); construct a correlation matrix data file (using the OUTFILE subcommand); construct a matrix data file that contains statistics from the between-subjects ANOVA table (using the OUTFILE subcommand); save the design matrix to a new data file using the
95. compound symmetry structure. M-dependent: consecutive measurements have a common correlation coefficient, pairs of measurements separated by a third have a common correlation coefficient, and so on, through pairs of measurements separated by m - 1 other measurements. Measurements with greater separation are assumed to be uncorrelated. When choosing this structure, specify a value of m less than the order of the working correlation matrix. Unstructured: this is a completely general correlation matrix. By default, the procedure will adjust the correlation estimates by the number of nonredundant parameters. Removing this adjustment may be desirable if you want the estimates to be invariant to subject-level replication changes in the data. Maximum iterations: the maximum number of iterations the generalized estimating equations algorithm will execute. Specify a non-negative integer. This specification applies to the parameters in the linear model part of the generalized estimating equations, while the specification on the Estimation tab applies only to the initial generalized linear model. Update matrix: elements in the working correlation matrix are estimated based on the parameter estimates, which are updated in each iteration of the algorithm. If the working correlation matrix is not updated at all, the initial working correlation matrix is used throughout the estimation process. If the matrix is u
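The M-dependent structure assigns one shared correlation to each separation from 1 up to m and zero beyond that. This Python sketch builds the matrix from a list of per-lag correlations and enforces the rule that m must be less than the order of the matrix. The values would normally be estimated; here they are supplied by hand, and `m_dependent_correlation` is a hypothetical name.

```python
def m_dependent_correlation(n, lag_corrs):
    """lag_corrs[k - 1] is the common correlation for measurements k apart (k = 1..m)."""
    m = len(lag_corrs)
    if not m < n:
        raise ValueError("m must be less than the order of the matrix")

    def r(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0
        # Greater separations than m are assumed uncorrelated.
        return lag_corrs[lag - 1] if lag <= m else 0.0

    return [[r(i, j) for j in range(n)] for i in range(n)]
```

With four measurements and correlations (0.6, 0.2), the first row is (1, 0.6, 0.2, 0): measurements three apart are treated as uncorrelated.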
96. covariate. Click Fixed or Random and specify at least a fixed-effects or random-effects model. Optionally, select a weighting variable. Linear Mixed Models Select Subjects/Repeated Variables: This dialog box allows you to select variables that define subjects and repeated observations and to choose a covariance structure for the residuals. See Figure 5-1 on p. 35. Subjects: A subject is an observational unit that can be considered independent of other subjects. For example, the blood pressure readings from a patient in a medical study can be considered independent of the readings from other patients. Defining subjects becomes particularly important when there are repeated measurements per subject and you want to model the correlation between these observations. For example, you might expect that blood pressure readings from a single patient during consecutive visits to the doctor are correlated. Subjects can also be defined by the factor-level combination of multiple variables; for example, you can specify Gender and Age category as subject variables to model the belief that males over the age of 65 are similar to each other but independent of males under 65 and females. All of the variables specified in the Subjects list are used to define subjects for the residual covariance structure. You can use some or all of the variables to define subjects for the random-effects covariance structure. Repeated: The variables spec
97. cts and all factor-by-factor interactions. It does not contain covariate interactions. Select Custom to specify only a subset of interactions or to specify factor-by-covariate interactions. You must indicate all of the terms to be included in the model. Factors & Covariates: The factors and covariates are listed. Model: The model depends on the nature of your data. After selecting Custom, you can select the main effects and interactions that are of interest in your analysis. The model must contain a random factor. Include intercept in model: Usually the intercept is included in the model. If you can assume that the data pass through the origin, you can exclude the intercept. Build Terms: For the selected factors and covariates: Interaction: Creates the highest-level interaction term of all selected variables. This is the default. Main effects: Creates a main-effects term for each variable selected. All 2-way: Creates all possible two-way interactions of the selected variables. All 3-way: Creates all possible three-way interactions of the selected variables. All 4-way: Creates all possible four-way interactions of the selected variables. All 5-way: Creates all possible five-way interactions of the selected variables. Variance Components Options: Figure 4-3 (Variance Components Options dialog box) shows the Method options MINQUE, Maximum likelihood, ANOVA, Re
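The Build Terms choices are combinations of the selected variables. The Python sketch below mimics them for illustration; the helper names, the `*` notation for interactions, and the variable names are assumptions for the example, not output from the dialog box.

```python
from itertools import combinations


def main_effects(variables):
    """One main-effects term per selected variable."""
    return list(variables)


def highest_interaction(variables):
    """The single highest-level interaction of all selected variables (the default)."""
    return "*".join(variables)


def all_k_way(variables, k):
    """Every possible k-way interaction among the selected variables."""
    return ["*".join(combo) for combo in combinations(variables, k)]


selected = ["gender", "age", "site"]       # hypothetical variable names
print(main_effects(selected))              # ['gender', 'age', 'site']
print(highest_interaction(selected))       # gender*age*site
print(all_k_way(selected, 2))              # ['gender*age', 'gender*site', 'age*site']
```

With n selected variables, All k-way yields n!/(k!(n - k)!) terms; for example, five variables give ten 2-way terms.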
98. d SSCP (sums-of-squares and cross-products) matrices. In addition to testing hypotheses, GLM Repeated Measures produces estimates of parameters. Commonly used a priori contrasts are available to perform hypothesis testing on between-subjects factors. Additionally, after an overall F test has shown significance, you can use post hoc tests to evaluate differences among specific means. Estimated marginal means give estimates of predicted mean values for the cells in the model, and profile plots (interaction plots) of these means allow you to visualize some of the relationships easily. Residuals, predicted values, Cook's distance, and leverage values can be saved as new variables in your data file for checking assumptions. Also available are a residual SSCP matrix, which is a square matrix of sums of squares and cross-products of residuals; a residual covariance matrix, which is the residual SSCP matrix divided by the degrees of freedom of the residuals; and the residual correlation matrix, which is the standardized form of the residual covariance matrix. WLS Weight allows you to specify a variable used to give observations different weights for a weighted least-squares (WLS) analysis, perhaps to compensate for different precision of measurement. Example: Twelve students are assigned to a high- or low-anxiety group based on their scores on an anxiety-rating test. The anxiety rating is called a between-subjects factor because it divides the subjects into gro
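The relationship among the residual SSCP, covariance, and correlation matrices can be checked by hand. This Python sketch assumes a small table of residuals (one column per dependent variable) and a residual degrees-of-freedom value supplied by the caller; the function name and data are hypothetical, and the procedure's own computation may differ in detail (for example, when WLS weights are used).

```python
import math


def residual_matrices(residuals, df_error):
    """Return (SSCP, covariance, correlation) for a table of residuals.

    residuals: one row per case, one column per dependent variable.
    df_error: residual degrees of freedom, supplied by the caller.
    """
    p = len(residuals[0])
    # Sums of squares (diagonal) and cross-products (off-diagonal).
    sscp = [[sum(r[i] * r[j] for r in residuals) for j in range(p)]
            for i in range(p)]
    # Covariance: SSCP divided by the residual degrees of freedom.
    cov = [[sscp[i][j] / df_error for j in range(p)] for i in range(p)]
    # Correlation: covariance standardized by the two standard deviations.
    corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]
    return sscp, cov, corr


# Three cases, two dependent variables, intercept-only model (df = 3 - 1 = 2).
resid = [[1.0, 2.0], [-1.0, -1.0], [0.0, -1.0]]
sscp, cov, corr = residual_matrices(resid, df_error=2)
print(sscp)  # [[2.0, 3.0], [3.0, 6.0]]
print(cov)   # [[1.0, 1.5], [1.5, 3.0]]
```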
99. d measurements per subject and you want to model the correlation between these observations. For example, you might expect that blood pressure readings from a single patient during consecutive visits to the doctor are correlated. All of the fields specified as Subjects on the Data Structure tab are used to define subjects for the residual covariance structure and provide the list of possible fields for defining subjects for random-effects covariance structures on the Random Effect Block. Repeated measures: The fields specified here are used to identify repeated observations. For example, a single variable Week might identify the 10 weeks of observations in a medical study, or Month and Day might be used together to identify daily observations over the course of a year. Define covariance groups by: The fields specified here define independent sets of repeated-effects covariance parameters, one for each category defined by the cross-classification of the grouping fields. All subjects have the same covariance type; subjects within the same covariance grouping will have the same values for the parameters. Repeated covariance type: This specifies the covariance structure for the residuals. The available structures are first-order autoregressive (AR1), autoregressive moving average (1,1) (ARMA11), compound symmetry, diagonal, scaled identity, Toeplitz, unstructured, and variance components. For more inf
100. d the terminal event, you can use the Linear Regression procedure to model the relationship between predictors and time to event. Obtaining a Cox Regression Analysis: From the menus choose Analyze > Survival > Cox Regression. Copyright IBM Corporation 1989, 2012. [Figure 14-1: Cox Regression dialog box] Select a time variable. Cases whose time values are negative are not analyzed. Select a status variable, and then click Define Event. Select one or more covariates. To include interaction terms, select all of the variables involved in the interaction and then click >a*b>. Optionally, you can compute separate models for different groups by defining a strata variable. Cox Regression Define Categorical Variables: [Figure 14-2: Cox Regression Define Categorical Covariates dialog box]
101. defined labels, you can set the reference category by choosing a value from the list. This can be convenient when, in the middle of specifying a model, you don't remember exactly how a particular variable was coded. Generalized Estimating Equations Predictors: [Figure 7-5: Generalized Estimating Equations Predictors tab] The Predictors tab allows you to specify the factors and covariates used to build model effects and to specify an optional offset. Factors: Factors are categorical predictors; they can be numeric or string. Covariates: Covariates are scale predictors; they must be numeric. Note: When the response is binomial with binary format, the procedure computes deviance and chi-square goodness-of-fit statistics by subpopulations that are based on the cross-classification of observed values of the selected factors and covariates. You should keep the same set of predictors across multiple runs of the procedure to ensure
102. dent variable, covariates, and factors. Model information: Displays the dataset name, dependent variable or events and trials variables, offset variable, scale weight variable, probability distribution, and link function. Goodness of fit statistics: Displays two extensions of Akaike's Information Criterion for model selection: quasi-likelihood under the independence model criterion (QIC) for choosing the best correlation structure, and another QIC measure for choosing the best subset of predictors. Model summary statistics: Displays model fit tests, including likelihood-ratio statistics for the model fit omnibus test and statistics for the Type I or Type III contrasts for each effect. Parameter estimates: Displays parameter estimates and corresponding test statistics and confidence intervals. You can optionally display exponentiated parameter estimates in addition to the raw parameter estimates. Covariance matrix for parameter estimates: Displays the estimated parameter covariance matrix. Correlation matrix for parameter estimates: Displays the estimated parameter correlation matrix. Contrast coefficient (L) matrices: Displays contrast coefficients for the default effects and for the estimated marginal means, if requested on the EM Means tab. General estimable functions: Displays the matrices for generating the contrast coefficient (L) matrices. Iteration history: Displays the iteration history for the parameter estimates and log-likelihood and pri
103. e adjusted significance of the difference between the nodes connected by the line. For deviation contrasts, a bar chart is displayed with the model-estimated value of the target on the vertical axis and the values of the contrast field on the horizontal axis; for interactions, a chart is displayed for each level combination of the effects other than the contrast field. The bars show the difference between each level of the contrast field and the overall mean, which is represented by a black horizontal line. For simple contrasts, a bar chart is displayed with the model-estimated value of the target on the vertical axis and the values of the contrast field on the horizontal axis; for interactions, a chart is displayed for each level combination of the effects other than the contrast field. The bars show the difference between each level of the contrast field (except the last) and the last level, which is represented by a black horizontal line. Table: This style displays a table of the model-estimated value of the target, its standard error, and confidence interval for each level combination of the fields in the effect; all other predictors are held constant. If contrasts were requested, another table is displayed with the estimate, standard error, significance test, and confidence interval for each contrast; for interactions, there is a separate set of rows for each level combination of the effects other than the contrast field. Additionally, a table
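A minimal Python sketch of what the deviation and simple contrast bars represent, given a list of model-estimated values for the levels of the contrast field. It assumes the overall mean is the unweighted average of the level estimates; the function names and numbers are hypothetical, and standard errors and significance tests are not reproduced.

```python
def deviation_contrasts(level_means):
    """Each level's estimate minus the unweighted mean of all level estimates."""
    overall = sum(level_means) / len(level_means)
    return [m - overall for m in level_means]


def simple_contrasts(level_means):
    """Each level except the last, minus the last level."""
    last = level_means[-1]
    return [m - last for m in level_means[:-1]]


means = [10.0, 12.0, 14.0]            # hypothetical model-estimated values
print(deviation_contrasts(means))     # [-2.0, 0.0, 2.0]
print(simple_contrasts(means))        # [-4.0, -2.0]
```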
104. Build Terms, 19; Sum of Squares, 19; GLM Repeated Measures Contrasts, 20; Contrast Types, 21; GLM Repeated Measures Profile Plots, 21; GLM Repeated Measures Post Hoc Comparisons, 22; GLM Repeated Measures Save, 24; GLM Repeated Measures Options, 25; GLM Command Additional Features, 26; Chapter 4: Variance Components Analysis, 28; Variance Components Model, 30; Build Terms, 30; Variance Components Options, 31; Sum of Squares (Variance Components), 32; Variance Components Save to New File, 33; VARCOMP Command Additional Features, 33; Chapter 5: Linear Mixed Models, 34; Linear Mixed Models Select Subjects/Repeated Variables, 36; Linear Mixed Models Fixed Effects, 37; Build Non-Nested Terms, 37; Build Nested Terms, 38; Sum of Squa
105. e minus the survival function on a linear scale. Hazard: Displays the cumulative hazard function on a linear scale. Log survival: Displays the cumulative survival function on a logarithmic scale. KM Command Additional Features: The command syntax language also allows you to: obtain frequency tables that consider cases lost to follow-up as a separate category from censored cases; specify unequal spacing for the test for linear trend; obtain percentiles other than quartiles for the survival time variable. See the Command Syntax Reference for complete syntax information. Chapter 14 Cox Regression Analysis: Cox Regression builds a predictive model for time-to-event data. The model produces a survival function that predicts the probability that the event of interest has not yet occurred by a given time t, for given values of the predictor variables. The shape of the survival function and the regression coefficients for the predictors are estimated from observed subjects; the model can then be applied to new cases that have measurements for the predictor variables. Note that information from censored subjects, that is, those that do not experience the event of interest during the time of observation, contributes usefully to the estimation of the model. Example: Do men and women have different risks of developing lung cancer based on cigarette smoking? By constructing a Cox Regression model with cigarette usage (cigarettes smoked per day) and
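Under the proportional hazards assumption, a Cox model scales the baseline cumulative hazard by exp(x·β), so S(t | x) = S0(t)^exp(x·β). The Python sketch below applies that relation and also computes the quantities behind the one-minus-survival, hazard (cumulative hazard, -ln S), and log-survival plots described above. The baseline survival values and the coefficient are hypothetical, and the baseline here is taken at covariate value 0; the procedure may define its baseline differently (for example, at covariate means).

```python
import math


def cox_survival(baseline_survival, coefs, covariates):
    """Survival curve for one case: S(t | x) = S0(t) ** exp(x . beta)."""
    relative_risk = math.exp(sum(b * x for b, x in zip(coefs, covariates)))
    return [s0 ** relative_risk for s0 in baseline_survival]


def plot_scales(survival):
    """Quantities shown by the one-minus-survival, hazard and log-survival plots."""
    return {
        "one_minus_survival": [1.0 - s for s in survival],
        "cumulative_hazard": [-math.log(s) for s in survival],
        "log_survival": [math.log(s) for s in survival],
    }


baseline = [0.95, 0.85, 0.70]   # hypothetical S0(t) at three time points
beta = [0.7]                    # hypothetical coefficient for one covariate
print(cox_survival(baseline, beta, [0.0]))   # equals the baseline
curve = cox_survival(baseline, beta, [1.0])  # lower survival at every time point
print(plot_scales(curve))
```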
106. e tests are provided for S-N-K, Tukey's b, Duncan, R-E-G-W F, R-E-G-W Q, and Waller. Tukey's honestly significant difference test, Hochberg's GT2, Gabriel's test, and Scheffé's test are both multiple comparison tests and range tests. GLM Save: [Figure 2-7: Save dialog box] You can save values predicted by the model, residuals, and related measures as new variables in the Data Editor. Many of these variables can be used for examining assumptions about the data. To save the values for use in another IBM SPSS Statistics session, you must save the current data file. Predicted Values: The values that the model predicts for each case. Unstandardized: The value the model predicts for the dependent variable. Weighted: Weighted unstandardized predicted values. Available only if a WLS variable was previously selected. Standard error: An estimate of the standard deviation of the average value of the dependent variable for cases that have the same values of the independent variables. Diagnostics: Measures to identify
107. each dependent variable, there will be a row of parameter estimates, a row of significance values for the statistics corresponding to the parameter estimates, and a row of residual degrees of freedom. For a multivariate model, there are similar rows for each dependent variable. You can use this matrix data in other procedures that read matrix files. Datasets are available for subsequent use in the same session but are not saved as files unless explicitly saved prior to the end of the session. Dataset names must conform to variable naming rules. GLM Repeated Measures Options: [Figure 3-9: Repeated Measures Options dialog box] Optional statistics are available from this dialog box. Statistics are calculated using a fixed-effects model. Estimated Marginal Means: Select the factors and interactions for which you want estimates of the population marginal means in the cells. These means are adjust
108. ed Linear Models, 60. hazard rate: in Life Tables, 139. Hessian convergence: in Generalized Estimating Equations, 81; in Generalized Linear Models, 56. hierarchical decomposition, 5, 19: in Variance Components, 32. hierarchical loglinear models, 124. hierarchical models: generalized linear mixed models, 93. Hochberg's GT2: in GLM Multivariate, 8; in GLM Repeated Measures, 22. homogeneity-of-variance tests: in GLM Multivariate, 11; in GLM Repeated Measures, 25. identity link function: in generalized estimating equations, 74; in generalized linear models, 49. interaction terms, 4, 19, 30, 126, 130, 136: in Linear Mixed Models, 37. inverse Gaussian distribution: in generalized estimating equations, 73; in generalized linear models, 48. iteration history: in Generalized Estimating Equations, 84; in Generalized Linear Models, 60; in Linear Mixed Models, 41. iterations: in Generalized Estimating Equations, 81; in Generalized Linear Models, 56; in Model Selection Loglinear Analysis, 127. Kaplan-Meier, 143: command additional features, 147; comparing factor levels, 145; defining events, 144; example, 143; linear trend for factor levels, 145; mean and median survival time, 146; plots, 146; quartiles, 146; saving new variables, 146; statistics, 143, 146; survival status variables, 144; survival tables, 146. L matrix: in Generalized Estimating Equations, 84; in Generalized Linear Models, 60. Lagrange multiplier test: in Generalized Linear Models
109. ed for the covariates, if any. Both within-subjects and between-subjects factors can be selected. Compare main effects: Provides uncorrected pairwise comparisons among estimated marginal means for any main effect in the model, for both between- and within-subjects factors. This item is available only if main effects are selected under the Display Means For list. Confidence interval adjustment: Select least significant difference (LSD), Bonferroni, or Sidak adjustment to the confidence intervals and significance. This item is available only if Compare main effects is selected. Display: Select Descriptive statistics to produce observed means, standard deviations, and counts for all of the dependent variables in all cells. Estimates of effect size gives a partial eta-squared value for each effect and each parameter estimate. The eta-squared statistic describes the proportion of total variability attributable to a factor. Select Observed power to obtain the power of the test when the alternative hypothesis is set based on the observed value. Select Parameter estimates to produce the parameter estimates, standard errors, t tests, confidence intervals, and the observed power for each test. You can display the hypothesis and error SSCP matrices and the Residual SSCP matrix plus Bartlett's test of sphericity of the residual covariance matrix. Homogeneity tests produces the Levene test of the homogeneity of variance for each dependent variable across all le
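For the Compare main effects adjustments and the effect-size statistic, the textbook formulas are short enough to sketch in Python. This assumes the usual definitions: partial eta squared = SS_effect / (SS_effect + SS_error), a Bonferroni-adjusted p-value of min(1, k·p), and a Sidak-adjusted p-value of 1 - (1 - p)^k for k comparisons. The function names and numbers are hypothetical.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Share of (effect + error) variability attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


def adjust_p(p, n_comparisons, method="lsd"):
    """Adjust one pairwise p-value for the number of comparisons made."""
    if method == "lsd":            # least significant difference: no adjustment
        return p
    if method == "bonferroni":
        return min(1.0, n_comparisons * p)
    if method == "sidak":
        return 1.0 - (1.0 - p) ** n_comparisons
    raise ValueError(f"unknown method: {method}")


# A factor with 3 levels has 3 pairwise comparisons.
print(partial_eta_squared(20.0, 80.0))   # 0.2
for method in ("lsd", "bonferroni", "sidak"):
    print(method, adjust_p(0.02, 3, method))
```

Sidak is slightly less conservative than Bonferroni for the same number of comparisons.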
110. ed with the estimates. Predicted Values & Residuals: Saves variables related to the model-fitted value. Predicted values: The model-fitted value. Standard errors: The standard errors of the estimates. Degrees of freedom: The degrees of freedom associated with the estimates. Residuals: The data value minus the predicted value. MIXED Command Additional Features: The command syntax language also allows you to: specify tests of effects versus a linear combination of effects or a value (using the TEST subcommand); include user-missing values (using the MISSING subcommand); compute estimated marginal means for specified values of covariates (using the WITH keyword of the EMMEANS subcommand); compare simple main effects of interactions (using the EMMEANS subcommand). See the Command Syntax Reference for complete syntax information. Chapter 6 Generalized Linear Models: The generalized linear model expands the general linear model so that the dependent variable is linearly related to the factors and covariates via a specified link function. Moreover, the model allows for the dependent variable to have a non-normal distribution. It covers widely used statistical models, such as linear regression for normally distributed responses, logistic models for binary data, loglinear models for count data, complementary log-log models for interval-censored survival data, plus many other statistical models thr
111. edom. The multivariate approach considers the measurements on a subject to be a sample from a multivariate normal distribution, and the variance-covariance matrices are the same across the cells formed by the between-subjects effects. To test whether the variance-covariance matrices across the cells are the same, Box's M test can be used. Related procedures: Use the Explore procedure to examine the data before doing an analysis of variance. If there are not repeated measurements on each subject, use GLM Univariate or GLM Multivariate. If there are only two measurements for each subject (for example, pre-test and post-test) and there are no between-subjects factors, you can use the Paired-Samples T Test procedure. Obtaining GLM Repeated Measures: From the menus choose Analyze > General Linear Model > Repeated Measures. [Figure 3-1: Repeated Measures Define Factor(s) dialog box] Type a within-subject factor name and its number of levels. Click Add. Repeat these steps for each within-subjects factor. To define measure factors for a doubly multivariate repeated measures design: Type the measure name. Click Add. After defining all of your factors and measures: Click Define. Figure 3-2: Repeated Measures dialo
112. el. A profile plot is a line plot in which each point indicates the estimated marginal mean of a dependent variable (adjusted for any covariates) at one level of a factor. The levels of a second factor can be used to make separate lines. Each level in a third factor can be used to create a separate plot. All factors are available for plots. Profile plots are created for each dependent variable. A profile plot of one factor shows whether the estimated marginal means are increasing or decreasing across levels. For two or more factors, parallel lines indicate that there is no interaction between factors, which means that you can investigate the levels of only one factor. Nonparallel lines indicate an interaction. [Figure 2-5: Nonparallel plot (left) and parallel plot (right)] After a plot is specified by selecting factors for the horizontal axis and, optionally, factors for separate lines and separate plots, the plot must be added to the Plots list. GLM Multivariate Post Hoc Comparisons: [Figure 2-6: Multivariate Post Hoc Multiple Comparisons for Observed Means dialog box]
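To see what "parallel" means in terms of numbers, the Python sketch below compares the rise and fall of each line between adjacent horizontal-axis levels. It is illustrative only: estimated marginal means from real data are almost never exactly parallel, so the tolerance here is arbitrary and the formal check is the interaction test. Function names and values are hypothetical.

```python
def segment_changes(line):
    """Change in the estimated marginal mean between adjacent horizontal-axis levels."""
    return [b - a for a, b in zip(line, line[1:])]


def lines_parallel(cell_means, tol=1e-9):
    """True if every line of a profile plot rises and falls by the same amounts.

    cell_means[i][j] is the estimated marginal mean for level i of the
    separate-lines factor at level j of the horizontal-axis factor.
    """
    reference = segment_changes(cell_means[0])
    return all(
        all(abs(c - r) <= tol for c, r in zip(segment_changes(row), reference))
        for row in cell_means[1:]
    )


print(lines_parallel([[5.0, 6.0, 8.0], [6.0, 7.0, 9.0]]))  # True: no interaction pattern
print(lines_parallel([[5.0, 6.0, 8.0], [7.0, 6.5, 6.0]]))  # False: lines cross
```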
113. elated and nonconstant variability. The mixed linear model, therefore, provides the flexibility of modeling not only the means of the data but the variances and covariances as well. Generalized Linear Models (GZLM) relaxes the assumption of normality for the error term and requires only that the dependent variable be linearly related to the predictors through a transformation, or link function. Generalized Estimating Equations (GEE) extends GZLM to allow repeated measurements. General Loglinear Analysis allows you to fit models for cross-classified count data, and Model Selection Loglinear Analysis can help you to choose between models. Logit Loglinear Analysis allows you to fit loglinear models for analyzing the relationship between a categorical dependent variable and one or more categorical predictors. Survival analysis is available through Life Tables, for examining the distribution of time-to-event variables, possibly by levels of a factor variable; Kaplan-Meier Survival Analysis, for examining the distribution of time-to-event variables, possibly by levels of a factor variable or producing separate analyses by levels of a stratification variable; and Cox Regression, for modeling the time to a specified event, based upon the values of given covariates. Chapter 2 GLM Multivariate Analysis: The GLM Multivariate procedure provides regression analysis and analysis of variance for multiple dependent variable
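114. elation between any two elements is equal to $\rho$ for adjacent elements, $\rho^2$ for elements that are separated by a third, and so on. $\rho$ is constrained so that $-1 < \rho < 1$.

$$\sigma^2\begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}$$

AR(1): Heterogeneous. This is a first-order autoregressive structure with heterogeneous variances. The correlation between any two elements is equal to $\rho$ for adjacent elements, $\rho^2$ for two elements separated by a third, and so on. $\rho$ is constrained to lie between -1 and 1.

$$\begin{pmatrix} \sigma_1^2 & \sigma_2\sigma_1\rho & \sigma_3\sigma_1\rho^2 & \sigma_4\sigma_1\rho^3 \\ \sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_3\sigma_2\rho & \sigma_4\sigma_2\rho^2 \\ \sigma_3\sigma_1\rho^2 & \sigma_3\sigma_2\rho & \sigma_3^2 & \sigma_4\sigma_3\rho \\ \sigma_4\sigma_1\rho^3 & \sigma_4\sigma_2\rho^2 & \sigma_4\sigma_3\rho & \sigma_4^2 \end{pmatrix}$$

ARMA(1,1). This is a first-order autoregressive moving average structure. It has homogeneous variances. The correlation between two elements is equal to $\phi\rho$ for adjacent elements, $\phi\rho^2$ for elements separated by a third, and so on. $\rho$ and $\phi$ are the autoregressive and moving average parameters, respectively, and their values are constrained to lie between -1 and 1, inclusive.

$$\sigma^2\begin{pmatrix} 1 & \phi\rho & \phi\rho^2 & \phi\rho^3 \\ \phi\rho & 1 & \phi\rho & \phi\rho^2 \\ \phi\rho^2 & \phi\rho & 1 & \phi\rho \\ \phi\rho^3 & \phi\rho^2 & \phi\rho & 1 \end{pmatrix}$$

Compound Symmetry. This structure has constant variance and constant covariance.

$$\begin{pmatrix} \sigma^2+\sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma^2+\sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma^2+\sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma_1 & \sigma^2+\sigma_1 \end{pmatrix}$$

Compound Symmetry: Correlation Metric. This covariance structure has homogeneous variances and homogeneous correlations between elements.

$$\sigma^2\begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix}$$

Compound Symmetry: Heterogeneous. This covar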
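The structures above can be filled in for given parameter values with a few lines of Python. This sketch only evaluates the formulas for an arbitrary matrix order; it does not estimate the parameters, which the procedure does from the data. Function names are hypothetical.

```python
def ar1(order, sigma2, rho):
    """First-order autoregressive: sigma^2 * rho^|i-j|, with -1 < rho < 1."""
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[sigma2 * rho ** abs(i - j) for j in range(order)]
            for i in range(order)]


def ar1_heterogeneous(sigmas, rho):
    """AR(1) correlation with a separate standard deviation per element."""
    n = len(sigmas)
    return [[sigmas[i] * sigmas[j] * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]


def arma11(order, sigma2, phi, rho):
    """ARMA(1,1): correlation phi * rho^k for elements k steps apart (k >= 1)."""
    return [[sigma2 * (1.0 if i == j else phi * rho ** abs(i - j))
             for j in range(order)] for i in range(order)]


def compound_symmetry(order, sigma2, sigma1):
    """Constant variance sigma^2 + sigma_1 and constant covariance sigma_1."""
    return [[sigma2 + sigma1 if i == j else sigma1 for j in range(order)]
            for i in range(order)]


def compound_symmetry_corr(order, sigma2, rho):
    """Correlation metric: sigma^2 on the diagonal, sigma^2 * rho elsewhere."""
    return [[sigma2 * (1.0 if i == j else rho) for j in range(order)]
            for i in range(order)]


for row in ar1(4, 2.0, 0.5):
    print(row)   # first row: [2.0, 1.0, 0.5, 0.25]
```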
115. The Predictors tab allows you to specify the factors and covariates used to build model effects and to specify an optional offset. Factors: Factors are categorical predictors; they can be numeric or string. Covariates: Covariates are scale predictors; they must be numeric. Note: When the response is binomial with binary format, the procedure computes deviance and chi-square goodness-of-fit statistics by subpopulations that are based on the cross-classification of observed values of the selected factors and covariates. You should keep the same set of predictors across multiple runs of the procedure to ensure a consistent number of subpopulations. Offset: The offset term is a structural predictor. Its coefficient is not estimated by the model but is assumed to have the value 1; thus, the values of the offset are simply added to the linear predictor of the target. This is especially useful in Poisson regression models, where each case may have different levels of exposure to the event of interest. For example, when modeling accident rates for individual drivers, there is an important difference between a dri
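The effect of an offset is easy to see with a log link: adding log(exposure) with a fixed coefficient of 1 makes the expected count proportional to exposure. The Python sketch assumes a Poisson model with a log link; the coefficients, covariate value, and function name are hypothetical.

```python
import math


def poisson_expected_count(intercept, coefs, covariates, exposure):
    """Expected count under a log-link Poisson model with offset log(exposure).

    The offset enters the linear predictor with a fixed coefficient of 1;
    it is not estimated.
    """
    eta = intercept + sum(b * x for b, x in zip(coefs, covariates))
    eta += math.log(exposure)          # offset term, coefficient fixed at 1
    return math.exp(eta)


# Hypothetical coefficients; two cases identical except for exposure.
short_exposure = poisson_expected_count(-2.0, [0.3], [1.0], exposure=10.0)
long_exposure = poisson_expected_count(-2.0, [0.3], [1.0], exposure=20.0)
print(long_exposure / short_exposure)   # 2.0 up to rounding
```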
116. ent variable, covariates, and factors. Model information: Displays the dataset name, dependent variable or events and trials variables, offset variable, scale weight variable, probability distribution, and link function. Goodness of fit statistics: Displays deviance and scaled deviance, Pearson chi-square and scaled Pearson chi-square, log-likelihood, Akaike's information criterion (AIC), finite sample corrected AIC (AICC), Bayesian information criterion (BIC), and consistent AIC (CAIC). Model summary statistics: Displays model fit tests, including likelihood-ratio statistics for the model fit omnibus test and statistics for the Type I or Type III contrasts for each effect. Parameter estimates: Displays parameter estimates and corresponding test statistics and confidence intervals. You can optionally display exponentiated parameter estimates in addition to the raw parameter estimates. Covariance matrix for parameter estimates: Displays the estimated parameter covariance matrix. Correlation matrix for parameter estimates: Displays the estimated parameter correlation matrix. Contrast coefficient (L) matrices: Displays contrast coefficients for the default effects and for the estimated marginal means, if requested on the EM Means tab. General estimable functions: Displays the matrices for generating the contrast coefficient (L) matrices. Iteration history: Displays the iteration history for the parameter estimates and log-likelihood and prints the l
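The information criteria listed above follow standard definitions based on -2 log-likelihood, the number of estimated parameters k, and the number of cases n. The Python sketch below uses the textbook forms; how the procedure counts k and n (for example, with scale or frequency weights) is not shown here and may differ, so check the algorithms documentation before comparing numbers. The function name and inputs are hypothetical.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Textbook information criteria; smaller values indicate a better trade-off.

    k: number of estimated parameters; n: number of cases used in the fit.
    """
    minus_2ll = -2.0 * log_likelihood
    return {
        "AIC": minus_2ll + 2 * k,
        "AICC": minus_2ll + 2 * k * n / (n - k - 1),
        "BIC": minus_2ll + k * math.log(n),
        "CAIC": minus_2ll + k * (math.log(n) + 1),
    }


print(information_criteria(log_likelihood=-100.0, k=3, n=50))
```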
117. entized maximum modulus is used. Usually, Tukey's test is more powerful. Gabriel's pairwise comparisons test also uses the Studentized maximum modulus and is generally more powerful than Hochberg's GT2 when the cell sizes are unequal. Gabriel's test may become liberal when the cell sizes vary greatly. Dunnett's pairwise multiple comparison t test compares a set of treatments against a single control mean. The last category is the default control category. Alternatively, you can choose the first category. You can also choose a two-sided or one-sided test. To test that the mean at any level (except the control category) of the factor is not equal to that of the control category, use a two-sided test. To test whether the mean at any level of the factor is smaller than that of the control category, select < Control. Likewise, to test whether the mean at any level of the factor is larger than that of the control category, select > Control. Ryan, Einot, Gabriel, and Welsch (R-E-G-W) developed two multiple step-down range tests. Multiple step-down procedures first test whether all means are equal. If all means are not equal, subsets of means are tested for equality. R-E-G-W F is based on an F test, and R-E-G-W Q is based on the Studentized range. These tests are more powerful than Duncan's multiple range test and Student-Newman-Keuls (which are also multiple step-down procedures), but they are not recommended for unequal cell sizes. When the var
118. equal to 0 or are missing are not used in the analysis. Generalized Linear Models Reference Category: [Figure 6-3: Generalized Linear Models Reference Category dialog box] For binary response, you can choose the reference category for the dependent variable. This can affect certain output, such as parameter estimates and saved values, but it should not change the model fit. For example, if your binary response takes values 0 and 1: By default, the procedure makes the last (highest-valued) category, or 1, the reference category. In this situation, model-saved probabilities estimate the chance that a given case takes the value 0, and parameter estimates should be interpreted as relating to the likelihood of category 0. If you specify the first (lowest-valued) category, or 0, as the reference category, then model-saved probabilities estimate the chance that a given case takes the value 1. If you specify the custom category and your variable has defined labels, you can set the reference category by choosing a value from the list. This can be convenient when, in the middle of specifying a model, you don't remember exactly how a particular variable was coded. Generalized Linear Models Predictors: [Figure 6-4: Generalized Linear Models Predictors tab]
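With a logit link, switching the reference category of a binary response flips the sign of every parameter estimate, and the two fitted probabilities for a case add up to 1. The Python sketch assumes a logit link and hypothetical coefficients. For asymmetric links such as complementary log-log, the relationship between the two parameterizations is not a simple sign change.

```python
import math


def prob_of_modeled_category(coefs, x):
    """Logit-link probability of the non-reference category."""
    eta = sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fit with the LAST category (1) as reference: models P(Y = 0).
coefs_last_ref = [0.4, -1.2]          # intercept, slope (illustrative only)
x = [1.0, 0.5]                        # leading 1.0 multiplies the intercept
p0 = prob_of_modeled_category(coefs_last_ref, x)

# Switching to the FIRST category (0) as reference models P(Y = 1);
# with a logit link, every estimate simply changes sign.
coefs_first_ref = [-b for b in coefs_last_ref]
p1 = prob_of_modeled_category(coefs_first_ref, x)

print(p0, p1, p0 + p1)   # p0 + p1 equals 1 up to rounding
```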
119. Negative log-log. $f(x) = -\log(-\log(x))$. This is appropriate only with the binomial distribution.
   - Odds power. $f(x) = \left[\left(\frac{x}{1-x}\right)^{\alpha} - 1\right] / \alpha$ if $\alpha \neq 0$; $f(x) = \log\left(\frac{x}{1-x}\right)$ if $\alpha = 0$. $\alpha$ is the required number specification and must be a real number. This is appropriate only with the binomial distribution.
   - Probit. $f(x) = \Phi^{-1}(x)$, where $\Phi^{-1}$ is the inverse standard normal cumulative distribution function. This is appropriate only with the binomial distribution.
   - Power. $f(x) = x^{\alpha}$ if $\alpha \neq 0$; $f(x) = \log(x)$ if $\alpha = 0$. $\alpha$ is the required number specification and must be a real number. This link can be used with any distribution.
   Generalized Linear Models Response
   [Figure 6-2: Generalized Linear Models dialog box]
   In many cases, you can simply specify a dependent variable; however, variables that take only two values and responses that rec
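The four link functions above can be written directly as Python functions. This is a minimal sketch of the formulas only; the function names are hypothetical and not part of any SPSS interface. The $\alpha = 0$ branch of the odds power link is the limit of the general formula, i.e. the logit.

```python
import math
from statistics import NormalDist

def negative_log_log(x):
    """f(x) = -log(-log(x)), for 0 < x < 1."""
    return -math.log(-math.log(x))

def odds_power(x, alpha):
    """((x/(1-x))**alpha - 1) / alpha, or log(x/(1-x)) when alpha == 0."""
    odds = x / (1.0 - x)
    if alpha != 0:
        return (odds ** alpha - 1.0) / alpha
    return math.log(odds)  # limiting case as alpha -> 0 (the logit)

def probit(x):
    """Inverse standard normal CDF."""
    return NormalDist().inv_cdf(x)

def power(x, alpha):
    """x**alpha, or log(x) when alpha == 0."""
    if alpha != 0:
        return x ** alpha
    return math.log(x)
```

For example, `odds_power(0.3, 1e-8)` is numerically very close to `odds_power(0.3, 0)`, which shows why the logit is the natural $\alpha = 0$ case.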
120. Kaplan-Meier Compare Factor Levels
   [Figure 13-3: Kaplan-Meier Compare Factor Levels dialog box]
   You can request statistics to test the equality of the survival distributions for the different levels of the factor. Available statistics are log rank, Breslow, and Tarone-Ware. Select one of the alternatives to specify the comparisons to be made: pooled over strata, for each stratum, pairwise over strata, or pairwise for each stratum.
   - Log rank. A test for comparing the equality of survival distributions. All time points are weighted equally in this test.
   - Breslow. A test for comparing the equality of survival distributions. Time points are weighted by the number of cases at risk at each time point.
   - Tarone-Ware. A test for comparing the equality of survival distributions. Time points are weighted by the square root of the number of cases at risk at each time point.
   - Pooled over strata. Compares all factor levels in a single test to test the equality of survival curves.
   - Pairwise over strata. Compares each distinct pair of factor levels. Pairwise trend tests are not available.
   - For each stratum. Performs a separate test of equality of all fac
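The three tests differ only in the weight given to each event time. The sketch below assumes two groups, a single stratum, and the standard textbook form of the weighted log-rank statistic with a hypergeometric variance; the function name and arguments are hypothetical, and SPSS's internal computations may differ in detail. Weights are 1 (log rank), the number at risk (Breslow), or its square root (Tarone-Ware).

```python
import math

def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group weighted log-rank chi-square statistic (1 df).

    times:  survival or censoring time for each case
    events: 1 if the event occurred, 0 if censored
    groups: 0 or 1 factor level for each case
    weight: "logrank" (w = 1), "breslow" (w = n), or "tarone" (w = sqrt(n))
    """
    data = list(zip(times, events, groups))
    numerator = 0.0
    variance = 0.0
    for t in sorted({tt for tt, e, _ in data if e == 1}):
        n = sum(1 for tt, _, _ in data if tt >= t)              # at risk, all groups
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk, group 1
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)   # events at t
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        w = {"logrank": 1.0, "breslow": float(n), "tarone": math.sqrt(n)}[weight]
        expected1 = d * n1 / n
        numerator += w * (d1 - expected1)
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return numerator * numerator / variance if variance > 0 else 0.0

# Hypothetical data: group 1 experiences events earlier than group 0.
times = [5, 6, 7, 1, 2, 3]
events = [1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 1, 1, 1]
```

Calling `weighted_logrank(times, events, groups, "breslow")` emphasizes early event times, when many cases are still at risk.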
121. er related to the variance of the response. The scale weights are "known" values that can vary from observation to observation. If the scale weight variable is specified, the scale parameter, which is related to the variance of the response, is divided by it for each observation. Cases with scale weight values that are less than or equal to 0 or are missing are not used in the analysis.
   Generalized Estimating Equations Reference Category
   [Figure 7-4: Generalized Estimating Equations Reference Category dialog box]
   For binary response, you can choose the reference category for the dependent variable. This can affect certain output, such as parameter estimates and saved values, but it should not change the model fit. For example, if your binary response takes values 0 and 1:
   - By default, the procedure makes the last (highest-valued) category, or 1, the reference category. In this situation, model-saved probabilities estimate the chance that a given case takes the value 0, and parameter estimates should be interpreted as relating to the likelihood of category 0.
   - If you specify the first (lowest-valued) category, or 0, as the reference category, then model-saved probabilities estimate the chance that a given case takes the value 1.
   - If you specify the custom category and your variable has
122. es
   - Deleted. The residual for a case when that case is excluded from the calculation of the regression coefficients. It is the difference between the value of the dependent variable and the adjusted predicted value.
   Coefficient Statistics. Writes a variance-covariance matrix of the parameter estimates in the model to a new dataset in the current session or an external SPSS Statistics data file. Also, for each dependent variable, there will be a row of parameter estimates, a row of significance values for the t statistics corresponding to the parameter estimates, and a row of residual degrees of freedom. For a multivariate model, there are similar rows for each dependent variable. You can use this matrix file in other procedures that read matrix files.
   GLM Multivariate Options
   [Figure 2-8: Multivariate Options dialog box]
   Optional statistics are available from this dialog box. Statis
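The deleted residual can be computed literally, by refitting without each case. The sketch below assumes ordinary least squares with a single predictor and hypothetical data; it also shows the well-known least-squares identity $e_i / (1 - h_{ii})$, where $h_{ii}$ is the leverage, which gives the same values from one fit.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def deleted_residuals(xs, ys):
    """y_i minus the adjusted predicted value from a fit that omits case i."""
    result = []
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        adjusted_prediction = b0 + b1 * xs[i]
        result.append(ys[i] - adjusted_prediction)
    return result

def deleted_residuals_via_leverage(xs, ys):
    """Same quantity from a single fit: e_i / (1 - h_ii)."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    result = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - mean_x) ** 2 / sxx
        result.append((y - (b0 + b1 * x)) / (1.0 - leverage))
    return result

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
```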
123. Select the values you want to save as new variables in the active dataset. The suffix in the new variable names increments to make a unique name for each saved variable. The saved values refer to the aggregated data (to cells in the contingency table), even if the data are recorded in individual observations in the Data Editor. If you save residuals or predicted values for unaggregated data, the saved value for a cell in the contingency table is entered in the Data Editor for each case in that cell. To make sense of the saved values, you should aggregate the data to obtain the cell counts. Four types of residuals can be saved: raw, standardized, adjusted, and deviance. The predicted values can also be saved.
   - Residuals. Also called the simple or raw residual, it is the difference between the observed cell count and its expected count.
   - Standardized residuals. The residual divided by an estimate of its standard error. Standardized residuals are also known as Pearson residuals.
   - Adjusted residuals. The standardized residual divided by its estimated standard error. Since the adjusted residuals are asymptotically standard normal when the selected model is correct, they are preferred over the standardized residuals for checking for normality.
   - Deviance residuals. The signed square root of an individual contribution to the likelihood-ratio chi-square statistic
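For the simplest case, the independence model in a two-way table, all four residual types have closed forms. This sketch assumes that model only (the procedure handles general loglinear models, which this does not attempt), uses the usual closed-form adjusted residual for independence, and takes the Poisson form $2[o \log(o/e) - (o - e)]$ as each cell's deviance contribution, which is an assumption about the exact definition. The function name is hypothetical.

```python
import math

def independence_residuals(table):
    """Raw, standardized (Pearson), adjusted, and deviance residuals
    for the independence model fitted to a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    cells = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            raw = observed - expected
            standardized = raw / math.sqrt(expected)
            adjusted = raw / math.sqrt(
                expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            if observed > 0:
                contribution = 2 * (observed * math.log(observed / expected) - raw)
            else:
                contribution = 2 * expected
            # Guard against tiny negative values from floating-point rounding.
            deviance = math.copysign(math.sqrt(max(contribution, 0.0)), raw)
            cells.append({"expected": expected, "raw": raw,
                          "standardized": standardized,
                          "adjusted": adjusted, "deviance": deviance})
    return cells

table = [[10, 20], [30, 40]]  # hypothetical counts
cells = independence_residuals(table)
```

Summing the squared standardized residuals gives the Pearson chi-square, and summing the squared deviance residuals gives the likelihood-ratio chi-square.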
124. estimated means are produced.
   - Confidence. This displays upper and lower confidence limits for the marginal means, using the confidence level specified as part of the Build Options.
   Estimated Means: Custom Effects. These are tables and charts for user-requested fixed all-factor effects.
   Styles. There are different display styles, which are accessible from the Style dropdown list.
   - Diagram. This style displays a line chart of the model-estimated value of the target on the vertical axis for each value of the main effect (or first listed effect in an interaction) on the horizontal axis; a separate line is produced for each value of the second listed effect in an interaction; a separate chart is produced for each value of the third listed effect in a three-way interaction; all other predictors are held constant. If contrasts were requested, another chart is displayed to compare levels of the contrast field; for interactions, a chart is displayed for each level combination of the effects other than the contrast field. For pairwise contrasts, it is a distance network chart; that is, a graphical representation of the comparisons table in which the distances between nodes in the network correspond to differences between samples. Yellow lines correspond to statistically significant differences; black lines correspond to non-significant differences. Hovering over a line in the network displays a tooltip with th
125. esting to the nested term.
   Limitations. Nested terms have the following restrictions:
   - All factors within an interaction must be unique. Thus, if A is a factor, then specifying A*A is invalid.
   - All factors within a nested effect must be unique. Thus, if A is a factor, then specifying A(A) is invalid.
   - No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, then specifying A(X) is invalid.
   Intercept. The intercept is usually included in the model. If you can assume the data pass through the origin, you can exclude the intercept. Models with the multinomial ordinal distribution do not have a single intercept term; instead there are threshold parameters that define transition points between adjacent categories. The thresholds are always included in the model.
   Generalized Estimating Equations Estimation
   [Figure 7-8: Generalized Estimating Equations Estimation tab]
126. ests (Residual method) if your sample size is sufficiently large, or the data are balanced, or the model uses a simpler covariance type (for example, scaled identity or diagonal). This is the default. Choose Varied across tests (Satterthwaite approximation) if your sample size is small, or the data are unbalanced, or the model uses a complicated covariance type (for example, unstructured).
   - Tests of fixed effects and coefficients. This is the method for computing the parameter estimates covariance matrix. Choose the robust estimate if you are concerned that the model assumptions are violated.
   Estimated Means
   [Figure 8-10: Estimated Means settings]
   This tab allows you to display the estimated marginal means for levels of factors and factor interactions. Estimated marginal means are not available for multinomial models.
   Terms. The model terms in the Fixed Effects that are composed entirely of categorical fields are listed here. Check each term for which you want the model to produce e
127. f those statistics in the initial generalized linear model; this scale parameter estimate is then passed to the generalized estimating equations, which treat it as fixed. Alternatively, specify a fixed value for the scale parameter. It will be treated as fixed in estimating the initial generalized linear model and the generalized estimating equations.
   - Initial values. The procedure will automatically compute initial values for parameters. Alternatively, you can specify initial values for the parameter estimates.
   The iterations and convergence criteria specified on this tab are applicable only to the initial generalized linear model. For estimation criteria used in fitting the generalized estimating equations, see the Repeated tab.
   Iterations.
   - Maximum iterations. The maximum number of iterations the algorithm will execute. Specify a non-negative integer.
   - Maximum step halving. At each iteration, the step size is reduced by a factor of 0.5 until the log-likelihood increases or maximum step halving is reached. Specify a positive integer.
   - Check for separation of data points. When selected, the algorithm performs tests to ensure that the parameter estimates have unique values. Separation occurs when the procedure can produce a model that correctly classifies every case. This option is available for multinomial responses and binomial responses with binary format.
   Convergence Criteria.
   - Parameter convergence. When selected, the algorithm st
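Step halving can be illustrated with a one-parameter model. The sketch below assumes an intercept-only Poisson model with a log link and uses plain Newton-Raphson; the function names, starting value, and defaults (including `max_halving=5`) are hypothetical choices for illustration, not the procedure's settings. Starting from 0, the full Newton step overshoots and lowers the log-likelihood, so it is halved once before being accepted.

```python
import math

def poisson_loglik(beta, ys):
    """Intercept-only Poisson log-likelihood with log link (constant terms dropped)."""
    return sum(y * beta - math.exp(beta) for y in ys)

def newton_with_step_halving(ys, beta=0.0, max_iter=100, max_halving=5, tol=1e-10):
    n, total = len(ys), sum(ys)
    for _ in range(max_iter):
        score = total - n * math.exp(beta)  # first derivative of the log-likelihood
        information = n * math.exp(beta)    # minus the second derivative
        step = score / information
        current = poisson_loglik(beta, ys)
        candidate = beta + step
        halvings = 0
        # Halve the step while the log-likelihood would decrease,
        # stopping once the maximum number of halvings is reached.
        while poisson_loglik(candidate, ys) < current and halvings < max_halving:
            step *= 0.5
            candidate = beta + step
            halvings += 1
        if abs(candidate - beta) < tol:
            return candidate
        beta = candidate
    return beta

beta_hat = newton_with_step_halving([2, 3, 4])
```

The estimate converges to the log of the sample mean, as expected for this model. The sketch does not guard against overflow in `math.exp` for very poor starting values.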
128. fficients with significance values greater than the slider value are hidden. This does not change the model, but simply allows you to focus on the most important coefficients. By default the value is 1.00, so that no coefficients are filtered based on significance.
   Random Effect Covariances. This view displays the random effects covariance matrix (G).
   Styles. There are different display styles, which are accessible from the Style dropdown list.
   - Covariance values. This is a heat map of the covariance matrix in which effects are sorted from top to bottom in the order in which they were specified on the Fixed Effects settings. Colors in the corrgram correspond to the cell values as shown in the key. This is the default.
   - Corrgram. This is a heat map of the covariance matrix.
   - Compressed. This is a heat map of the covariance matrix without the row and column headings.
   Blocks. If there are multiple random effect blocks, then there is a Block dropdown list for selecting the block to display.
   Groups. If a random effect block has a group specification, then there is a Group dropdown list for selecting the group level to display.
   Multinomial. If the multinomial distribution is in effect, then the Multinomial dropdown list controls which target category to display. The sort order of the values in the list is determined by the specification on the Build Options settings.
   Covariance Parameters
   [Figure 8-20]
129. ft Corporation
   analysis of covariance: in GLM Multivariate, 2
   analysis of variance: in generalized linear mixed models, 93; in Variance Components, 31
   ANOVA: in GLM Multivariate, 2; in GLM Repeated Measures, 14
   backward elimination: in Model Selection Loglinear Analysis, 124
   Bartlett's test of sphericity: in GLM Multivariate, 11
   binomial distribution: in generalized estimating equations, 73; in generalized linear models, 48
   Bonferroni: in GLM Multivariate, 8; in GLM Repeated Measures, 22
   Box's M test: in GLM Multivariate, 11
   Breslow test: in Kaplan-Meier, 145
   build terms, 4, 19, 30, 126, 130, 136
   case processing summary: in Generalized Estimating Equations, 84; in Generalized Linear Models, 60
   censored cases: in Cox Regression, 148; in Kaplan-Meier, 143; in Life Tables, 139
   complementary log-log link function: in generalized estimating equations, 74; in generalized linear models, 49
   confidence intervals: in General Loglinear Analysis, 131; in GLM Multivariate, 11; in GLM Repeated Measures, 25; in Linear Mixed Models, 42; in Logit Loglinear Analysis, 136
   contingency tables: in General Loglinear Analysis, 128
   contrast coefficients matrix: in Generalized Estimating Equations, 84; in Generalized Linear Models, 60
   contrasts: in Cox Regression, 149; in General Loglinear Analysis, 128; in Logit Loglinear Analysis, 133
   Cook's distance: in Generalized Linear Models, 64; in GLM, 10; in GLM Repeated Measures, 24
   Copyright IBM Corp
130. fy the distribution and link function for your model, providing shortcuts for several common models that are categorized by response type.
   Model Types
   Scale Response
   - Linear. Specifies Normal as the distribution and Identity as the link function.
   - Gamma with log link. Specifies Gamma as the distribution and Log as the link function.
   Ordinal Response
   - Ordinal logistic. Specifies Multinomial (ordinal) as the distribution and Cumulative logit as the link function.
   - Ordinal probit. Specifies Multinomial (ordinal) as the distribution and Cumulative probit as the link function.
   Counts
   - Poisson loglinear. Specifies Poisson as the distribution and Log as the link function.
   - Negative binomial with log link. Specifies Negative binomial (with a value of 1 for the ancillary parameter) as the distribution and Log as the link function. To have the procedure estimate the value of the ancillary parameter, specify a custom model with Negative binomial distribution and select Estimate value in the Parameter group.
   Binary Response or Events/Trials Data
   - Binary logistic. Specifies Binomial as the distribution and Logit as the link function.
   - Binary probit. Specifies Binomial as the distribution and Probit as the link function.
   - Interval censored survival. Specifies Binomial as the distribution and Complementary log-log as the link function.
   Mixture
   - Tweedie with log link. Specifies Tweedie
131. [Figure: Repeated Measures dialog box]
   Select a dependent variable that corresponds to each combination of within-subjects factors (and optionally, measures) on the list. To change positions of the variables, use the up and down arrows. To make changes to the within-subjects factors, you can reopen the Repeated Measures Define Factor(s) dialog box without closing the main dialog box. Optionally, you can specify between-subjects factor(s) and covariates.
   GLM Repeated Measures Define Factors
   GLM Repeated Measures analyzes groups of related dependent variables that represent different measurements of the same attribute. This dialog box lets you define one or more within-subjects factors for use in GLM Repeated Measures. See Figure 3-1 on p. 16. Note that the order in which you specify within-subjects factors is important. Each factor constitutes a level within the previous factor.
   To use Repeated Measures, you must set up your data correctly. You must define within-subjects factors in this dialog box. Notice that these factors are not existing variables in your data but rather factors that you define here.
   Example. In a weight-loss study, suppose the weights of several people are measured each week for five weeks. In the data file, each person
132. g the data.
   - Scan Data. Reads the data in the active dataset and assigns default measurement level to any fields with a currently unknown measurement level. If the dataset is large, that may take some time.
   - Assign Manually. Opens a dialog that lists all fields with an unknown measurement level. You can use this dialog to assign measurement level to those fields. You can also assign measurement level in Variable View of the Data Editor.
   Since measurement level is important for this procedure, you cannot access the dialog to run this procedure until all fields have a defined measurement level.
   Target
   [Figure 8-3: Target settings]
   These settings define the target, its distribution, and its relationship to the predictors through the link function.
   Target. The target is required. It can have any measurement level, and the measurement le
133. gender entered as covariates, you can test hypotheses regarding the effects of gender and cigarette usage on time-to-onset for lung cancer.
   Statistics. For each model: -2LL, the likelihood-ratio statistic, and the overall chi-square. For variables in the model: parameter estimates, standard errors, and Wald statistics. For variables not in the model: score statistics and residual chi-square.
   Data. Your time variable should be quantitative, but your status variable can be categorical or continuous. Independent variables (covariates) can be continuous or categorical; if categorical, they should be dummy- or indicator-coded (there is an option in the procedure to recode categorical variables automatically). Strata variables should be categorical, coded as integers or short strings.
   Assumptions. Observations should be independent, and the hazard ratio should be constant across time; that is, the proportionality of hazards from one case to another should not vary over time. The latter assumption is known as the proportional hazards assumption.
   Related procedures. If the proportional hazards assumption does not hold (see above), you may need to use the Cox with Time-Dependent Covariates procedure. If you have no covariates, or if you have only one categorical covariate, you can use the Life Tables or Kaplan-Meier procedure to examine survival or hazard functions for your sample(s). If you have no censored data in your sample (that is, every case experience
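The proportional hazards assumption means the hazard for any case is a baseline hazard multiplied by $\exp(\beta^\top x)$, so the ratio between two cases does not depend on time. The sketch below assumes a hypothetical increasing (Weibull-shaped) baseline hazard and hypothetical coefficients for cigarette usage and a gender indicator; none of these values come from the manual.

```python
import math

def baseline_hazard(t, shape=1.5, scale=10.0):
    """Hypothetical Weibull baseline hazard h0(t), increasing in t."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def cox_hazard(t, covariates, coefficients):
    """h(t | x) = h0(t) * exp(sum of b * x)."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return baseline_hazard(t) * math.exp(linear_predictor)

coefficients = [0.7, -0.3]  # hypothetical: cigarette usage, gender indicator
smoker = [1.0, 0.0]
non_smoker = [0.0, 0.0]

# The hazard ratio is the same at every time point: exp(0.7).
ratios = [cox_hazard(t, smoker, coefficients) / cox_hazard(t, non_smoker, coefficients)
          for t in (1.0, 5.0, 20.0)]
```

If the ratio in real data changes over time, the assumption fails, which is when the Cox with Time-Dependent Covariates procedure becomes relevant.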
134. Parameter Estimation. The controls in this group allow you to specify estimation methods and to provide initial values for the parameter estimates.
   - Method. You can select a parameter estimation method; choose between Newton-Raphson, Fisher scoring, or a hybrid method in which Fisher scoring iterations are performed before switching to the Newton-Raphson method. If convergence is achieved during the Fisher scoring phase of the hybrid method before the maximum number of Fisher iterations is reached, the algorithm continues with the Newton-Raphson method.
   - Scale Parameter Method. You can select the scale parameter estimation method. Maximum likelihood jointly estimates the scale parameter with the model effects; note that this option is not valid if the response has a negative binomial, Poisson, or binomial distribution. Since the concept of likelihood does not enter into generalized estimating equations, this specification applies only to the initial generalized linear model; this scale parameter estimate is then passed to the generalized estimating equations, which update the scale parameter by the Pearson chi-square divided by its degrees of freedom.
     The deviance and Pearson chi-square options estimate the scale parameter from the value o
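The Pearson chi-square and deviance estimates of the scale parameter are each statistic divided by its degrees of freedom. The sketch below assumes a Poisson response (variance function $V(\mu) = \mu$), fitted means supplied by the caller, and degrees of freedom equal to the number of cases minus the number of parameters; the function name is hypothetical, and the procedure's own degrees-of-freedom definition may differ.

```python
import math

def poisson_scale_estimates(ys, mus, num_params):
    """Return (Pearson chi-square / df, deviance / df) for Poisson fitted means."""
    df = len(ys) - num_params
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(ys, mus))
    deviance = 2 * sum(
        (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu)
        for y, mu in zip(ys, mus))
    return pearson / df, deviance / df

ys = [2, 4, 6]               # hypothetical observed counts
mus = [3.0, 4.0, 5.0]        # hypothetical fitted means
pearson_scale, deviance_scale = poisson_scale_estimates(ys, mus, num_params=1)
```

Values well above 1 suggest overdispersion relative to the Poisson assumption.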
135. [Figure 5-7: Linear Mixed Models EM Means dialog box]
   Estimated Marginal Means of Fitted Models. This group allows you to request model-predicted estimated marginal means of the dependent variable in the cells and their standard errors for the specified factors. Moreover, you can request that factor levels of main effects be compared.
   - Factor(s) and Factor Interactions. This list contains factors and factor interactions that have been specified in the Fixed dialog box, plus an OVERALL term. Model terms built from covariates are excluded from this list.
   - Display Means for. The procedure will compute the estimated marginal means for factors and factor interactions selected to this list. If OVERALL is selected, the estimated marginal means of the dependent variable are displayed, collapsing over all factors. Note that any selected factors or factor interactions remain selected unles
136. he control variables.
   - Calculate approximate, rather than exact, comparisons.
   See the Command Syntax Reference for complete syntax information.
   Chapter 13
   Kaplan-Meier Survival Analysis
   There are many situations in which you would want to examine the distribution of times between two events, such as length of employment (time between being hired and leaving the company). However, this kind of data usually includes some censored cases. Censored cases are cases for which the second event isn't recorded (for example, people still working for the company at the end of the study). The Kaplan-Meier procedure is a method of estimating time-to-event models in the presence of censored cases. The Kaplan-Meier model is based on estimating conditional probabilities at each time point when an event occurs and taking the product limit of those probabilities to estimate the survival rate at each point in time.
   Example. Does a new treatment for AIDS have any therapeutic benefit in extending life? You could conduct a study using two groups of AIDS patients, one receiving traditional therapy and the other receiving the experimental treatment. Constructing a Kaplan-Meier model from the data would allow you to compare overall survival rates between the two groups to determine whether the experimental treatment is an improvement over the traditional therapy. You can also plot the survival or hazard functions and compare them visually for more detailed informatio
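The product-limit idea can be written in a few lines: at each event time, multiply the running survival estimate by the conditional probability of surviving that time. This is a minimal sketch with hypothetical data and a hypothetical function name; it assumes, as is conventional, that a case censored at an event time is still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimates at each distinct event time.

    times:  observed times (event or censoring)
    events: 1 if the event was observed, 0 if the case is censored
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for tt in times if tt >= t)  # censored at t still at risk
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        survival *= 1.0 - deaths / at_risk           # conditional survival at t
        curve.append((t, survival))
    return curve

curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Here the estimates are 0.8 at time 1, 0.6 at time 2, and 0.3 at time 3; the censored cases at times 2 and 4 reduce the risk set without counting as events.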
137. he effects list. The type of effect created depends upon which hotspot you drop the selection onto. Categorical (nominal and ordinal) fields are used as factors in the model, and continuous fields are used as covariates.
   - Main. Dropped fields appear as separate main effects at the bottom of the effects list.
   - 2-way. All possible pairs of the dropped fields appear as 2-way interactions at the bottom of the effects list.
   - 3-way. All possible triplets of the dropped fields appear as 3-way interactions at the bottom of the effects list.
   - The combination of all dropped fields appears as a single interaction at the bottom of the effects list.
   The buttons to the right of the Effect Builder allow you to:
   - Delete terms from the fixed effects model by selecting the terms you want to delete and clicking the delete button.
   - Reorder the terms within the fixed effects model by selecting the terms you want to reorder and clicking the up or down arrow.
   - Add nested terms to the model using the Add a Custom Term dialog, by clicking on the Add a Custom Term button.
   Include Intercept. The intercept is not included in the random effects model by default. If you can assume the data pass through the origin, you can exclude the intercept.
   Define covariance groups by. The fields specified here define independent sets of random effects covariance parameters, one for each category defined by the cross-clas
138. he matrix file is as follows:
   - Split variables. If used, any variables defining splits.
   - RowType. Takes values (and value labels) COV (covariances), CORR (correlations), EST (parameter estimates), SE (standard errors), SIG (significance levels), and DF (sampling design degrees of freedom). There is a separate case with row type COV (or CORR) for each model parameter, plus a separate case for each of the other row types.
   - VarName. Takes values P1, P2, ..., corresponding to an ordered list of all estimated model parameters (except the scale or negative binomial parameters), for row types COV or CORR, with value labels corresponding to the parameter strings shown in the Parameter estimates table. The cells are blank for other row types.
   - P1, P2, ... These variables correspond to an ordered list of all model parameters (including the scale and negative binomial parameters, as appropriate), with variable labels corresponding to the parameter strings shown in the Parameter estimates table, and take values according to the row type.
   For redundant parameters, all covariances are set to zero and correlations are set to the system-missing value; all parameter estimates are set at zero; and all standard errors, significance levels, and residual degrees of freedom are set to the system-missing value. For the scale parameter, covariances, correlations, significance level, and degrees of freedom are set to the system-missing value. If t
139. he original response metric. When the response distribution is binomial and the dependent variable is binary, the procedure saves predicted probabilities. When the response distribution is multinomial, the item label becomes Cumulative predicted probability, and the procedure saves the cumulative predicted probability for each category of the response, except the last, up to the number of specified categories to save.
   - Lower bound of confidence interval for mean of response. Saves the lower bound of the confidence interval for the mean of the response. When the response distribution is multinomial, the item label becomes Lower bound of confidence interval for cumulative predicted probability, and the procedure saves the lower bound for each category of the response, except the last, up to the number of specified categories to save.
   - Upper bound of confidence interval for mean of response. Saves the upper bound of the confidence interval for the mean of the response. When the response distribution is multinomial, the item label becomes Upper bound of confidence interval for cumulative predicted probability, and the procedure saves the upper bound for each category of the response, except the last, up to the number of specified categories to save.
   - Predicted category. For models with binomial distribution and binary dependent variable, or multinomial distribution, this saves the predicted response category for each case. This option i
140. he scale parameter is estimated via maximum likelihood, the standard error is given; otherwise it is set to the system-missing value. For the negative binomial parameter, covariances, correlations, significance level, and degrees of freedom are set to the system-missing value. If the negative binomial parameter is estimated via maximum likelihood, the standard error is given; otherwise it is set to the system-missing value. If there are splits, then the list of parameters must be accumulated across all splits. In a given split, some parameters may be irrelevant; this is not the same as redundant. For irrelevant parameters, all covariances or correlations, parameter estimates, standard errors, significance levels, and degrees of freedom are set to the system-missing value.
   You can use this matrix file as the initial values for further model estimation; note that this file is not immediately usable for further analyses in other procedures that read a matrix file unless those procedures accept all the row types exported here. Even then, you should take care that all parameters in this matrix file have the same meaning for the procedure reading the file.
   Export model as XML. Saves the parameter estimates and the parameter covariance matrix, if selected, in XML (PMML) format. You can use this model file to apply the model information to other data files for scoring purposes.
   GENLIN Command Additional Features
   The command syntax language also allows you
141. he variable name se_2.
   - Hazard. Cumulative hazard function estimate. The default variable name is the prefix haz_ with a sequential number appended to it. For example, if haz_1 already exists, Kaplan-Meier assigns the variable name haz_2.
   - Cumulative events. Cumulative frequency of events when cases are sorted by their survival times and status codes. The default variable name is the prefix cum_ with a sequential number appended to it. For example, if cum_1 already exists, Kaplan-Meier assigns the variable name cum_2.
   Kaplan-Meier Options
   [Figure 13-5: Kaplan-Meier Options dialog box]
   You can request various output types from Kaplan-Meier analysis.
   Statistics. You can select statistics displayed for the survival functions computed, including survival table(s), mean and median survival, and quartiles. If you have included factor variables, separate statistics are generated for each group.
   Plots. Plots allow you to examine the survival, one-minus-survival, hazard, and log-survival functions visually. If you have included factor variables, functions are plotted for each group.
   - Survival. Displays the cumulative survival function on a linear scale.
   - One minus survival. Plots on
142. these factors are categorical and can have numeric values or string values. Within-subjects factors are defined in the Repeated Measures Define Factor(s) dialog box. Covariates are quantitative variables that are related to the dependent variable. For a repeated measures analysis, these should remain constant at each level of a within-subjects variable. The data file should contain a set of variables for each group of measurements on the subjects. The set has one variable for each repetition of the measurement within the group. A within-subjects factor is defined for the group, with the number of levels equal to the number of repetitions. For example, measurements of weight could be taken on different days. If measurements of the same property were taken on five days, the within-subjects factor could be specified as day with five levels. For multiple within-subjects factors, the number of measurements for each subject is equal to the product of the number of levels of each factor. For example, if measurements were taken at three different times each day for four days, the total number of measurements is 12 for each subject. The within-subjects factors could be specified as day(4) and time(3). Assumptions. A repeated measures analysis can be approached in two ways, univariate and multivariate. The univariate approach (also known as the split-plot or mixed-model approach) considers the dependent variables as responses to the levels of within-subjects factor
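The count of measurements per subject is simply the product of the level counts. The sketch below enumerates the 4 × 3 = 12 day/time combinations and pairs each position in a variable list with its levels. It assumes the levels of the last factor change fastest; check the variable order in your own dialog before relying on that. The function name is hypothetical.

```python
from itertools import product

def within_subject_levels(factors):
    """Pair each repeated-measure variable position with its factor levels.

    factors -- list of (factor_name, number_of_levels) in definition order.
    Assumption: the levels of the last factor change fastest, so the
    variables must be listed in that order.
    """
    names = [name for name, _ in factors]
    ranges = [range(1, levels + 1) for _, levels in factors]
    return [dict(zip(names, combo)) for combo in product(*ranges)]

cells = within_subject_levels([("day", 4), ("time", 3)])
print(len(cells))                     # total measurements per subject
for position, cell in enumerate(cells[:4], start=1):
    print(position, cell)
```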
143. shown significance, you can use post hoc tests to evaluate differences among specific means. Estimated marginal means give estimates of predicted mean values for the cells in the model, and profile plots (interaction plots) of these means allow you to visualize some of the relationships easily. The post hoc multiple comparison tests are performed for each dependent variable separately. Residuals, predicted values, Cook's distance, and leverage values can be saved as new variables in your data file for checking assumptions. Also available are a residual SSCP matrix, which is a square matrix of sums of squares and cross-products of residuals; a residual covariance matrix, which is the residual SSCP matrix divided by the degrees of freedom of the residuals; and the residual correlation matrix, which is the standardized form of the residual covariance matrix. WLS Weight allows you to specify a variable used to give observations different weights for a weighted least-squares (WLS) analysis, perhaps to compensate for different precision of measurement. Example. A manufacturer of plastics measures three properties of plastic film: tear resistance, gloss, and opacity. Two rates of extrusion and two different amounts of additive are tried, and the three properties are measured under each combination of extrusion rate and additive amount. The manufacturer finds that the extrusion rate and the amount of additive individually produce significant results but that the
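The three residual matrices described above follow directly from one another. This NumPy sketch builds them from a matrix of residuals (rows are cases, columns are dependent variables). The residuals and the residual degrees of freedom are hypothetical inputs; the procedure computes both from the fitted model.

```python
import numpy as np

def residual_matrices(residuals, df_error):
    """Residual SSCP, covariance, and correlation matrices (sketch).

    residuals -- (n_cases, n_dependents) array of model residuals
    df_error  -- degrees of freedom of the residuals
    """
    E = np.asarray(residuals, dtype=float)
    sscp = E.T @ E                      # sums of squares and cross-products
    cov = sscp / df_error               # SSCP divided by residual df
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)       # standardized covariance matrix
    return sscp, cov, corr

rng = np.random.default_rng(0)
resid = rng.normal(size=(20, 3))
resid -= resid.mean(axis=0)             # residuals of a model with an intercept sum to 0
sscp, cov, corr = residual_matrices(resid, df_error=20 - 4)  # 4 parameters: hypothetical
print(np.round(corr, 3))
```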
144. variables with non-negative integer values. If a data value is non-integer, less than 0, or missing, then the corresponding case is not used in the analysis. The value of the negative binomial distribution's ancillary parameter can be any number greater than or equal to 0; you can set it to a fixed value or allow it to be estimated by the procedure. When the ancillary parameter is set to 0, using this distribution is equivalent to using the Poisson distribution. • Normal. This is appropriate for scale variables whose values take a symmetric, bell-shaped distribution about a central (mean) value. The dependent variable must be numeric. • Poisson. This distribution can be thought of as the number of occurrences of an event of interest in a fixed period of time and is appropriate for variables with non-negative integer values. If a data value is non-integer, less than 0, or missing, then the corresponding case is not used in the analysis. • Tweedie. This distribution is appropriate for variables that can be represented by Poisson mixtures of gamma distributions; the distribution is "mixed" in the sense that it combines properties of continuous distributions (it takes non-negative real values) and discrete distributions (positive probability mass at a single value, 0). The dependent variable must be numeric, with data values greater than or equal to zero. If a data value is less than zero or missing, then the corresponding case is not used in the analysis.
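The claim that an ancillary parameter of 0 reduces the negative binomial to the Poisson can be checked numerically. This sketch assumes the common parameterization Var(Y) = μ + kμ², which is how we read the ancillary parameter k. It treats k = 0 as the Poisson limit and shows that a very small k already gives nearly Poisson probabilities. The function names are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu (mu > 0)."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def negbin_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and ancillary parameter k >= 0.

    Assumed parameterization: Var(Y) = mu + k * mu**2; k == 0 is the Poisson limit.
    """
    if k == 0:
        return poisson_pmf(y, mu)
    r = 1.0 / k
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             - r * math.log1p(k * mu)                        # r * ln(1 / (1 + k*mu))
             + y * (math.log(k * mu) - math.log1p(k * mu)))  # y * ln(k*mu / (1 + k*mu))
    return math.exp(log_p)

for k in (0.5, 0.01, 1e-6, 0):
    print(k, round(negbin_pmf(3, 2.0, k), 6), round(poisson_pmf(3, 2.0), 6))
```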
145. variance is robust to departures from normality, although the data should be symmetric. To check assumptions, you can use homogeneity of variances tests (including Box's M) and spread-versus-level plots. You can also examine residuals and residual plots. Related procedures. Use the Explore procedure to examine the data before doing an analysis of variance. For a single dependent variable, use GLM Univariate. If you measured the same dependent variables on several occasions for each subject, use GLM Repeated Measures. Obtaining GLM Multivariate Tables. From the menus choose: Analyze > General Linear Model > Multivariate... Figure 2-1 Multivariate dialog box. Select at least two dependent variables. Optionally, you can specify Fixed Factor(s), Covari
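Box's M compares the group covariance matrices of the dependent variables with their pooled estimate. The sketch below computes only the raw statistic, M = (N − g) ln|S_pooled| − Σ (n_i − 1) ln|S_i|. It does not compute the F approximation and significance level that the procedure reports. The function name and sample data are hypothetical.

```python
import numpy as np

def box_m(groups):
    """Box's M statistic for equality of covariance matrices (sketch).

    groups -- list of 2-D arrays, one per group; rows are cases and
              columns are dependent variables.
    """
    covs = [np.cov(np.asarray(g, dtype=float), rowvar=False) for g in groups]
    dfs = [len(g) - 1 for g in groups]
    pooled = sum(df * c for df, c in zip(dfs, covs)) / sum(dfs)
    # slogdet returns (sign, log|det|); only the log-determinant is needed.
    m = sum(dfs) * np.linalg.slogdet(pooled)[1]
    m -= sum(df * np.linalg.slogdet(c)[1] for df, c in zip(dfs, covs))
    return m

rng = np.random.default_rng(1)
a = rng.normal(size=(30, 3))                        # hypothetical group 1
b = rng.normal(scale=[1, 2, 3], size=(25, 3))       # hypothetical group 2, unequal spread
print(round(box_m([a, b]), 3))
```

M is zero when all group covariance matrices are identical and grows as they diverge.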
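146. covariance structure has heterogeneous variances and constant correlation between elements:

$$\begin{pmatrix} \sigma_1^2 & \sigma_2\sigma_1\rho & \sigma_3\sigma_1\rho & \sigma_4\sigma_1\rho \\ \sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_3\sigma_2\rho & \sigma_4\sigma_2\rho \\ \sigma_3\sigma_1\rho & \sigma_3\sigma_2\rho & \sigma_3^2 & \sigma_4\sigma_3\rho \\ \sigma_4\sigma_1\rho & \sigma_4\sigma_2\rho & \sigma_4\sigma_3\rho & \sigma_4^2 \end{pmatrix}$$

Diagonal. This covariance structure has heterogeneous variances and zero correlation between elements:

$$\begin{pmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{pmatrix}$$

Factor Analytic: First-Order. This covariance structure has heterogeneous variances that are composed of a term that is heterogeneous across elements and a term that is homogeneous across elements. The covariance between any two elements is the square root of the product of their heterogeneous variance terms:

$$\begin{pmatrix} \lambda_1^2 + d & \lambda_2\lambda_1 & \lambda_3\lambda_1 & \lambda_4\lambda_1 \\ \lambda_2\lambda_1 & \lambda_2^2 + d & \lambda_3\lambda_2 & \lambda_4\lambda_2 \\ \lambda_3\lambda_1 & \lambda_3\lambda_2 & \lambda_3^2 + d & \lambda_4\lambda_3 \\ \lambda_4\lambda_1 & \lambda_4\lambda_2 & \lambda_4\lambda_3 & \lambda_4^2 + d \end{pmatrix}$$

Factor Analytic: First-Order, Heterogeneous. This covariance structure has heterogeneous variances that are composed of two terms that are heterogeneous across elements. The covariance between any two elements is the square root of the product of the first of their heterogeneous variance terms:

$$\begin{pmatrix} \lambda_1^2 + d_1 & \lambda_2\lambda_1 & \lambda_3\lambda_1 & \lambda_4\lambda_1 \\ \lambda_2\lambda_1 & \lambda_2^2 + d_2 & \lambda_3\lambda_2 & \lambda_4\lambda_2 \\ \lambda_3\lambda_1 & \lambda_3\lambda_2 & \lambda_3^2 + d_3 & \lambda_4\lambda_3 \\ \lambda_4\lambda_1 & \lambda_4\lambda_2 & \lambda_4\lambda_3 & \lambda_4^2 + d_4 \end{pmatrix}$$

Huynh-Feldt. This is a "circular" matrix in which the covariance between any two elements is equal to the average of their variances minus a constant. Neither the variances nor the covariances are constant:

$$\begin{pmatrix} \sigma_1^2 & \frac{\sigma_1^2+\sigma_2^2}{2}-\lambda & \frac{\sigma_1^2+\sigma_3^2}{2}-\lambda & \frac{\sigma_1^2+\sigma_4^2}{2}-\lambda \\ \frac{\sigma_2^2+\sigma_1^2}{2}-\lambda & \sigma_2^2 & \frac{\sigma_2^2+\sigma_3^2}{2}-\lambda & \frac{\sigma_2^2+\sigma_4^2}{2}-\lambda \\ \frac{\sigma_3^2+\sigma_1^2}{2}-\lambda & \frac{\sigma_3^2+\sigma_2^2}{2}-\lambda & \sigma_3^2 & \frac{\sigma_3^2+\sigma_4^2}{2}-\lambda \\ \frac{\sigma_4^2+\sigma_1^2}{2}-\lambda & \frac{\sigma_4^2+\sigma_2^2}{2}-\lambda & \frac{\sigma_4^2+\sigma_3^2}{2}-\lambda & \sigma_4^2 \end{pmatrix}$$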
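The covariance structures in this appendix can be built directly from their parameters (σ_i, ρ, λ_i, d, d_i, and λ, as in the matrices for these structures). This NumPy sketch constructs each structure. It also checks the property that makes Huynh-Feldt "circular": the variance of the difference between any two elements, σ_i² + σ_j² − 2σ_ij, equals the constant 2λ. The function names are hypothetical and are not SPSS keywords.

```python
import numpy as np

def compound_symmetry_het(sigmas, rho):
    """Heterogeneous variances, constant correlation rho."""
    s = np.asarray(sigmas, dtype=float)
    cov = rho * np.outer(s, s)
    np.fill_diagonal(cov, s ** 2)
    return cov

def diagonal(sigmas):
    """Heterogeneous variances, zero correlation."""
    return np.diag(np.asarray(sigmas, dtype=float) ** 2)

def factor_analytic_1(lambdas, d):
    """Variances lambda_i**2 + d; covariances lambda_i * lambda_j."""
    lam = np.asarray(lambdas, dtype=float)
    return np.outer(lam, lam) + d * np.eye(len(lam))

def factor_analytic_1_het(lambdas, ds):
    """Variances lambda_i**2 + d_i; covariances lambda_i * lambda_j."""
    lam = np.asarray(lambdas, dtype=float)
    return np.outer(lam, lam) + np.diag(np.asarray(ds, dtype=float))

def huynh_feldt(variances, lam):
    """Covariance = average of the two variances minus the constant lam."""
    v = np.asarray(variances, dtype=float)
    cov = (v[:, None] + v[None, :]) / 2 - lam
    np.fill_diagonal(cov, v)
    return cov

hf = huynh_feldt([1.0, 2.0, 3.0, 4.0], 0.5)    # hypothetical parameter values
print(hf)
```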
147. variances are unequal, use Tamhane's T2 (conservative pairwise comparisons test based on a t test), Dunnett's T3 (pairwise comparison test based on the Studentized maximum modulus), Games-Howell pairwise comparison test (sometimes liberal), or Dunnett's C (pairwise comparison test based on the Studentized range). Duncan's multiple range test, Student-Newman-Keuls (S-N-K), and Tukey's b are range tests that rank group means and compute a range value. These tests are not used as frequently as the tests previously discussed. The Waller-Duncan t test uses a Bayesian approach. This range test uses the harmonic mean of the sample size when the sample sizes are unequal. The significance level of the Scheffé test is designed to allow all possible linear combinations of group means to be tested, not just the pairwise comparisons available in this feature. As a result, the Scheffé test is often more conservative than other tests, which means that a larger difference between means is required for significance. The least significant difference (LSD) pairwise multiple comparison test is equivalent to multiple individual t tests between all pairs of groups. The disadvantage of this test is that no attempt is made to adjust the observed significance level for multiple comparisons. Tests displayed. Pairwise comparisons are provided for LSD, Sidak, Bonferroni, Games-Howell, Tamhane's T2 and T3, Dunnett's C, and Dunnett's T3. Homogeneous subsets for rang
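The harmonic mean used by the Waller-Duncan test when group sizes are unequal is always at most the arithmetic mean. Small groups therefore pull the effective sample size down. A quick illustration with Python's standard library, using hypothetical group sizes:

```python
from statistics import harmonic_mean, mean

group_sizes = [12, 15, 30]                 # hypothetical unequal group sizes
n_harmonic = harmonic_mean(group_sizes)    # 3 / (1/12 + 1/15 + 1/30)
print(round(n_harmonic, 4), mean(group_sizes))
```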
148. distribution: in generalized estimating equations, 73; in generalized linear models, 48. Poisson regression: generalized linear mixed models, 93; in General Loglinear Analysis, 128. power estimates: in GLM Multivariate, 11; in GLM Repeated Measures, 25. power link function: in generalized estimating equations, 74; in generalized linear models, 49. predicted values: in General Loglinear Analysis, 132; in Linear Mixed Models, 44; in Logit Loglinear Analysis, 137. probit analysis: generalized linear mixed models, 93. probit link function: in generalized estimating equations, 74; in generalized linear models, 49. profile plots: in GLM Multivariate, 7; in GLM Repeated Measures, 21. proportional hazards model: in Cox Regression, 148. R-E-G-W F: in GLM Multivariate, 8; in GLM Repeated Measures, 22. R-E-G-W Q: in GLM Multivariate, 8; in GLM Repeated Measures, 22. random effects: in Linear Mixed Models, 39. random-effect covariance matrix: in Linear Mixed Models, 42. random-effect priors: in Variance Components, 31. reference category: in Generalized Estimating Equations, 76, 78; in Generalized Linear Models, 51. repeated measures variables: in Linear Mixed Models, 36. residual covariance matrix: in Linear Mixed Models, 42. residual plots: in GLM Multivariate, 11; in GLM Repeated Measures, 25. residual SSCP: in GLM Multivariate, 11; in GLM Repeated Measures, 25. residuals: in General Loglinear Analysis, 132; in Generalized Estimating Equations
149. predicted values: in Linear Mixed Models, 44. frequencies: in Model Selection Loglinear Analysis, 127. full factorial models: in GLM Repeated Measures, 18; in Variance Components, 30. Gabriel's pairwise comparisons test: in GLM Multivariate, 8; in GLM Repeated Measures, 22. Games and Howell's pairwise comparisons test: in GLM Multivariate, 8; in GLM Repeated Measures, 22. gamma distribution: in generalized estimating equations, 73; in generalized linear models, 48. Gehan test: in Life Tables, 141. general estimable function: in Generalized Estimating Equations, 84; in Generalized Linear Models, 60. general linear model: generalized linear mixed models, 93. General Loglinear Analysis: cell covariates, 128; cell structures, 128; command additional features, 132; confidence intervals, 131; contrasts, 128; criteria, 131; display options, 131; distribution of cell counts, 128; factors, 128; model specification, 130; plots, 131; residuals, 132; saving predicted values, 132; saving variables, 132. Generalized Estimating Equations, 68: estimated marginal means, 86; estimation criteria, 81; initial values, 82; model export, 90; model specification, 79; options for categorical factors, 78; predictors, 77; reference category for binary response, 76; response, 75; save variables to active dataset, 88; statistics, 84; type of model, 72. generalized linear mixed models, 93: analysis weight, 106; classification table, 115; covariance parameters, 121; custom terms, 101; data structure
150. specified in this list are used to identify repeated observations. For example, a single variable Week might identify the 10 weeks of observations in a medical study, or Month and Day might be used together to identify daily observations over the course of a year. Repeated Covariance type. This specifies the covariance structure for the residuals. The available structures are as follows: Ante-Dependence: First Order; AR(1); AR(1): Heterogeneous; ARMA(1,1); Compound Symmetry; Compound Symmetry: Correlation Metric; Compound Symmetry: Heterogeneous; Diagonal; Factor Analytic: First Order; Factor Analytic: First Order, Heterogeneous; Huynh-Feldt; Scaled Identity; Toeplitz; Toeplitz: Heterogeneous; Unstructured; Unstructured: Correlations. For more information, see the topic Covariance Structures in Appendix B on p. 162. Linear Mixed Models Fixed Effects. Figure 5-3 Linear Mixed Models Fixed Effects dialog box. Fixed Effects. There is no default model, so you must explicitly specify the fixed effects. Alternatively, you can build nested or non-nested terms. Include Intercept. The intercept is usually included in the model. If you can assume the data pass through the origin, yo
151. sign of the residual (observed count minus expected count). Deviance residuals have an asymptotic standard normal distribution. GENLOG Command Additional Features. The command syntax language also allows you to: • Calculate linear combinations of observed cell frequencies and expected cell frequencies, and print residuals, standardized residuals, and adjusted residuals of that combination (using the GERESID subcommand). • Change the default threshold value for redundancy checking (using the CRITERIA subcommand). • Display the standardized residuals (using the PRINT subcommand). See the Command Syntax Reference for complete syntax information. Chapter 11 Logit Loglinear Analysis. The Logit Loglinear Analysis procedure analyzes the relationship between dependent (or response) variables and independent (or explanatory) variables. The dependent variables are always categorical, while the independent variables can be categorical (factors). Other independent variables (cell covariates) can be continuous, but they are not applied on a case-by-case basis; the weighted covariate mean for a cell is applied to that cell. The logarithm of the odds of the dependent variables is expressed as a linear combination of parameters. A multinomial distribution is automatically assumed; these models are sometimes called multinomial logit models. This procedure estimates parameters of logit loglinear models using the Newton-Raphson algorithm. You can select from
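The text states only that a deviance residual carries the sign of observed minus expected. The sketch below assumes the Poisson form of a cell's deviance contribution, 2[o ln(o/e) − (o − e)], with o ln(o/e) taken as 0 when o = 0. It then applies that sign to the square root. Treat the per-cell formula as our assumption rather than the procedure's documented definition. The function name and cell values are hypothetical.

```python
import math

def deviance_residual(observed, expected):
    """Signed deviance residual for one cell (Poisson form; expected > 0)."""
    # o * ln(o / e) is taken as 0 when the observed count is 0.
    term = observed * math.log(observed / expected) if observed > 0 else 0.0
    contribution = 2.0 * (term - (observed - expected))
    # The sign follows the raw residual: observed minus expected.
    return math.copysign(math.sqrt(max(contribution, 0.0)), observed - expected)

cells = [(12, 10.0), (7, 9.5), (0, 2.0)]   # hypothetical (observed, expected) pairs
print([round(deviance_residual(o, e), 3) for o, e in cells])
```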
152. Specify Model. A saturated model contains all main effects and interactions involving factor variables. It does not contain covariate terms. Select Custom to specify only a subset of interactions or to specify factor-by-covariate interactions. Factors & Covariates. The factors and covariates are listed. Terms in Model. The model depends on the nature of your data. After selecting Custom, you can select the main effects and interactions that are of interest in your analysis. You must indicate all of the terms to be included in the model. Build Terms. For the selected factors and covariates: • Interaction. Creates the highest-level interaction term of all selected variables. This is the default. • Main effects. Creates a main-effects term for each variable selected. • All 2-way. Creates all possible two-way interactions of the selected variables. • All 3-way. Creates all possible three-way interactions of the selected variables. • All 4-way. Creates all possible four-way interactions of the selected variables. • All 5-way. Creates all possible five-way interactions of the selected variables. General Loglinear Analysis Options. Figure 10-3 General Loglinear Analysis Options dialog box.
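The Build Terms choices amount to simple combinatorics over the selected variables. This sketch generates the terms each choice would add, written in the A*B interaction notation. The choice labels and factor names are hypothetical. The order in which the dialog actually lists the terms may differ.

```python
from itertools import combinations

def build_terms(variables, choice):
    """Terms generated by a Build Terms choice (sketch; labels are hypothetical).

    choice -- "interaction", "main", or "all2" through "all5"
    """
    if choice == "interaction":
        return ["*".join(variables)]        # highest-level interaction only
    if choice == "main":
        return list(variables)              # one main-effects term per variable
    ways = {"all2": 2, "all3": 3, "all4": 4, "all5": 5}
    if choice not in ways:
        raise ValueError(f"unknown choice: {choice}")
    return ["*".join(combo) for combo in combinations(variables, ways[choice])]

factors = ["A", "B", "C"]   # hypothetical factor names
for choice in ("interaction", "main", "all2", "all3"):
    print(choice, build_terms(factors, choice))
```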
153. contains the linear effect across all categories; the second degree of freedom, the quadratic effect; and so on. These contrasts are often used to estimate polynomial trends. Scale. Estimated marginal means can be computed for the response, based on the original scale of the dependent variable, or for the linear predictor, based on the dependent variable as transformed by the link function. Adjustment for Multiple Comparisons. When performing hypothesis tests with multiple contrasts, the overall significance level can be adjusted from the significance levels for the included contrasts. This group allows you to choose the adjustment method. • Least significant difference. This method does not control the overall probability of rejecting the hypotheses that some linear contrasts are different from the null hypothesis values. • Bonferroni. This method adjusts the observed significance level for the fact that multiple contrasts are being tested. • Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni procedure that is much less conservative in terms of rejecting individual hypotheses but maintains the same overall significance level. • Sidak. This method provides tighter bounds than the Bonferroni approach. • Sequential Sidak. This is a sequentially step-down rejective Sidak procedure that is much less conservative in terms of rejecting individual hypotheses but maintains the same overall significance level.
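The five adjustment choices can be compared side by side on a set of raw significance levels. This sketch assumes the sequential methods are Holm-type step-down procedures, which matches the "sequentially step-down rejective" description but is not confirmed to be the exact SPSS computation. Each smallest remaining p-value is adjusted for the number of hypotheses still untested, and the results are kept monotone. The function name and method labels are hypothetical.

```python
def adjust_pvalues(pvalues, method):
    """Adjusted significance levels for a family of contrasts (sketch)."""
    methods = {"lsd", "bonferroni", "sidak", "seq_bonferroni", "seq_sidak"}
    if method not in methods:
        raise ValueError(f"unknown method: {method}")
    m = len(pvalues)
    if method == "lsd":
        return list(pvalues)                      # no adjustment at all
    if method == "bonferroni":
        return [min(1.0, m * p) for p in pvalues]
    if method == "sidak":
        return [1.0 - (1.0 - p) ** m for p in pvalues]
    # Step-down versions: work upward from the smallest p-value, shrinking
    # the multiplier as hypotheses are rejected.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        remaining = m - rank
        p = pvalues[i]
        if method == "seq_bonferroni":
            a = min(1.0, remaining * p)
        else:
            a = 1.0 - (1.0 - p) ** remaining
        running_max = max(running_max, a)         # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03]                          # hypothetical contrast p-values
for method in ("lsd", "bonferroni", "sidak", "seq_bonferroni", "seq_sidak"):
    print(method, [round(p, 4) for p in adjust_pvalues(raw, method)])
```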
154. information. Chapter 10 General Loglinear Analysis. The General Loglinear Analysis procedure analyzes the frequency counts of observations falling into each cross-classification category in a crosstabulation or a contingency table. Each cross-classification in the table constitutes a cell, and each categorical variable is called a factor. The dependent variable is the number of cases (frequency) in a cell of the crosstabulation, and the explanatory variables are factors and covariates. This procedure estimates maximum likelihood parameters of hierarchical and nonhierarchical loglinear models using the Newton-Raphson method. Either a Poisson or a multinomial distribution can be analyzed. You can select up to 10 factors to define the cells of a table. A cell structure variable allows you to define structural zeros for incomplete tables, include an offset term in the model, fit a log-rate model, or implement the method of adjustment of marginal tables. Contrast variables allow computation of generalized log-odds ratios (GLOR). Model information and goodness-of-fit statistics are automatically displayed. You can also display a variety of statistics and plots or save residuals and predicted values in the active dataset. Example. Data from a report of automobile accidents in Florida are used to determine the relationship between wearing a seat belt and whether an injury was fatal or nonfatal. The odds ratio indicates significant evidence of a relationship. St
155. function. Counts. • Poisson loglinear. Specifies Poisson as the distribution and Log as the link function. • Negative binomial with log link. Specifies Negative binomial (with a value of 1 for the ancillary parameter) as the distribution and Log as the link function. To have the procedure estimate the value of the ancillary parameter, specify a custom model with Negative binomial distribution and select Estimate value in the Parameter group. Binary Response or Events/Trials Data. • Binary logistic. Specifies Binomial as the distribution and Logit as the link function. • Binary probit. Specifies Binomial as the distribution and Probit as the link function. • Interval censored survival. Specifies Binomial as the distribution and Complementary log-log as the link function. Mixture. • Tweedie with log link. Specifies Tweedie as the distribution and Log as the link function. • Tweedie with identity link. Specifies Tweedie as the distribution and Identity as the link function. Custom. Specify your own combination of distribution and link function. Distribution. This selection specifies the distribution of the dependent variable. The ability to specify a non-normal distribution and non-identity link function is the essential improvement of the generalized linear model over the general linear model. There are many possible distribution-link function combinations, and several may be appropriate for any given dataset, so your choice can be guided by
156. specification, convergence is assumed if a statistic based on the Hessian is less than the value specified. For the Relative specification, convergence is assumed if the statistic is less than the product of the value specified and the absolute value of the log-likelihood. The criterion is not used if the value specified equals 0. Maximum scoring steps. Requests to use the Fisher scoring algorithm up to iteration number n. Specify a positive integer. Singularity tolerance. This is the value used as tolerance in checking singularity. Specify a positive value. Linear Mixed Models Statistics. Figure 5-6 Linear Mixed Models Statistics dialog box. Summary Statistics. Produces tables for: • Descriptive statistics. Displays the sample sizes, means, and standard deviations of the dependent variable and covariates (if specified). These statistics are displayed for each distinct level combination of the factors. • Case Processing Summary. Displays the sorted values of the factors, the repeated measure variables, the repeated
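The Absolute and Relative forms of the Hessian convergence criterion differ only in the threshold. This sketch applies the rule as stated. The Hessian-based statistic itself is treated as an input, because its exact definition is not given here. The function name is hypothetical.

```python
def hessian_converged(statistic, value, log_likelihood, relative=False):
    """Apply the Hessian convergence rule (sketch).

    statistic      -- the Hessian-based statistic (computed elsewhere)
    value          -- user-specified criterion; 0 switches the check off
    log_likelihood -- current log-likelihood, used by the Relative form
    Returns None when the criterion is not used, otherwise True or False.
    """
    if value == 0:
        return None
    threshold = value * abs(log_likelihood) if relative else value
    return statistic < threshold

print(hessian_converged(1e-4, 1e-6, -500.0))                 # Absolute form
print(hessian_converged(1e-4, 1e-6, -500.0, relative=True))  # Relative form
```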
157. is a subject or case. The weights for the weeks are recorded in the variables weight1, weight2, and so on. The gender of each person is recorded in another variable. The weights, measured repeatedly for each subject, can be grouped by defining a within-subjects factor. The factor could be called week, defined to have five levels. In the main dialog box, the variables weight1 to weight5 are used to assign the five levels of week. The variable in the data file that groups males and females (gender) can be specified as a between-subjects factor to study the differences between males and females. Measures. If subjects were tested on more than one measure at each time, define the measures. For example, the pulse and respiration rate could be measured on each subject every day for a week. These measures do not exist as variables in the data file but are defined here. A model with more than one measure is sometimes called a doubly multivariate repeated measures model. GLM Repeated Measures Model. Figure 3-3 Repeated Measures Model dialog box. Specify Model. A full factorial model contains all factor main effects, all covariate main effects, and all factor-by
158. with the binomial or multinomial distribution. • Power. $f(x) = x^{\alpha}$ if $\alpha \neq 0$; $f(x) = \log(x)$ if $\alpha = 0$. $\alpha$ is the required number specification and must be a real number. This link can be used with any distribution except the multinomial. Fixed Effects. Figure 8-4 Fixed Effects settings. Fixed effects factors are generally thought of as fields whose values of interest are all represented in the dataset and can be used for scoring. By default, fields with the predefined input role that are not specified elsewhere in the dialog are entered in the fixed effects portion of the model. Categorical (nominal and ordinal) fields are used as factors in the model, and continuous fields are used as covariates. Enter effects into the model by selecting one or more fields in the source list and dragging them to the effects list. The type of effect created depends upon which hotspot you drop the selection on. • Main. Dropped fields appear as separate main effects at the bottom of the effects list. • 2-way. All possible pairs of the dropped fie
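The power link family covers several familiar links as special cases: α = 1 gives the identity link, α = −1 the reciprocal (inverse) link, and α = 0 the log link. This sketch implements the link and its inverse for a positive mean. The function names are hypothetical. Note that the inverse needs η > 0 when α ≠ 0, because a fractional power of a negative number is not real.

```python
import math

def power_link(mu, alpha):
    """Power link: mu**alpha, or log(mu) when alpha == 0 (mu > 0)."""
    if alpha == 0:
        return math.log(mu)
    return mu ** alpha

def power_link_inverse(eta, alpha):
    """Map a linear-predictor value back to the mean scale (eta > 0 if alpha != 0)."""
    if alpha == 0:
        return math.exp(eta)
    return eta ** (1.0 / alpha)

for alpha in (1, 0.5, 0, -1):
    eta = power_link(3.0, alpha)
    print(alpha, round(eta, 4), round(power_link_inverse(eta, alpha), 4))
```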
159. school board can use a generalized linear mixed model to determine whether an experimental teaching method is effective at improving math scores. Students from the same classroom should be correlated, since they are taught by the same teacher, and classrooms within the same school may also be correlated, so we can include random effects at the school and class levels to account for different sources of variability. Medical researchers can use a generalized linear mixed model to determine whether a new anticonvulsant drug can reduce a patient's rate of epileptic seizures. Repeated measurements from the same patient are typically positively correlated, so a mixed model with some random effects should be appropriate. The target field, the number of seizures, takes non-negative integer values, so a generalized linear mixed model with a Poisson distribution and log link may be appropriate. Executives at a cable provider of television, phone, and internet services can use a generalized linear mixed model to learn more about potential customers. Since possible answers have nominal measurement levels, the company analyst uses a generalized logit mixed model with a random intercept to capture correlation between answers to the service usage questions across service types (tv, phone, internet) within a given survey responder's answers. Copyright IBM Corporation 1989, 2012. Figure 8-1 Data Structure tab.
160. online Solutions for Education (http://www.ibm.com/spss/rd/students/) pages for students. If you're a student using a university-supplied copy of the IBM SPSS software, please contact the IBM SPSS product coordinator at your university. Customer Service. If you have any questions concerning your shipment or account, contact your local office. Please have your serial number ready for identification. Training Seminars. IBM Corp. provides both public and onsite training seminars. All seminars feature hands-on workshops. Seminars will be offered in major cities on a regular basis. For more information on these seminars, go to http://www.ibm.com/software/analytics/spss/training. Contents: 1 Introduction to Advanced Statistics, 1; 2 GLM Multivariate Analysis, 2; GLM Multivariate Model, 4; Build Terms, 4; Sum of Squares, 5; GLM Multivariate Contrasts, 6; Contrast Types, 6; GLM Multivariate Profile Plots, 7; GLM Multivariate Post Hoc Comparisons, 8; GLM Save, 10; GLM Multivariate Options, 11; GLM Command Additional Features, 12; 3 GLM Repeated Measures, 14; GLM Repeated Measures Define Factors, 17; GLM Repeated Measures Model
161. language also allows you to: • Specify nested effects in the design (using the DESIGN subcommand). • Specify tests of effects versus a linear combination of effects or a value (using the TEST subcommand). • Specify multiple contrasts (using the CONTRAST subcommand). • Include user-missing values (using the MISSING subcommand). • Specify EPS criteria (using the CRITERIA subcommand). • Construct a custom L matrix, M matrix, or K matrix (using the LMATRIX, MMATRIX, and KMATRIX subcommands). • For deviation or simple contrasts, specify an intermediate reference category (using the CONTRAST subcommand). • Specify metrics for polynomial contrasts (using the CONTRAST subcommand). • Specify error terms for post hoc comparisons (using the POSTHOC subcommand). • Compute estimated marginal means for any factor or factor interaction among the factors in the factor list (using the EMMEANS subcommand). • Specify names for temporary variables (using the SAVE subcommand). • Construct a correlation matrix data file (using the OUTFILE subcommand). • Construct a matrix data file that contains statistics from the between-subjects ANOVA table (using the OUTFILE subcommand). • Save the design matrix to a new data file (using the OUTFILE subcommand). See the Command Syntax Reference for complete syntax information. Chapter 4 Variance Components Analysis. The Variance Components procedure, for mixed-effects models, estimates the contri
162. fields appear as 2-way interactions at the bottom of the effects list. • 3-way. All possible triplets of the dropped fields appear as 3-way interactions at the bottom of the effects list. • The combination of all dropped fields appears as a single interaction at the bottom of the effects list. The buttons to the right of the Effect Builder allow you to: • Delete terms from the fixed effects model by selecting the terms you want to delete and clicking the delete button. • Reorder the terms within the fixed effects model by selecting the terms you want to reorder and clicking the up or down arrow. • Add nested terms to the model using the Add a Custom Term dialog, by clicking on the Add a Custom Term button. Include Intercept. The intercept is usually included in the model. If you can assume the data pass through the origin, you can exclude the intercept. Add a Custom Term. Figure 8-5 Add a Custom Term dialog. You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the Cust
163. list. If Include constant for dependent is selected, there is also a unit term (1) added to the model list. For example, suppose variables D1 and D2 are the dependent variables. A dependent terms list is created by the Logit Loglinear Analysis procedure (D1, D2, D1*D2). If the Terms in Model list contains M1 and M2 and a constant is included, the model list contains 1, M1, and M2. The resultant design includes combinations of each model term with each dependent term: D1, D2, D1*D2, M1*D1, M1*D2, M1*D1*D2, M2*D1, M2*D2, M2*D1*D2. Include constant for dependent. Includes a constant for the dependent variable in a custom model. Build Terms. For the selected factors and covariates: • Interaction. Creates the highest-level interaction term of all selected variables. This is the default. • Main effects. Creates a main-effects term for each variable selected. • All 2-way. Creates all possible two-way interactions of the selected variables. • All 3-way. Creates all possible three-way interactions of the selected variables. • All 4-way. Creates all possible four-way interactions of the selected variables. • All 5-way. Creates all possible five-way interactions of the selected variables. Logit Loglinear Analysis Options. Figure 11-3 Logit Loglinear Analysis Options dialog box.
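The logit design is the cross product of the model list (optionally headed by the unit term 1) with the dependent terms list. This sketch reproduces the D1/D2, M1/M2 example from the text. The function name is hypothetical. The dependent terms list is passed in as given in the example rather than generated, so we make no assumption about how the procedure builds it for more dependents.

```python
from itertools import product

def logit_design(dependent_terms, model_terms, include_constant=True):
    """Cross every model term with every dependent term (sketch)."""
    model_list = (["1"] if include_constant else []) + list(model_terms)
    design = []
    for model_term, dep_term in product(model_list, dependent_terms):
        # The unit term 1 leaves the dependent term unchanged.
        design.append(dep_term if model_term == "1" else f"{model_term}*{dep_term}")
    return design

dependent_terms = ["D1", "D2", "D1*D2"]   # dependent terms list for D1 and D2
print(logit_design(dependent_terms, ["M1", "M2"]))
```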
164. lly independent. Related procedures. Use the Crosstabs procedure to examine the crosstabulations. Use the Logit Loglinear procedure when it is natural to regard one or more categorical variables as the response variables and the others as the explanatory variables. Obtaining a General Loglinear Analysis. From the menus choose: Analyze > Loglinear > General... Figure 10-1 General Loglinear Analysis dialog box. In the General Loglinear Analysis dialog box, select up to 10 factor variables. Optionally, you can: • Select cell covariates. • Select a cell structure variable to define structural zeros or include an offset term. • Select a contrast variable. General Loglinear Analysis Model. Figure 10-2 General Loglinear Analysis Model dialog box.
165. multinomial (ordinal) distribution do not have a single intercept term; instead, there are threshold parameters that define transition points between adjacent categories. The thresholds are always included in the model. Generalized Linear Models Estimation. Figure 6-7 Generalized Linear Models Estimation tab. Parameter Estimation. The controls in this group allow you to specify estimation methods and to provide initial values for the parameter estimates. • Method. You can select a parameter estimation method. Choose between Newton-Raphson, Fisher scoring, or a hybrid method in which Fisher scoring iterations are performed before switching to the Newt
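As an illustration of what a Fisher scoring iteration does, this NumPy sketch fits a Poisson regression with a log link. Each step solves (expected information) × step = score, and iteration stops when the change in parameter estimates is small. This is not the procedure's implementation, and the hybrid switch to Newton-Raphson is not shown. For this canonical distribution-link pair, the expected and observed information coincide, so Fisher scoring and Newton-Raphson take identical steps. The function name and data are hypothetical, and column 0 of X is assumed to be the intercept.

```python
import numpy as np

def poisson_fisher_scoring(X, y, max_iter=100, tol=1e-6):
    """Poisson regression (log link) fitted by Fisher scoring (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # start at the intercept-only fit
    for _ in range(max_iter):
        mu = np.exp(X @ beta)            # fitted means on the response scale
        score = X.T @ (y - mu)           # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])   # expected (Fisher) information matrix
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:   # change in parameter estimates
            break
    return beta

X = np.column_stack([np.ones(5), np.arange(5.0)])   # intercept plus one covariate
y = np.array([1.0, 2.0, 4.0, 7.0, 12.0])            # hypothetical counts
beta = poisson_fisher_scoring(X, y)
print(np.round(beta, 4))
```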
166. mates, standard errors, confidence intervals, and tests of partial association. For custom models, plots of residuals and normal probability plots. Data. Factor variables are categorical. All variables to be analyzed must be numeric. Categorical string variables can be recoded to numeric variables before starting the model selection analysis. Avoid specifying many variables with many levels. Such specifications can lead to a situation where many cells have small numbers of observations, and the chi-square values may not be useful. Related procedures. The Model Selection procedure can help identify the terms needed in the model. Then you can continue to evaluate the model using General Loglinear Analysis or Logit Loglinear Analysis. You can use Autorecode to recode string variables. If a numeric variable has empty categories, use Recode to create consecutive integer values. Obtaining a Model Selection Loglinear Analysis ► From the menus choose: Analyze > Loglinear > Model Selection [Figure 9-1: Model Selection Loglinear Analysis dialog box]
167. mbined with rich industry solutions, proven practices, and professional services, organizations of every size can drive the highest productivity, confidently automate decisions, and deliver better results. As part of this portfolio, IBM SPSS Predictive Analytics software helps organizations predict future events and proactively act upon that insight to drive better business outcomes. Commercial, government, and academic customers worldwide rely on IBM SPSS technology as a competitive advantage in attracting, retaining, and growing customers, while reducing fraud and mitigating risk. By incorporating IBM SPSS software into their daily operations, organizations become predictive enterprises, able to direct and automate decisions to meet business goals and achieve measurable competitive advantage. For further information or to reach a representative, visit http://www.ibm.com/spss. Technical support. Technical support is available to maintenance customers. Customers may contact Technical Support for assistance in using IBM Corp. products or for installation help for one of the supported hardware environments. To reach Technical Support, see the IBM Corp. web site at http://www.ibm.com/support. Be prepared to identify yourself, your organization, and your support agreement when requesting assistance. Technical Support for Students. If you're a student using a student, academic, or grad pack version of any IBM SPSS software product, please see our specia
168. measure subjects and the random effects subjects and their frequencies. Model Statistics. Produces tables for: • Parameter estimates. Displays the fixed-effects and random-effects parameter estimates and their approximate standard errors. • Tests for covariance parameters. Displays the asymptotic standard errors and Wald tests for the covariance parameters. • Correlations of parameter estimates. Displays the asymptotic correlation matrix of the fixed-effects parameter estimates. • Covariances of parameter estimates. Displays the asymptotic covariance matrix of the fixed-effects parameter estimates. • Covariances of random effects. Displays the estimated covariance matrix of random effects. This option is available only when at least one random effect is specified. If a subject variable is specified for a random effect, then the common block is displayed. • Covariances of residuals. Displays the estimated residual covariance matrix. This option is available only when a repeated variable has been specified. If a subject variable is specified, the common block is displayed. • Contrast coefficient matrix. This option displays the estimable functions used for testing the fixed effects and the custom hypotheses. Confidence interval. This value is used whenever a confidence interval is constructed. Specify a value greater than or equal to 0 and less than 100. The default value is 95. Linear Mixed Models EM Means Fi
169. meter estimates are set at zero, and all standard errors, significance levels, and residual degrees of freedom are set to the system-missing value. For the scale parameter, covariances, correlations, significance level, and degrees of freedom are set to the system-missing value. If the scale parameter is estimated via maximum likelihood, the standard error is given; otherwise, it is set to the system-missing value. For the negative binomial parameter, covariances, correlations, significance level, and degrees of freedom are set to the system-missing value. If the negative binomial parameter is estimated via maximum likelihood, the standard error is given; otherwise, it is set to the system-missing value. If there are splits, then the list of parameters must be accumulated across all splits. In a given split, some parameters may be irrelevant; this is not the same as redundant. For irrelevant parameters, all covariances or correlations, parameter estimates, standard errors, significance levels, and degrees of freedom are set to the system-missing value. You can use this matrix file as the initial values for further model estimation. Note that this file is not immediately usable for further analyses in other procedures that read a matrix file unless those procedures accept all the row types exported here. Even then, you should take care that all parameters in this matrix file have the same meaning for the procedure reading the file. Export model as XML. S
170. n. Statistics. Survival table, including time, status, cumulative survival and standard error, cumulative events, and number remaining; and mean and median survival time, with standard error and 95% confidence interval. Plots: survival, hazard, log survival, and one minus survival. Data. The time variable should be continuous, the status variable can be categorical or continuous, and the factor and strata variables should be categorical. Assumptions. Probabilities for the event of interest should depend only on time after the initial event; they are assumed to be stable with respect to absolute time. That is, cases that enter the study at different times (for example, patients who begin treatment at different times) should behave similarly. There should also be no systematic differences between censored and uncensored cases. If, for example, many of the censored cases are patients with more serious conditions, your results may be biased. Related procedures. The Kaplan-Meier procedure uses a method of calculating life tables that estimates the survival or hazard function at the time of each event. The Life Tables procedure uses an actuarial approach to survival analysis that relies on partitioning the observation period into smaller time intervals and may be useful for dealing with large samples. If you have variables that you suspect are related to survival time or variables that you want to control for (covariates), use the Cox Regression procedure.
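The product-limit idea behind estimating survival "at the time of each event" can be sketched in a few lines. At each distinct event time t, the running survival estimate is multiplied by (number at risk minus events) divided by number at risk. This is a minimal illustration, not the SPSS computation. It omits standard errors, factors and strata, and the function name is hypothetical. It also assumes the usual convention that a case censored at the same time as an event still counts as at risk at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times[i] is the observed time for case i; events[i] is 1 if the event
    occurred and 0 if the case was censored. Returns a list of (t, S(t)).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events plus censored cases at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            # Censored cases tied with events at t still count as at risk at t.
            surv *= (at_risk - d) / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve
```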
171. n to other data files for scoring purposes. Specify a unique, valid filename. If the file specification refers to an existing file, then the file is overwritten. Model view. The procedure creates a Model object in the Viewer. By activating (double-clicking) this object, you gain an interactive view of the model. By default, the Model Summary view is shown. To see another model view, select it from the view thumbnails. As an alternative to the Model object, you can generate pivot tables and charts by selecting Pivot tables and charts in the Output Display group on the Output tab of the Options dialog (Edit > Options). The topics that follow describe the Model object. Model Summary [Figure 8-12: Model Summary view] This view is a snapshot, at-a-glance summary of the model and its fit. Table. The table identifies the target, probability distribution, and link function specified on the Target settings. If the target is defined by events and trials, the cell is split to show the events field and the trials field, or the fixed number of trials. Additionally, the finite sample c
172. nce as singular. Specify a positive value. Generalized Linear Models Initial Values [Figure 6-8: Generalized Linear Models Initial Values dialog box] If initial values are specified, they must be supplied for all parameters (including redundant parameters) in the model. In the dataset, the ordering of variables from left to right must be: RowType_, VarName_, P1, P2, ..., where RowType_ and VarName_ are string variables and P1, P2, ... are numeric variables corresponding to an ordered list of the parameters. • Initial values are supplied on a record with value EST for variable RowType_; the actual initial values are given under variables P1, P2, .... The procedure ignores all records for which RowType_ has a value other than EST, as well as any records beyond the first occurrence of RowType_ equal to EST. • The intercept, if included in the model, or threshold parameters, if the response has a multinomial distribution, must be the first initial values listed. • The scale parameter and, if the response has a negative binomial distribution, the negative binomial parameter must be the last initial values specified. • If Split File is in effect, then the variables must begin with the split-file variable or variables in the order specified when creating the Split File, followed by RowType_, VarName_, P1, P
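The record-selection rules for the initial-values dataset can be expressed as a small function. This is a minimal sketch that assumes the dataset has already been read into a list of Python dicts, one per case, keyed by variable name. The function name is hypothetical, and stripping trailing blanks from RowType_ is an assumption about padded SPSS string values. Split-file variables are not handled.

```python
def initial_values(records):
    """Return P1, P2, ... from the first record whose RowType_ is EST.

    records: list of dicts, one per case, keyed by variable name.
    Records with any other RowType_, and any EST records after the first,
    are ignored, as the rules above describe.
    """
    for rec in records:
        # SPSS string values may carry trailing blanks; strip() is an assumption.
        if str(rec.get("RowType_", "")).strip() == "EST":
            p_names = [k for k in rec if k.startswith("P") and k[1:].isdigit()]
            p_names.sort(key=lambda k: int(k[1:]))  # numeric order: P2 before P10
            return [rec[k] for k in p_names]
    raise ValueError("no record with RowType_ = EST")
```

Sorting the P columns numerically, rather than by name, keeps P10 after P9 when a model has ten or more parameters.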
173. nd the Residual SSCP matrix, plus Bartlett's test of sphericity of the residual covariance matrix. The Homogeneity tests option produces the Levene test of the homogeneity of variance for each dependent variable across all level combinations of the between-subjects factors, for between-subjects factors only. Homogeneity tests also include Box's M test of the homogeneity of the covariance matrices of the dependent variables across all level combinations of the between-subjects factors. The spread-versus-level and residual plots options are useful for checking assumptions about the data. This item is disabled if there are no factors. Select Residual plots to produce an observed-by-predicted-by-standardized residuals plot for each dependent variable. These plots are useful for investigating the assumption of equal variance. Select Lack of fit test to check whether the relationship between the dependent variable and the independent variables can be adequately described by the model. General estimable function allows you to construct custom hypothesis tests based on the general estimable function. Rows in any contrast coefficient matrix are linear combinations of the general estimable function. Significance level. You might want to adjust the significance level used in post hoc tests and the confidence level used for constructing confidence intervals. The specified value is also used to calculate the observed power for the test. When you specify a significance level, the asso
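The Levene statistic is a one-way ANOVA F computed on the absolute deviations of each observation from its group mean. Each group here corresponds to one level combination of the between-subjects factors. The sketch below shows the classic mean-centered form under stated assumptions. It is not taken from the SPSS algorithms, which may compute the deviations from model residuals. It returns only the statistic, not a p-value, and the function name is hypothetical.

```python
def levene_statistic(groups):
    """Levene's W: one-way ANOVA F on absolute deviations from group means."""
    # Absolute deviation of each observation from its own group's mean.
    z = [[abs(v - sum(g) / len(g)) for v in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    z_means = [sum(g) / len(g) for g in z]
    z_grand = sum(sum(g) for g in z) / n
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return ((n - k) / (k - 1)) * between / within
```

Under the null hypothesis of equal variances, W is compared with an F distribution on k - 1 and n - k degrees of freedom.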
174. near. [Figure 5-1: Linear Mixed Models Specify Subjects and Repeated Variables dialog box] The dialog box notes: click Continue for models with uncorrelated terms; specify a Subject variable for models with correlated random effects; specify both Repeated and Subject variables for models with correlated residuals within the random effects. ► Optionally, select one or more subject variables. ► Optionally, select one or more repeated variables. ► Optionally, select a residual covariance structure. ► Click Continue. [Figure 5-2: Linear Mixed Models dialog box] ► Select a dependent variable. ► Select at least one factor or
175. near mixed models. Gamma. This distribution is appropriate for a target with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis. Inverse Gaussian. This distribution is appropriate for a target with positive scale values that are skewed toward larger positive values. If a data value is less than or equal to 0 or is missing, then the corresponding case is not used in the analysis. Multinomial. This distribution is appropriate for a target that represents a multi-category response. The form of the model will depend on the measurement level of the target. A nominal target will result in a nominal multinomial model in which a separate set of model parameters are estimated for each category of the target (except the reference category). The parameter estimates for a given predictor show the relationship between that predictor and the likelihood of each category of the target, relative to the reference category. An ordinal target will result in an ordinal multinomial model in which the traditional intercept term is replaced with a set of threshold parameters that relate to the cumulative probability of the target categories. Negative binomial. Negative binomial regression uses a negative binomial distribution with a log link, which should be used when the target represents a count of occurrences with high variance. Normal
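The case-exclusion rule for the gamma and inverse Gaussian distributions can be written as a simple filter. This is a minimal sketch, assuming missing values arrive as None or NaN. The distribution keys and the function name are hypothetical labels, not SPSS identifiers. Only the two positive-only rules stated above are encoded; support rules for other distributions are not modeled.

```python
import math

# Hypothetical distribution keys; per the text, these targets must be > 0.
POSITIVE_ONLY = {"gamma", "inverse_gaussian"}


def usable_cases(values, distribution):
    """Return indices of cases that enter the analysis for the given distribution."""
    keep = []
    for i, v in enumerate(values):
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue  # missing values are never used
        if distribution in POSITIVE_ONLY and v <= 0:
            continue  # values <= 0 are excluded for gamma / inverse Gaussian
        keep.append(i)
    return keep
```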
176. near models, 49. cumulative probit link function: in generalized estimating equations, 74; in generalized linear models, 49. custom models: in GLM Repeated Measures, 18; in Model Selection Loglinear Analysis, 126; in Variance Components, 30. deleted residuals: in GLM, 10; in GLM Repeated Measures, 24. descriptive statistics: in Generalized Estimating Equations, 84; in Generalized Linear Models, 60; in GLM Multivariate, 11; in GLM Repeated Measures, 25; in Linear Mixed Models, 42. deviance residuals: in Generalized Linear Models, 64. Duncan's multiple range test: in GLM Multivariate, 8; in GLM Repeated Measures, 22. Dunnett's C: in GLM Multivariate, 8; in GLM Repeated Measures, 22. Dunnett's test: in GLM Multivariate, 8; in GLM Repeated Measures, 22. Dunnett's T3: in GLM Multivariate, 8; in GLM Repeated Measures, 22. effect-size estimates: in GLM Multivariate, 11; in GLM Repeated Measures, 25. estimated marginal means: in Generalized Estimating Equations, 86; in Generalized Linear Models, 61; in GLM Multivariate, 11; in GLM Repeated Measures, 25; in Linear Mixed Models, 43. eta-squared: in GLM Multivariate, 11; in GLM Repeated Measures, 25. factor-level information: in Linear Mixed Models, 42. factors: in GLM Repeated Measures, 17. Fisher scoring: in Linear Mixed Models, 41. Fisher's LSD: in GLM Multivariate, 8; in GLM Repeated Measures, 22. fixed effects: in Linear Mixed Models, 37. fixed pred
177. ng equations, 74; in generalized linear models, 49. negative log-log link function: in generalized estimating equations, 74; in generalized linear models, 49. nested terms: in Generalized Estimating Equations, 79; in Generalized Linear Models, 54; in Linear Mixed Models, 38. Newman-Keuls: in GLM Multivariate, 8; in GLM Repeated Measures, 22. Newton-Raphson method: in General Loglinear Analysis, 128; in Logit Loglinear Analysis, 133. normal distribution: in generalized estimating equations, 73; in generalized linear models, 48. normal probability plots: in Model Selection Loglinear Analysis, 127. observed means: in GLM Multivariate, 11; in GLM Repeated Measures, 25. odds power link function: in generalized estimating equations, 74; in generalized linear models, 49. odds ratio: in General Loglinear Analysis, 128. parameter convergence: in Generalized Estimating Equations, 81; in Generalized Linear Models, 56; in Linear Mixed Models, 41. parameter covariance matrix: in Linear Mixed Models, 42. parameter estimates: in General Loglinear Analysis, 128; in Generalized Estimating Equations, 84; in Generalized Linear Models, 60; in GLM Multivariate, 11; in GLM Repeated Measures, 25; in Linear Mixed Models, 42; in Logit Loglinear Analysis, 133; in Model Selection Loglinear Analysis, 127. Pearson residuals: in Generalized Estimating Equations, 89; in Generalized Linear Models, 64. plots: in General Loglinear Analysis, 131; in Logit Loglinear Analysis, 136. Poisson distr
178. nts the last evaluation of the gradient vector and the Hessian matrix. The iteration history table displays parameter estimates for every nth iteration, beginning with the 0th iteration (the initial estimates), where n is the value of the print interval. If the iteration history is requested, then the last iteration is always displayed regardless of n. Working correlation matrix. Displays the values of the matrix representing the within-subject dependencies. Its structure depends upon the specifications in the Repeated tab. Generalized Estimating Equations EM Means [Figure 7-11: Generalized Estimating Equations EM Means tab] This tab allows you to display the estimated marginal means for levels of factors and factor interactions. You can also request that the overall estimated mean be displayed. Estimated marginal means are not available for ordinal multinomial models. Factors and Interactions. This list contains factors specified on the Predictors tab and factor interactions specified on the Model tab. Covariates are excl
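The print-interval rule for the iteration history table fits in one small function. This sketch assumes that iterations are numbered from 0 (the initial estimates) and that the last completed iteration is known. The function name is hypothetical.

```python
def displayed_iterations(last_iteration, n):
    """Iterations shown in the history table: 0, n, 2n, ..., plus the last one."""
    shown = list(range(0, last_iteration + 1, n))
    if shown[-1] != last_iteration:
        shown.append(last_iteration)  # the final iteration is always displayed
    return shown
```

For example, with a print interval of 3 and a fit that stops at iteration 7, iterations 0, 3, 6 and 7 are shown.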
179. ny first-order interaction effects, any first-order interaction effects are specified before any second-order interaction effects, and so on. • A polynomial regression model in which any lower-order terms are specified before any higher-order terms. • A purely nested model in which the first-specified effect is nested within the second-specified effect, the second-specified effect is nested within the third, and so on. (This form of nesting can be specified only by using syntax.) Type III. This is the default. This method calculates the sums of squares of an effect in the design as the sums of squares adjusted for any other effects that do not contain it and orthogonal to any effects (if any) that contain it. The Type III sums of squares have one major advantage: they are invariant with respect to the cell frequencies as long as the general form of estimability remains constant. Therefore, this type is often considered useful for an unbalanced model with no missing cells. In a factorial design with no missing cells, this method is equivalent to the Yates' weighted-squares-of-means technique. The Type III sum-of-squares method is commonly used for: • Any models listed in Type I. • Any balanced or unbalanced models with no empty cells. Variance Components Save to New File [Figure 4-4: Variance Components Save to New File dialog box]
180. od of category 0 or 1 relative to the likelihood of category 2. If you specify a custom category and your target has defined labels, you can set the reference category by choosing a value from the list. This can be convenient when, in the middle of specifying a model, you don't remember exactly how a particular field was coded. Target Distribution and Relationship (Link) with the Linear Model. Given the values of the predictors, the model expects the distribution of values of the target to follow the specified shape, and the target values to be linearly related to the predictors through the specified link function. Shortcuts for several common models are provided; alternatively, choose a Custom setting if there is a particular distribution and link function combination you wish to fit that is not on the short list. • Linear model. Specifies a normal distribution with an identity link, which is useful when the target can be predicted using a linear regression or ANOVA model. • Gamma regression. Specifies a Gamma distribution with a log link, which should be used when the target contains all positive values and is skewed towards larger values. • Loglinear. Specifies a Poisson distribution with a log link, which should be used when the target represents a count of occurrences in a fixed period of time. • Negative binomial regression. Specifies a negative binomial distribution with a log link, which should be used when the target and denominator represent the n
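The phrase "linearly related to the predictors through the specified link function" means g(mu) = eta, where eta is the linear predictor. The expected target value is therefore mu = g^-1(eta). The sketch below encodes the four shortcuts from the list above as (distribution, link) pairs and applies the inverse link. The dictionary keys and the function name are hypothetical; only the identity and log links that these shortcuts use are included.

```python
import math

# Shortcut -> (distribution, link), as listed in the text above (hypothetical keys).
SHORTCUTS = {
    "linear": ("normal", "identity"),
    "gamma_regression": ("gamma", "log"),
    "loglinear": ("poisson", "log"),
    "negative_binomial_regression": ("negative_binomial", "log"),
}

# Inverse link functions map the linear predictor eta back to the mean mu.
INVERSE_LINK = {
    "identity": lambda eta: eta,
    "log": math.exp,
}


def expected_target(shortcut, coefficients, predictors):
    """mu = g^-1(b0 + b1*x1 + ...) for the link implied by the shortcut."""
    _, link = SHORTCUTS[shortcut]
    eta = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], predictors))
    return INVERSE_LINK[link](eta)
```

With the log link, the model's predictions stay positive for any eta, which is why it suits counts and positive skewed targets.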
181. of square matrices with as many rows and columns as there are categories of the given independent variable. For MANOVA and LOGLINEAR, the first row entered is always the mean, or constant, effect and represents the set of weights indicating how to average other independent variables, if any, over the given variable. Generally, this contrast is a vector of ones. The remaining rows of the matrix contain the special contrasts indicating the desired comparisons between categories of the variable. Usually, orthogonal contrasts are the most useful. Orthogonal contrasts are statistically independent and are nonredundant. Contrasts are orthogonal if: • For each row, contrast coefficients sum to 0. • The products of corresponding coefficients for all pairs of disjoint rows also sum to 0. For example, suppose that treatment has four levels and that you want to compare the various levels of treatment with each other. An appropriate special contrast is:
     1  1  1  1    weights for mean calculation
     3 -1 -1 -1    compare 1st with 2nd through 4th
     0  2 -1 -1    compare 2nd with 3rd and 4th
     0  0  1 -1    compare 3rd with 4th
which you specify by means of the following CONTRAST subcommand for MANOVA, LOGISTIC REGRESSION, and COXREG:
     /CONTRAST(TREATMNT)=SPECIAL( 1 1 1 1  3 -1 -1 -1  0 2 -1 -1  0 0 1 -1 )
For LOGLINEAR, you need to specify:
     /CONTRAST(TREATMNT)=BASIS SPECIAL( 1 1 1 1  3 -1 -1 -1  0 2 -1 -1  0 0 1 -1 )
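The two orthogonality conditions can be checked mechanically. This sketch takes the comparison rows of the treatment example (3 -1 -1 -1, 0 2 -1 -1 and 0 0 1 -1) and leaves out the leading row of ones used for the mean. It then tests that each row sums to 0 and that every pair of distinct rows has a zero cross-product. The function name is hypothetical. Exact equality is used because the coefficients are integers; fractional coefficients would need a small tolerance.

```python
from itertools import combinations


def is_orthogonal(contrasts):
    """True if every row sums to 0 and every pair of rows has zero cross-product."""
    rows_sum_to_zero = all(sum(row) == 0 for row in contrasts)
    pairs_orthogonal = all(
        sum(a * b for a, b in zip(r1, r2)) == 0
        for r1, r2 in combinations(contrasts, 2)
    )
    return rows_sum_to_zero and pairs_orthogonal


treatment = [
    [3, -1, -1, -1],  # compare 1st with 2nd through 4th
    [0, 2, -1, -1],   # compare 2nd with 3rd and 4th
    [0, 0, 1, -1],    # compare 3rd with 4th
]
```

Calling is_orthogonal(treatment) confirms that the example contrast satisfies both conditions.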
182. omer effect can be said to be nested within the Store location effect. Additionally, you can include interaction effects, such as polynomial terms involving the same covariate, or add multiple levels of nesting to the nested term. Limitations. Nested terms have the following restrictions: • All factors within an interaction must be unique. Thus, if A is a factor, then specifying A*A is invalid. • All factors within a nested effect must be unique. Thus, if A is a factor, then specifying A(A) is invalid. • No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, then specifying A(X) is invalid. Constructing a nested term ► Select a factor or covariate that is nested within another factor, and then click the arrow button. ► Click (Within). ► Select the factor within which the previous factor or covariate is nested, and then click the arrow button. ► Click Add Term. Optionally, you can include interaction effects or add multiple levels of nesting to the nested term. Random Effects [Figure 8-6: Random Effects settings]
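The three restrictions on nested terms can be checked programmatically. This is a minimal sketch that assumes a hypothetical representation: a term such as A*B(C) is split into the names multiplied together (["A", "B"]) and the names in parentheses (["C"]). A separate mapping gives each name's kind. Covariates may repeat within an interaction, consistent with the polynomial terms mentioned above.

```python
def validate_term(interaction, nested_within, kinds):
    """Check a term such as A*B(C) against the three restrictions above.

    interaction: names multiplied together; nested_within: names in
    parentheses; kinds: maps each name to "factor" or "covariate".
    Returns a list of problems (empty if the term is valid).
    """
    problems = []
    factors = [v for v in interaction if kinds[v] == "factor"]
    if len(set(factors)) != len(factors):
        problems.append("factor repeated within an interaction")
    if nested_within:
        nested_factors = [v for v in interaction + nested_within if kinds[v] == "factor"]
        if len(set(nested_factors)) != len(nested_factors):
            problems.append("factor repeated within a nested effect")
    if any(kinds[v] == "covariate" for v in nested_within):
        problems.append("effect nested within a covariate")
    return problems
```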
183. on-Raphson method. If convergence is achieved during the Fisher scoring phase of the hybrid method before the maximum number of Fisher iterations is reached, the algorithm continues with the Newton-Raphson method. • Scale parameter method. You can select the scale parameter estimation method. Maximum likelihood jointly estimates the scale parameter with the model effects; note that this option is not valid if the response has a negative binomial, Poisson, binomial, or multinomial distribution. The deviance and Pearson chi-square options estimate the scale parameter from the value of those statistics. Alternatively, you can specify a fixed value for the scale parameter. • Initial values. The procedure will automatically compute initial values for parameters. Alternatively, you can specify initial values for the parameter estimates. • Covariance matrix. The model-based estimator is the negative of the generalized inverse of the Hessian matrix. The robust (also called the Huber/White/sandwich) estimator is a "corrected" model-based estimator that provides a consistent estimate of the covariance, even when the specification of the variance and link functions is incorrect. Iterations. • Maximum iterations. The maximum number of iterations the algorithm will execute. Specify a non-negative integer. • Maximum step-halving. At each iteration, the step size is reduced by a factor of 0.5 until the log-likelihood incre
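Step-halving protects an iteration from overshooting. The proposed step is shrunk by a factor of 0.5 until the log-likelihood increases or the halving limit is reached. The sketch below is a minimal illustration in which the log-likelihood is supplied as a Python callable. The function name and the default limit of 5 are illustrative assumptions. Returning the unchanged estimates when no increase is found is also an assumption; the text does not say what the procedure does in that case.

```python
def halve_step(loglik, current, step, max_halving=5):
    """Try current + step, halving the step until the log-likelihood increases.

    loglik: callable returning the log-likelihood for a parameter list.
    Makes at most max_halving halvings after the first full-size attempt.
    """
    base = loglik(current)
    for _ in range(max_halving + 1):
        candidate = [c + s for c, s in zip(current, step)]
        if loglik(candidate) > base:
            return candidate
        step = [0.5 * s for s in step]  # reduce the step size by a factor of 0.5
    return current  # assumption: keep the current estimates if nothing improves
```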
184. only. Export model as data. Writes a dataset in IBM SPSS Statistics format containing the parameter correlation or covariance matrix with parameter estimates, standard errors, significance values, and degrees of freedom. The order of variables in the matrix file is as follows. • Split variables. If used, any variables defining splits. • RowType_. Takes values (and value labels) COV (covariances), CORR (correlations), EST (parameter estimates), SE (standard errors), SIG (significance levels), and DF (sampling design degrees of freedom). There is a separate case with row type COV (or CORR) for each model parameter, plus a separate case for each of the other row types. • VarName_. Takes values P1, P2, ..., corresponding to an ordered list of all estimated model parameters (except the scale or negative binomial parameters), for row types COV or CORR, with value labels corresponding to the parameter strings shown in the Parameter estimates table. The cells are blank for other row types. • P1, P2, ... These variables correspond to an ordered list of all model parameters (including the scale and negative binomial parameters, as appropriate), with variable labels corresponding to the parameter strings shown in the Parameter estimates table, and take values according to the row type. For redundant parameters, all covariances are set to zero, correlations are set to the system-missing value; all para
185. ops after an iteration in which the absolute or relative change in the parameter estimates is less than the value specified, which must be positive. • Log-likelihood convergence. When selected, the algorithm stops after an iteration in which the absolute or relative change in the log-likelihood function is less than the value specified, which must be positive. • Hessian convergence. For the Absolute specification, convergence is assumed if a statistic based on the Hessian convergence is less than the positive value specified. For the Relative specification, convergence is assumed if the statistic is less than the product of the positive value specified and the absolute value of the log-likelihood. Singularity tolerance. Singular (or non-invertible) matrices have linearly dependent columns, which can cause serious problems for the estimation algorithm. Even near-singular matrices can lead to poor results, so the procedure will treat a matrix whose determinant is less than the tolerance as singular. Specify a positive value. Generalized Estimating Equations Initial Values. The procedure estimates an initial generalized linear model, and the estimates from this model are used as initial values for the parameter estimates in the linear model part of the generalized estimating equations. Initial values are not needed for the working correlation matrix because matrix elements are based on the parameter estimates. Initial values specified on this dialog box
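The absolute and relative forms of the parameter-change criterion differ only in whether each change is scaled by the size of the previous estimate. This sketch uses one common definition of relative change, |new - old| / |old|. The exact formula SPSS applies may differ, and the guard against division by zero is an assumption. The function name is hypothetical.

```python
def converged(old, new, tol, relative=False):
    """Parameter-change criterion: True when every change is below tol.

    With relative=True, each absolute change is divided by |old value|.
    """
    for o, n in zip(old, new):
        change = abs(n - o)
        if relative:
            change /= max(abs(o), 1e-12)  # assumption: guard against division by zero
        if change >= tol:
            return False
    return True
```

The relative form matters for large estimates. A change of 0.0005 on a coefficient near 1000 fails an absolute tolerance of 1E-6 but passes a relative one.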
186. Index. correlation matrix: in Generalized Estimating Equations, 84; in Generalized Linear Models, 60; in Linear Mixed Models, 42. covariance matrix: in Generalized Estimating Equations, 81, 84; in Generalized Linear Models, 56, 60; in GLM, 10; in Linear Mixed Models, 42. covariance parameters test: in Linear Mixed Models, 42. covariance structures, 162: in Linear Mixed Models, 162. covariates: in Cox Regression, 149. Cox Regression, 148: baseline functions, 153; categorical covariates, 149; command additional features, 153; contrasts, 149; covariates, 148; define event, 153; DfBeta(s), 152; example, 148; hazard function, 152; iterations, 153; partial residuals, 152; plots, 151; saving new variables, 152; statistics, 148, 153; stepwise entry and removal, 153; string covariates, 149; survival function, 152; survival status variable, 153; time-dependent covariates, 155, 156. cross-products: hypothesis and error matrices, 11. crosstabulation: in Model Selection Loglinear Analysis, 124. cumulative Cauchit link function: in generalized estimating equations, 74; in generalized linear models, 49. cumulative complementary log-log link function: in generalized estimating equations, 74; in generalized linear models, 49. cumulative logit link function: in generalized estimating equations, 74; in generalized linear models, 49. cumulative negative log-log link function: in generalized estimating equations, 74; in generalized li
187. ord events in trials require extra attention. • Binary response. When the dependent variable takes only two values, you can specify the reference category for parameter estimation. A binary response variable can be string or numeric. • Number of events occurring in a set of trials. When the response is a number of events occurring in a set of trials, the dependent variable contains the number of events, and you can select an additional variable containing the number of trials. Alternatively, if the number of trials is the same across all subjects, then trials may be specified using a fixed value. The number of trials should be greater than or equal to the number of events for each case. Events should be non-negative integers, and trials should be positive integers. For ordinal multinomial models, you can specify the category order of the response: ascending, descending, or data (data order means that the first value encountered in the data defines the first category, and the last value encountered defines the last category). Scale Weight. The scale parameter is an estimated model parameter related to the variance of the response. The scale weights are "known" values that can vary from observation to observation. If the scale weight variable is specified, the scale parameter, which is related to the variance of the response, is divided by it for each observation. Cases with scale weight values that are less than or
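The events/trials requirements can be checked case by case before fitting. This minimal sketch assumes the events and trials are already numeric Python values. Whole-number floats such as 3.0 are accepted, because SPSS stores numeric data as doubles. A single number passed for trials plays the role of the fixed-value option. The function names are hypothetical.

```python
def _is_whole(v):
    """True for integer-valued numbers, including floats such as 3.0."""
    return float(v).is_integer()


def invalid_event_trial_cases(events, trials):
    """Indices of cases that break the events/trials rules described above.

    trials may be a list (one value per case) or a single fixed number.
    """
    if isinstance(trials, (int, float)):
        trials = [trials] * len(events)  # fixed number of trials for every case
    bad = []
    for i, (e, t) in enumerate(zip(events, trials)):
        ok = _is_whole(e) and e >= 0 and _is_whole(t) and t > 0 and t >= e
        if not ok:
            bad.append(i)
    return bad
```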
188. ormation, see the topic Covariance Structures in Appendix B on p. 162. Obtaining a generalized linear mixed model. This feature requires the Advanced Statistics option. ► From the menus choose: Analyze > Mixed Models > Generalized Linear ► Define the subject structure of your dataset on the Data Structure tab. ► On the Fields and Effects tab, there must be a single target, which can have any measurement level, or an events/trials specification, in which case the events and trials specifications must be continuous. Optionally, specify its distribution and link function, the fixed effects, and any random effects blocks, offset, or analysis weights. ► Click Build Options to specify optional build settings. ► Click Model Options to save scores to the active dataset and export the model to an external file. ► Click Run to run the procedure and create the Model objects. Fields with unknown measurement level. The Measurement Level alert is displayed when the measurement level for one or more variables (fields) in the dataset is unknown. Since measurement level affects the computation of results for this procedure, all variables must have a defined measurement level. [Figure 8-2: Measurement level alert]
189. orrected Akaike information criterion (AICC) and Bayesian information criterion (BIC) are displayed.
     - Akaike Corrected. A measure for selecting and comparing mixed models based on the -2 (restricted) log likelihood. Smaller values indicate better models. The AICC "corrects" the AIC for small sample sizes. As the sample size increases, the AICC converges to the AIC.
     - Bayesian. A measure for selecting and comparing models based on the -2 log likelihood. Smaller values indicate better models. The BIC also penalizes overparametrized models, but more strictly than the AIC.
     Chart. If the target is categorical, a chart displays the accuracy of the final model, which is the percentage of correct classifications.
     Data Structure
     [Figure 8-13: Data Structure view (target Post-test; subjects School and Classroom)]
     This view provides a summary of the data structure you specified and helps you to check that the subjects and repeated measures have been specified correctly. The observed information for the first subject is displayed for each subject field and repeated measures field, and for the target. Additionally, the number of levels for each subject field and repeated measures field is displayed.
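The manual names the information criteria but gives no formulas. The sketch below uses the common textbook definitions. That choice is an assumption, because SPSS may count n and k differently under restricted likelihood (for example, by subtracting the number of fixed effects from n). The function name is hypothetical.

```python
import math


def information_criteria(neg2ll, k, n):
    """Textbook forms of the criteria, computed from
    neg2ll: the -2 (restricted) log likelihood,
    k: the number of estimated parameters,
    n: the effective number of observations.
    Smaller values indicate better models for all three."""
    if n - k - 1 <= 0:
        raise ValueError("AICC requires n > k + 1")
    aic = neg2ll + 2 * k
    aicc = neg2ll + 2 * k * n / (n - k - 1)  # small-sample correction of AIC
    bic = neg2ll + k * math.log(n)           # penalty per parameter is ln(n)
    return {"AIC": aic, "AICC": aicc, "BIC": bic}


small = information_criteria(100.0, 3, 20)
large = information_criteria(100.0, 3, 1_000_000)
print(small)
print(large["AICC"] - large["AIC"])  # the correction shrinks toward 0 as n grows
```

The gap AICC - AIC equals 2k(k+1)/(n-k-1), which is why the AICC converges to the AIC. The BIC penalty ln(n) per parameter exceeds the AIC's 2 once n is 8 or more, which is the sense in which the BIC is stricter.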
190. ough its very general model formulation.
     Examples. A shipping company can use generalized linear models to fit a Poisson regression to damage counts for several types of ships constructed in different time periods, and the resulting model can help determine which ship types are most prone to damage.
     A car insurance company can use generalized linear models to fit a gamma regression to damage claims for cars, and the resulting model can help determine the factors that contribute the most to claim size.
     Medical researchers can use generalized linear models to fit a complementary log-log regression to interval-censored survival data to predict the time to recurrence for a medical condition.
     Data. The response can be scale, counts, binary, or events-in-trials. Factors are assumed to be categorical. The covariates, scale weight, and offset are assumed to be scale.
     Assumptions. Cases are assumed to be independent observations.
     To Obtain a Generalized Linear Model
     From the menus choose:
     Analyze > Generalized Linear Models > Generalized Linear Models...
     Copyright IBM Corporation 1989, 2012.
     [Figure 6-1: Generalized Linear Models Type of Model tab]
191. ovisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.
     This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
     Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product, and use of those Web sites is at your own risk.
     IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
     Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been e
192. pdated, you can specify the iteration interval at which to update working correlation matrix elements. Specifying a value greater than 1 may reduce processing time.
     Convergence criteria. These specifications apply to the parameters in the linear model part of the generalized estimating equations, while the specification on the Estimation tab applies only to the initial generalized linear model.
     - Parameter convergence. When selected, the algorithm stops after an iteration in which the absolute or relative change in the parameter estimates is less than the value specified, which must be positive.
     - Hessian convergence. Convergence is assumed if a statistic based on the Hessian is less than the value specified, which must be positive.
     Generalized Estimating Equations Type of Model
     [Figure 7-2: Generalized Estimating Equations Type of Model tab]
     The Type of Model tab allows you to speci
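The parameter-convergence rule above can be sketched as a check between two successive iterations. This is not SPSS's implementation. Its definition of "relative change" (here, the change divided by the previous estimate's magnitude) is an assumption, and the Hessian-based criterion is not shown. The function name is hypothetical.

```python
def parameter_converged(old, new, tol, relative=False):
    """Parameter-convergence check: True when the change in every parameter
    estimate between two iterations is below tol, which must be positive.
    relative=True scales each change by the previous estimate's magnitude
    (an assumed form of 'relative change'; the guard avoids division by zero)."""
    if tol <= 0:
        raise ValueError("tol must be positive")
    for a, b in zip(old, new):
        change = abs(b - a)
        if relative:
            change /= max(abs(a), 1e-12)
        if change >= tol:
            return False
    return True


print(parameter_converged([1.0, 2.0], [1.00001, 2.00002], 1e-4))      # True
print(parameter_converged([100.0], [100.001], 1e-4))                 # False (absolute)
print(parameter_converged([100.0], [100.001], 1e-4, relative=True))  # True (relative)
```

The last two calls show why the choice between absolute and relative change matters for large estimates.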
193. ► Specify a distribution and link function (see below for details on the various options).
     ► On the Response tab, select a dependent variable.
     ► On the Predictors tab, select factors and covariates for use in predicting the dependent variable.
     ► On the Model tab, specify model effects using the selected factors and covariates.
     The Type of Model tab allows you to specify the distribution and link function for your model, providing shortcuts for several common models that are categorized by response type.
     Model Types
     Scale Response.
     - Linear. Specifies Normal as the distribution and Identity as the link function.
     - Gamma with log link. Specifies Gamma as the distribution and Log as the link function.
     Ordinal Response.
     - Ordinal logistic. Specifies Multinomial (ordinal) as the distribution and Cumulative logit as the link function.
     - Ordinal probit. Specifies Multinomial (ordinal) as the distribution and Cumulative probit as the link funct
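Each model-type shortcut above is just a named (distribution, link) pair. The sketch records the pairs as data and attaches the standard inverse of each link, which maps a linear-predictor value back to the response scale. The dictionary and function names are hypothetical. Reading the cumulative links as P(Y <= j) assumes that the linear predictor already combines the threshold for category j with the predictors.

```python
import math
from statistics import NormalDist

# Shortcut model types from the Type of Model tab, as (distribution, link) pairs.
MODEL_TYPES = {
    "Linear": ("Normal", "Identity"),
    "Gamma with log link": ("Gamma", "Log"),
    "Ordinal logistic": ("Multinomial (ordinal)", "Cumulative logit"),
    "Ordinal probit": ("Multinomial (ordinal)", "Cumulative probit"),
}

# Inverse link functions: map a linear-predictor value eta to the response scale.
# For the cumulative links the result is read as a cumulative probability
# P(Y <= j), assuming eta already combines threshold j and the predictors.
INVERSE_LINK = {
    "Identity": lambda eta: eta,
    "Log": math.exp,
    "Cumulative logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "Cumulative probit": NormalDist().cdf,
}


def response_scale(model_type, eta):
    _distribution, link = MODEL_TYPES[model_type]
    return INVERSE_LINK[link](eta)


for name in MODEL_TYPES:
    print(name, response_scale(name, 0.0))
```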
194. ppendix
     Notices
     This information was developed for products and services offered worldwide.
     IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
     IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
     IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785, U.S.A.
     For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:
     Intellectual Property Licensing, Legal and Intellectual Property Law, IBM Japan Ltd., 1623-14, Shimotsuruma, Yamato-shi, Kanagawa 242-8502 Japan.
     The following paragraph does not apply to the United Kingdom or any other country where such pr
195. r the model, which controls how long the procedure will search for a solution.
     Display baseline function. Allows you to display the baseline hazard function and cumulative survival at the mean of the covariates. This display is not available if you have specified time-dependent covariates.
     Cox Regression Define Event for Status Variable
     Enter the value or values indicating that the terminal event has occurred. You can enter a single value, a range of values, or a list of values. The Range of Values option is available only if your status variable is numeric.
     COXREG Command Additional Features
     The command syntax language also allows you to:
     - Obtain frequency tables that consider cases lost to follow-up as a separate category from censored cases.
     - Select a reference category, other than first or last, for the deviation, simple, and indicator contrast methods.
     - Specify unequal spacing of categories for the polynomial contrast method.
     - Specify additional iteration criteria.
     - Control the treatment of missing values.
     - Specify the names for saved variables.
     - Write output to an external IBM SPSS Statistics data file.
     - Hold data for each split-file group in an external scratch file during processing. This can help conserve memory resources when running analyses with large datasets. This is not available with time-dependent covariates.
     See the Command Syntax Reference for complete syntax information.
     Chapter
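The three ways of defining the terminal event determine which status values count as events and which count as censored. The sketch below classifies status values under each definition. The function is hypothetical, and it assumes that a range includes both endpoints. Restricting ranges to numeric values mirrors the rule that Range of Values needs a numeric status variable.

```python
def event_predicate(single=None, value_range=None, values=None):
    """Return a function that reports whether a status value means the terminal
    event occurred, for the three ways an event can be defined: a single value,
    an inclusive numeric range, or a list of values. Other values are censored."""
    if single is not None:
        return lambda status: status == single
    if value_range is not None:
        low, high = value_range
        # A range only makes sense for a numeric status variable.
        return lambda status: isinstance(status, (int, float)) and low <= status <= high
    if values is not None:
        allowed = set(values)
        return lambda status: status in allowed
    raise ValueError("give single, value_range, or values")


is_event = event_predicate(value_range=(1, 3))
print([is_event(s) for s in [0, 1, 2, 3, 4]])  # [False, True, True, True, False]
```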
196. ractions. This list contains factors specified on the Predictors tab and factor interactions specified on the Model tab. Covariates are excluded from this list. Terms can be selected directly from this list or combined into an interaction term using the By button.
     Display Means For. Estimated means are computed for the selected factors and factor interactions. The contrast determines how hypothesis tests are set up to compare the estimated means. The simple contrast requires a reference category or factor level against which the others are compared.
     - Pairwise. Pairwise comparisons are computed for all-level combinations of the specified or implied factors. This is the only available contrast for factor interactions.
     - Simple. Compares the mean of each level to the mean of a specified level. This type of contrast is useful when there is a control group.
     - Deviation. Each level of the factor is compared to the grand mean. Deviation contrasts are not orthogonal.
     - Difference. Compares the mean of each level (except the first) to the mean of previous levels. They are sometimes called reverse Helmert contrasts.
     - Helmert. Compares the mean of each level of the factor (except the last) to the mean of subsequent levels.
     - Repeated. Compares the mean of each level (except the last) to the mean of the subsequent level.
     - Polynomial. Compares the linear effect, quadratic effect, cubic effect, and so on. The first degree of freedom conta
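The contrast types above can be written as rows of coefficients applied to the estimated level means. The sketch builds those rows with exact fractions, following the verbal definitions. Several details are assumptions, not taken from the manual: the default reference level for "simple" is the last level, "deviation" drops the last level's row, and the grand mean is the unweighted mean of the level means. Pairwise and polynomial contrasts are not shown, and the function names are hypothetical.

```python
from fractions import Fraction as F


def contrast_rows(kind, k, ref=None):
    """Coefficient rows over k level means for the contrasts described above.
    'simple' compares each level with level `ref` (0-based; assumed to default to
    the last level). 'deviation' drops the last level's row because the k rows
    would sum to zero and so carry only k - 1 degrees of freedom."""
    rows = []
    if kind == "simple":
        ref = k - 1 if ref is None else ref
        for i in range(k):
            if i != ref:
                row = [F(0)] * k
                row[i], row[ref] = F(1), F(-1)
                rows.append(row)
    elif kind == "deviation":      # level i versus the grand (unweighted) mean
        for i in range(k - 1):
            rows.append([F(1) - F(1, k) if j == i else -F(1, k) for j in range(k)])
    elif kind == "difference":     # level i versus the mean of previous levels
        for i in range(1, k):
            rows.append([-F(1, i) if j < i else F(int(j == i)) for j in range(k)])
    elif kind == "helmert":        # level i versus the mean of subsequent levels
        for i in range(k - 1):
            rows.append([F(0) if j < i else F(1) if j == i else -F(1, k - 1 - i)
                         for j in range(k)])
    elif kind == "repeated":       # level i versus level i + 1
        for i in range(k - 1):
            row = [F(0)] * k
            row[i], row[i + 1] = F(1), F(-1)
            rows.append(row)
    else:
        raise ValueError(f"unknown contrast: {kind}")
    return rows


def estimate(rows, means):
    """Apply each contrast row to the estimated marginal means."""
    return [sum(c * m for c, m in zip(row, means)) for row in rows]


means = [10, 12, 15]
print(estimate(contrast_rows("helmert", 3), means))  # [Fraction(-7, 2), Fraction(-3, 1)]
```

Every row sums to zero, which is what makes each one a contrast.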
197. ractions of the selected variables.
     - All 3-way. Creates all possible three-way interactions of the selected variables.
     - All 4-way. Creates all possible four-way interactions of the selected variables.
     - All 5-way. Creates all possible five-way interactions of the selected variables.
     Nested Terms
     You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the Customer effect can be said to be nested within the Store location effect.
     Additionally, you can include interaction effects, such as polynomial terms involving the same covariate, or add multiple levels of nesting to the nested term.
     Limitations. Nested terms have the following restrictions:
     - All factors within an interaction must be unique. Thus, if A is a factor, then specifying A*A is invalid.
     - All factors within a nested effect must be unique. Thus, if A is a factor, then specifying A(A) is invalid.
     - No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, then specifying A(X) is invalid.
     Intercept. The intercept is usually included in the model. If you can assume the data pass through the origin, you can exclude the intercept.
     Models with the mu
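The three restrictions above can be expressed as checks on a term written as crossed(nested_in), such as Customer(Store). The encoding (lists of field names plus a set of covariate names) and the function are hypothetical illustrations, not an SPSS API. The check allows a covariate to repeat in the crossed part, matching the statement that polynomial terms in the same covariate may be included.

```python
def nested_term_problems(crossed, nested_in, covariates):
    """Check a term written as crossed(nested_in), e.g. Customer(Store),
    against the three restrictions listed above. Returns a list of messages;
    an empty list means the term is acceptable."""
    problems = []
    factors = [v for v in crossed if v not in covariates]
    # 1. Factors within an interaction must be unique (A*A is invalid);
    #    repeated covariates such as X*X (a polynomial term) are allowed.
    if len(set(factors)) != len(factors):
        problems.append("factor repeated within an interaction, e.g. A*A")
    # 2. Factors within a nested effect must be unique (A(A) is invalid).
    if set(crossed) & set(nested_in) or len(set(nested_in)) != len(nested_in):
        problems.append("factor repeated within a nested effect, e.g. A(A)")
    # 3. No effect can be nested within a covariate (A(X) is invalid).
    if any(v in covariates for v in nested_in):
        problems.append("effect nested within a covariate, e.g. A(X)")
    return problems


covariates = {"X"}
print(nested_term_problems(["Customer"], ["Store"], covariates))  # []
print(nested_term_problems(["A"], ["X"], covariates))
```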
198. rademarks of IBM Corporation, registered in many jurisdictions worldwide. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml.
     Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
     Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
     Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
     Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
     Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
     UNIX is a registered trademark of The Open Group in the United States and other countries.
     This product uses WinWrap Basic, Copyright 1993-2007, Polar Engineering and Consulting, http://www.winwrap.com.
     Other product and service names might be trademarks of IBM or other companies.
     Adobe product screenshot(s) reprinted with permission from Adobe Systems Incorporated.
     Microsoft product screenshot(s) reprinted with permission from Microso
199. rate and standard error for each time interval for each group, median survival time for each group, and the Wilcoxon (Gehan) test for comparing survival distributions between groups. Plots: function plots for survival, log survival, density, hazard rate, and one minus survival.
     Data. Your time variable should be quantitative. Your status variable should be dichotomous or categorical, coded as integers, with events being coded as a single value or a range of consecutive values. Factor variables should be categorical, coded as integers.
     Assumptions. Probabilities for the event of interest should depend only on time after the initial event; they are assumed to be stable with respect to absolute time. That is, cases that enter the study at different times (for example, patients who begin treatment at different times) should behave similarly. There should also be no systematic differences between censored and uncensored cases. If, for example, many of the censored cases are patients with more serious conditions, your results may be biased.
     Related procedures. The Life Tables procedure uses an actuarial approach to this kind of analysis (known generally as Survival Analysis). The Kaplan-Meier Survival Analysis procedure uses a slightly different method of calculating life tables that does not rely on partitioning the observation period into smaller time intervals. This method is recommended if you have a small number of
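The Life Tables procedure is described above as actuarial. The sketch below follows the standard actuarial convention, in which cases withdrawn during an interval count as exposed to risk for half of it. This excerpt does not give the procedure's own formulas, so treat the match as an assumption. The function name and input layout are hypothetical.

```python
def actuarial_life_table(n_start, withdrawn, terminal):
    """Actuarial life table for consecutive intervals.
    n_start: cases alive and under observation at the start of the first interval;
    withdrawn[i], terminal[i]: censored cases and events in interval i.
    Withdrawals are assumed to be at risk for half the interval."""
    rows, at_risk, cum_surv = [], n_start, 1.0
    for w, d in zip(withdrawn, terminal):
        exposed = at_risk - w / 2.0              # number exposed to risk
        q = d / exposed if exposed > 0 else 0.0  # proportion terminating
        cum_surv *= 1.0 - q                      # cumulative proportion surviving at interval end
        rows.append({"entering": at_risk, "exposed": exposed,
                     "prop_terminating": q, "cum_surviving": cum_surv})
        at_risk -= w + d                         # cases entering the next interval
    return rows


for row in actuarial_life_table(100, [10, 10], [9, 18]):
    print(row)
```

The Kaplan-Meier method mentioned above instead recomputes survival at each observed event time, so it needs no interval boundaries.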
200. res ... 38
     Linear Mixed Models Random Effects ... 39
     Linear Mixed Models Estimation ... 41
     Linear Mixed Models Statistics ... 42
     Linear Mixed Models EM Means ... 43
     Linear Mixed Models Save ... 44
     MIXED Command Additional Features ... 45
     6 Generalized Linear Models ... 46
     Generalized Linear Models Response ... 50
     Generalized Linear Models Reference Category ... 51
     Generalized Linear Models Predictors ... 52
     Generalized Linear Models Options ... 53
     Generalized Linear Models Model ... 54
     Generalized Linear Models Estimation ... 56
     Generalized Linear Models Initial Values ... 58
     Generalized Linear Models Statistics ... 59
     Generalized Linear Models EM Means ... 61
     Generalized Linear Models Save ... 63
     Generalized Linear Models Export ... 65
     GENLIN Command Additional Features ... 66
     7 Generalized Estimating Equations ... 68
     Generalized Estimating Equations Type of Model
201. rictions:
     - All factors within an interaction must be unique. Thus, if A is a factor, then specifying A*A is invalid.
     - All factors within a nested effect must be unique. Thus, if A is a factor, then specifying A(A) is invalid.
     - No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, then specifying A(X) is invalid.
     Sum of Squares. For the model, you can choose a type of sums of squares. Type III is the most commonly used and is the default.
     - Type I. This method is also known as the hierarchical decomposition of the sum-of-squares method. Each term is adjusted only for the terms that precede it in the model. Type I sums of squares are commonly used for:
       - A balanced ANOVA model in which any main effects are specified before any first-order interaction effects, any first-order interaction effects are specified before any second-order interaction effects, and so on.
       - A polynomial regression model in which any lower-order terms are specified before any higher-order terms.
       - A purely nested model in which the first-specified effect is nested within the second-specified effect, the second-specified effect is nested within the third, and so on. This form of nesting can be specified only by using syntax.
     - Type III. The default. This method calculates the sums of squares of an effect in the design as the sums of squares adjusted for any other effects that do not contain it and orthogonal to any effects if
202. rithm. Changing the category order can change the values of factor-level effects, since these parameter estimates are calculated relative to the "last" level. Factors can be sorted in ascending order from lowest to highest value, in descending order from highest to lowest value, or in "data order." Data order means that the first value encountered in the data defines the first category, and the last unique value encountered defines the last category.
     Generalized Linear Models Model
     [Figure 6-6: Generalized Linear Models Model tab]
     Specify Model Effects. The default model is intercept-only, so you must explicitly specify other model effects. Alternatively, you can build nested or non-nested terms.
     Non-Nested Terms
     For the selected factors and covariates:
     - Main effects. Creates a main-effects term for each variable selected.
     - Interaction. Creates the highest-level interaction term for all selected variables.
     - Factorial. Creates all possible interactions and main effects of the selected variables.
     - All 2-way. Creates all possible two-way inte
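Each Non-Nested Terms option above amounts to choosing which subset sizes of the selected variables to cross. The sketch uses `itertools.combinations` to list the resulting terms. The option names mirror the dialog labels, but the function itself is hypothetical, and it writes interactions with `*` between variable names.

```python
from itertools import combinations


def build_terms(variables, option, k=None):
    """Expand the selected variables into model terms for the Non-Nested Terms
    options: 'main effects', 'interaction', 'factorial', or 'all k-way' (pass k).
    Interaction terms are written with '*' between variable names."""
    n = len(variables)
    if option == "main effects":
        sizes = [1]
    elif option == "interaction":
        sizes = [n]                     # only the highest-level interaction
    elif option == "factorial":
        sizes = range(1, n + 1)         # all main effects and interactions
    elif option == "all k-way":
        if k is None or not 1 <= k <= n:
            raise ValueError("all k-way needs 1 <= k <= number of variables")
        sizes = [k]
    else:
        raise ValueError(f"unknown option: {option}")
    return ["*".join(combo) for size in sizes for combo in combinations(variables, size)]


print(build_terms(["A", "B", "C"], "factorial"))
# ['A', 'B', 'C', 'A*B', 'A*C', 'B*C', 'A*B*C']
```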
203. ► Enter an expression for the time-dependent covariate.
     ► Click Model to proceed with your Cox Regression.
     [Figure 15-1: Compute Time-Dependent Covariate dialog box]
     Note: Be sure to include the new variable, T_COV_, as a covariate in your Cox Regression model. For more information, see the topic Cox Regression Analysis in Chapter 14 on p. 148.
     Cox Regression with Time-Dependent Covariates Additional Features
     The command syntax language also allows you to specify multiple time-dependent covariates. Other command syntax features are available for Cox Regression with or without time-dependent covariates.
     See the Command Syntax Reference for complete syntax information.
     Appendix
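T_COV_ is recomputed at every event time, so each case in the risk set contributes the covariate value it has at that moment. The sketch below illustrates this with the textbook Cox partial log-likelihood for one time-dependent covariate. It assumes an expression of the form T_*age (time multiplied by age) and no tied event times. It shows the idea only and is not how the procedure is implemented; the function name is hypothetical.

```python
import math


def partial_loglik_tdc(times, events, age, beta):
    """Cox partial log-likelihood for one time-dependent covariate
    T_COV_ = T_ * age (follow-up time multiplied by age).
    times: follow-up times; events: 1 if the event occurred, 0 if censored.
    Assumes no tied event times."""
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue
        # Risk set: every case still under observation at the event time t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        # The covariate is re-evaluated at t_i for every case in the risk set.
        t_cov = {j: t_i * age[j] for j in risk_set}
        loglik += beta * t_cov[i] - math.log(sum(math.exp(beta * t_cov[j]) for j in risk_set))
    return loglik


print(partial_loglik_tdc([2.0, 5.0, 7.0], [1, 1, 0], [40, 55, 63], 0.01))
```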
204. rvation to observation. If the analysis weight field is specified, the scale parameter, which is related to the variance of the response, is divided by the analysis weight values for each observation. Records with analysis weight values that are less than or equal to 0 or are missing are not used in the analysis.
     Offset. The offset term is a "structural" predictor. Its coefficient is not estimated by the model but is assumed to have the value 1; thus, the values of the offset are simply added to the linear predictor of the target. This is especially useful in Poisson regression models, where each case may have different levels of exposure to the event of interest.
     For example, when modeling accident rates for individual drivers, there is an important difference between a driver who has been at fault in one accident in three years of experience and a driver who has been at fault in one accident in 25 years. The number of accidents can be modeled as a Poisson or negative binomial response with a log link if the natural log of the experience of the driver is included as an offset term.
     Other combinations of distribution and link types would require other transformations of the offset variable.
     Build Options
     [Figure 8-9: Build Options settings (Sorting Order, Stopping Rules)]
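The driver example can be worked through numerically. With a log link and ln(years) as the offset (coefficient fixed at 1), the expected count is years multiplied by exp(linear part), so equal rates imply counts proportional to experience. The 0.1-accidents-per-year rate and the function names below are hypothetical. The closed-form intercept follows from the Poisson score equation for an intercept-only model.

```python
import math


def poisson_mean_with_offset(linear_part, exposure):
    """Mean of a Poisson (or negative binomial) target with a log link when
    ln(exposure) is the offset: its coefficient is fixed at 1, so
    mu = exp(linear_part + ln(exposure)) = exposure * exp(linear_part)."""
    return math.exp(linear_part + math.log(exposure))


def intercept_only_rate(counts, exposures):
    """Maximum likelihood estimate of exp(intercept) for an intercept-only Poisson
    model with offset ln(exposure): the score equation gives sum(counts) / sum(exposures)."""
    return sum(counts) / sum(exposures)


# One at-fault accident in 3 years versus one in 25 years: with the same
# linear part, the expected count scales with years of experience.
rate = math.log(0.1)  # hypothetical linear predictor: 0.1 accidents per year
print(poisson_mean_with_offset(rate, 3), poisson_mean_with_offset(rate, 25))
print(intercept_only_rate([1, 1], [3, 25]))  # 2 accidents / 28 driver-years
```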
205. s. The measurements on a subject should be a sample from a multivariate normal distribution, and the variance-covariance matrices are the same across the cells formed by the between-subjects effects. Certain assumptions are made on the variance-covariance matrix of the dependent variables. The validity of the F statistic used in the univariate approach can be assured if the variance-covariance matrix is circular in form (Huynh and Mandeville, 1979).
     To test this assumption, Mauchly's test of sphericity can be used, which performs a test of sphericity on the variance-covariance matrix of an orthonormalized transformed dependent variable. Mauchly's test is automatically displayed for a repeated measures analysis. For small sample sizes, this test is not very powerful. For large sample sizes, the test may be significant even when the impact of the departure on the results is small. If the significance of the test is large, the hypothesis of sphericity can be assumed. However, if the significance is small and the sphericity assumption appears to be violated, an adjustment to the numerator and denominator degrees of freedom can be made in order to validate the univariate F statistic. Three estimates of this adjustment, which is called epsilon, are available in the GLM Repeated Measures procedure. Both the numerator and denominator degrees of freedom must be multiplied by epsilon, and the significance of the F ratio must be evaluated with the new degrees of fre
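The epsilon adjustment can be made concrete. The GLM Repeated Measures output reports three estimates: Greenhouse-Geisser, Huynh-Feldt, and the lower bound. The sketch computes the Greenhouse-Geisser estimate and the lower bound from the covariance matrix of the orthonormalized transformed variables, using the standard textbook formulas, and omits Huynh-Feldt. The function names are hypothetical.

```python
def greenhouse_geisser_epsilon(S):
    """S: the (k-1) x (k-1) covariance matrix of the orthonormalized transformed
    repeated measures (a list of rows). Returns tr(S)^2 / ((k-1) * tr(S S)),
    which equals 1 under sphericity and is at least 1/(k-1)."""
    p = len(S)
    trace = sum(S[i][i] for i in range(p))
    trace_sq = sum(S[i][j] * S[j][i] for i in range(p) for j in range(p))
    return trace * trace / (p * trace_sq)


def lower_bound_epsilon(k):
    """Smallest possible epsilon for k repeated measures."""
    return 1.0 / (k - 1)


def adjusted_df(df_num, df_den, eps):
    """Both degrees of freedom are multiplied by epsilon."""
    return eps * df_num, eps * df_den


eps = greenhouse_geisser_epsilon([[4.0, 0.0], [0.0, 1.0]])  # k = 3 measures
print(eps, adjusted_df(2, 38, eps))
```

A spherical matrix (a multiple of the identity) gives epsilon = 1 and leaves the degrees of freedom unchanged. A rank-one matrix gives the lower bound.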
206. s an associated variable has been removed from the Factors list in the main dialog box.
     Compare Main Effects. This option allows you to request pairwise comparisons of levels of selected main effects. The Confidence Interval Adjustment allows you to apply an adjustment to the confidence intervals and significance values to account for multiple comparisons. The available methods are LSD (no adjustment), Bonferroni, and Sidak. Finally, for each factor, you can select a reference category to which comparisons are made. If no reference category is selected, all pairwise comparisons will be constructed. The options for the reference category are first, last, or custom (in which case you enter the value of the reference category).
     Linear Mixed Models Save
     [Figure 5-8: Linear Mixed Models Save dialog box]
     This dialog box allows you to save various model results to the working file.
     Fixed Predicted Values. Saves variables related to the regression means without the random effects.
     - Predicted values. The regression means without the random effects.
     - Standard errors. The standard errors of the estimates.
     - Degrees of freedom. The degrees of freedom associat
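The LSD, Bonferroni, and Sidak options correspond to standard adjustments of a significance value for m comparisons. The sketch uses the textbook formulas, which are assumed (not confirmed by this excerpt) to match what the procedure reports. The comparison count follows the reference-category rule described above, and the function names are hypothetical.

```python
from math import comb


def n_comparisons(levels, with_reference=False):
    """Number of comparisons for one factor: all pairs, or each level versus
    the chosen reference category."""
    return levels - 1 if with_reference else comb(levels, 2)


def adjusted_p(p, m, method):
    """Adjust an unadjusted significance value p for m comparisons."""
    if method == "LSD":            # no adjustment
        return p
    if method == "Bonferroni":
        return min(1.0, m * p)
    if method == "Sidak":
        return 1.0 - (1.0 - p) ** m
    raise ValueError(f"unknown method: {method}")


m = n_comparisons(4)  # 6 pairwise comparisons among 4 levels
for method in ("LSD", "Bonferroni", "Sidak"):
    print(method, adjusted_p(0.01, m, method))
```

Sidak is slightly less conservative than Bonferroni for the same p and m, and choosing a reference category reduces m, which in turn reduces both adjustments.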
207. s are fields whose values in the data file can be considered a random sample from a larger population of values. They are useful for explaining excess variability in the target.
     By default, if you have selected more than one subject in the Data Structure tab, a Random Effect block will be created for each subject beyond the innermost subject. For example, if you selected School, Class, and Student as subjects on the Data Structure tab, the following random effect blocks are automatically created:
     - Random Effect 1: subject is school (with no effects, intercept only)
     - Random Effect 2: subject is school * class (no effects, intercept only)
     You can work with random effects blocks in the following ways:
     ► To add a new block, click Add Block... This opens the Random Effect Block dialog.
     ► To edit an existing block, select the block you want to edit and click Edit Block... This opens the Random Effect Block dialog.
     ► To delete one or more blocks, select the blocks you want to delete and click the delete button.
     Random Effect Block
     [Figure 8-7: Random Effect Block dialog]
     Enter effects into the model by selecting one or more fields in the source list and dragging to t
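The default-block rule above is mechanical: with subjects listed from outermost to innermost, block i uses the combination of the first i subjects. The function below is a hypothetical illustration of that rule. The ordering assumption (outermost subject first) follows the School, Class, Student example.

```python
def default_random_effect_blocks(subjects):
    """Subjects are listed outermost first, e.g. ["School", "Class", "Student"].
    One intercept-only block is created per subject beyond the innermost one;
    block i uses the combination of the first i subjects as its subject."""
    return [" * ".join(subjects[:i]) for i in range(1, len(subjects))]


print(default_random_effect_blocks(["School", "Class", "Student"]))
# ['School', 'School * Class']
```

With a single subject the list is empty, consistent with blocks being created only when more than one subject is selected.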
208. s by one or more factor variables or covariates. The factor variables divide the population into groups. Using this general linear model procedure, you can test null hypotheses about the effects of factor variables on the means of various groupings of a joint distribution of dependent variables. You can investigate interactions between factors as well as the effects of individual factors. In addition, the effects of covariates and covariate interactions with factors can be included. For regression analysis, the independent (predictor) variables are specified as covariates.
     Both balanced and unbalanced models can be tested. A design is balanced if each cell in the model contains the same number of cases. In a multivariate model, the sums of squares due to the effects in the model and error sums of squares are in matrix form rather than the scalar form found in univariate analysis. These matrices are called SSCP (sums-of-squares and cross-products) matrices. If more than one dependent variable is specified, the multivariate analysis of variance, using Pillai's trace, Wilks' lambda, Hotelling's trace, and Roy's largest root criterion with approximate F statistics, is provided, as well as the univariate analysis of variance for each dependent variable. In addition to testing hypotheses, GLM Multivariate produces estimates of parameters.
     Commonly used a priori contrasts are available to perform hypothesis testing. Additionally, after an overall F test has s
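All four multivariate statistics named above are functions of the eigenvalues of H E^-1, where H and E are the hypothesis and error SSCP matrices. The sketch uses the standard definitions. Reporting Roy's largest root as the largest eigenvalue is an assumption about the convention, since some texts report λ/(1+λ) instead. The 2x2 eigenvalue helper is hypothetical and handles only two dependent variables. The approximate F transformations are not shown.

```python
import math


def multivariate_tests(eigenvalues):
    """The four multivariate statistics, from the eigenvalues of H E^-1
    (H: hypothesis SSCP matrix, E: error SSCP matrix)."""
    ev = list(eigenvalues)
    return {
        "Pillai's trace": sum(x / (1 + x) for x in ev),
        "Wilks' lambda": math.prod(1 / (1 + x) for x in ev),
        "Hotelling's trace": sum(ev),
        "Roy's largest root": max(ev),
    }


def eigenvalues_h_einv_2x2(H, E):
    """Eigenvalues of H E^-1 for two dependent variables, via the trace and
    determinant of the 2x2 product (real and non-negative for SSCP matrices)."""
    (a, b), (c, d) = E
    det_e = a * d - b * c
    e_inv = [[d / det_e, -b / det_e], [-c / det_e, a / det_e]]
    m = [[sum(H[i][k] * e_inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return [(tr + disc) / 2, (tr - disc) / 2]


lams = eigenvalues_h_einv_2x2([[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(lams, multivariate_tests(lams))
```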
209. s categories. Also known as reverse Helmert contrasts.
     - Helmert. Each category of the predictor variable except the last category is compared to the average effect of subsequent categories.
     - Repeated. Each category of the predictor variable except the first category is compared to the category that precedes it.
     - Polynomial. Orthogonal polynomial contrasts. Categories are assumed to be equally spaced. Polynomial contrasts are available for numeric variables only.
     - Deviation. Each category of the predictor variable except the reference category is compared to the overall effect.
     If you select Deviation, Simple, or Indicator, select either First or Last as the reference category. Note that the method is not actually changed until you click Change.
     String covariates must be categorical covariates. To remove a string variable from the Categorical Covariates list, you must remove all terms containing the variable from the Covariates list in the main dialog box.
     Cox Regression Plots
     [Figure 14-3: Cox Regression Plots dialog box]
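Polynomial contrasts assume equally spaced categories, as stated above. The sketch generates them exactly with `fractions.Fraction` by applying Gram-Schmidt to the powers of the category positions. SPSS may report a rescaled version of these rows (for example, normalized to unit length), so compare them up to a constant factor. The function name is hypothetical.

```python
from fractions import Fraction as F


def polynomial_contrasts(k):
    """Orthogonal polynomial contrasts (linear, quadratic, ...) for k equally
    spaced categories, by Gram-Schmidt on the powers x, x^2, ... after removing
    the constant term. Coefficients are exact and unnormalized."""
    xs = [F(i) for i in range(1, k + 1)]
    basis = [[F(1)] * k]  # constant column; projected out, then dropped
    for degree in range(1, k):
        v = [x ** degree for x in xs]
        for b in basis:
            coef = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
    return basis[1:]


for row in polynomial_contrasts(3):
    print([str(c) for c in row])
# ['-1', '0', '1']
# ['1/3', '-2/3', '1/3']   (proportional to 1, -2, 1)
```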
210. s not available for other response distributions.
     - Predicted value of linear predictor. Saves model-predicted values for each case in the metric of the linear predictor (transformed response via the specified link function). When the response distribution is multinomial, the procedure saves the predicted value for each category of the response, except the last, up to the number of specified categories to save.
     - Estimated standard error of predicted value of linear predictor. When the response distribution is multinomial, the procedure saves the estimated standard error for each category of the response, except the last, up to the number of specified categories to save.
     The following items are not available when the response distribution is multinomial.
     - Cook's distance. A measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients. A large Cook's D indicates that excluding a case from computation of the regression statistics changes the coefficients substantially.
     - Leverage value. Measures the influence of a point on the fit of the regression. The centered leverage ranges from 0 (no influence on the fit) to (N-1)/N.
     - Raw residual. The difference between an observed value and the value predicted by the model.
     - Pearson residual. The square root of the contribution of a case to the Pearson chi-square statistic, with the sign of the raw residual.
     - Standardized Pearson
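For a linear model, the leverage and Cook's distance described above have closed forms. The sketch computes them for simple linear regression with an intercept and adds the Poisson form of the Pearson residual. The procedure's generalized linear model versions weight these quantities through the variance function and link, so treat this as an illustration of the definitions rather than the exact saved values. The function names and data are hypothetical.

```python
import math


def raw_residual(y, mu):
    """Observed value minus model-predicted value."""
    return y - mu


def pearson_residual_poisson(y, mu):
    """Signed square root of the case's Pearson chi-square contribution
    (y - mu)^2 / V(mu), using the Poisson variance function V(mu) = mu."""
    return (y - mu) / math.sqrt(mu)


def simple_regression_diagnostics(x, y):
    """Centered leverage and Cook's distance for y = b0 + b1*x fitted by least squares."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - p)
    hat = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    centered = [h - 1 / n for h in hat]  # lies between 0 and (n - 1)/n
    cooks = [r * r / (p * s2) * h / (1 - h) ** 2 for r, h in zip(resid, hat)]
    return centered, cooks


centered, cooks = simple_regression_diagnostics([1.0, 2.0, 3.0, 4.0, 10.0], [1.0, 2.2, 2.9, 4.1, 7.0])
print([round(c, 3) for c in centered])
print([round(d, 3) for d in cooks])  # the high-leverage point at x = 10 dominates
```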
211. s the default.
     - Type I. This method is also known as the hierarchical decomposition of the sum-of-squares method. Each term is adjusted only for the terms that precede it in the model. Type I sums of squares are commonly used for:
       - A balanced ANOVA model in which any main effects are specified before any first-order interaction effects, any first-order interaction effects are specified before any second-order interaction effects, and so on.
       - A polynomial regression model in which any lower-order terms are specified before any higher-order terms.
       - A purely nested model in which the first-specified effect is nested within the second-specified effect, the second-specified effect is nested within the third, and so on. This form of nesting can be specified only by using syntax.
     - Type II. This method calculates the sums of squares of an effect in the model adjusted for all other "appropriate" effects. An appropriate effect is one that corresponds to all effects that do not contain the effect being examined. The Type II sum-of-squares method is commonly used for:
       - A balanced ANOVA model.
       - Any model that has main factor effects only.
       - Any regression model.
       - A purely nested design. This form of nesting can be specified by using syntax.
     - Type III. The default. This method calculates the sums of squares of an effect in the design as the sums of squares adjusted for any other effects that do not contain it and orthogonal to any effects
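Sequential (Type I) sums of squares can be computed directly from their definition: each term's sum of squares is the drop in residual sum of squares when that term is added after the terms before it. The stdlib-only sketch below assumes numeric columns with no redundant (aliased) columns. Type II and Type III are not shown, and the helper names are hypothetical.

```python
def residual_ss(columns, y):
    """Residual sum of squares of y regressed on the given columns, solving the
    normal equations by Gauss-Jordan elimination with partial pivoting.
    Assumes the columns are linearly independent."""
    p = len(columns)
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(beta[j] * columns[j][k] for j in range(p)) for k in range(len(y))]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


def type1_ss(terms, y):
    """terms: ordered (name, [columns]) pairs; an intercept is always entered first.
    Returns the Type I sum of squares per term and the final residual SS."""
    cols = [[1.0] * len(y)]
    previous = residual_ss(cols, y)
    result = {}
    for name, term_cols in terms:
        cols = cols + term_cols
        current = residual_ss(cols, y)
        result[name] = previous - current  # adjusted only for preceding terms
        previous = current
    return result, previous


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
ss, rss = type1_ss([("x", [x]), ("x^2", [[v * v for v in x]])], y)
print(ss, rss)
```

Because each term's value is a difference of successive residual sums of squares, the Type I sums of squares plus the final residual always add up to the total (corrected) sum of squares, and reordering the terms generally changes the individual values.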
212. sed on the Studentized maximum modulus, Games-Howell pairwise comparison test (sometimes liberal), or Dunnett's C (pairwise comparison test based on the Studentized range). Duncan's multiple range test, Student-Newman-Keuls (S-N-K), and Tukey's b are range tests that rank group means and compute a range value. These tests are not used as frequently as the tests previously discussed.

The Waller-Duncan t test uses a Bayesian approach. This range test uses the harmonic mean of the sample size when the sample sizes are unequal.

The significance level of the Scheffé test is designed to allow all possible linear combinations of group means to be tested, not just the pairwise comparisons available in this feature. The result is that the Scheffé test is often more conservative than other tests, which means that a larger difference between means is required for significance.

The least significant difference (LSD) pairwise multiple comparison test is equivalent to multiple individual t tests between all pairs of groups. The disadvantage of this test is that no attempt is made to adjust the observed significance level for multiple comparisons.

Tests displayed. Pairwise comparisons are provided for LSD, Sidak, Bonferroni, Games-Howell, Tamhane's T2 and T3, Dunnett's C, and Dunnett's T3. Homogeneous subsets for range tests are provided for S-N-K, Tukey's b, Duncan, R-E-G-W F, R-E-G-W Q, and Waller. Tukey's honestly significant difference test, Hochberg
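The trade-off described above can be quantified. The sketch below computes, for all pairwise comparisons among k groups, the unadjusted (LSD) per-comparison level, the Bonferroni level alpha/m, and the Sidak level 1 - (1 - alpha)^(1/m). It also reports the familywise error rate the unadjusted approach would carry if the m tests were independent, which is an assumption (pairwise tests share groups and are correlated). The function name pairwise_alphas is illustrative.

```python
# Sketch: per-comparison significance levels for all pairwise comparisons
# among k group means, unadjusted (LSD) versus Bonferroni and Sidak.

from math import comb

def pairwise_alphas(k, alpha=0.05):
    m = comb(k, 2)                        # number of pairwise comparisons
    lsd = alpha                           # no adjustment
    bonferroni = alpha / m
    sidak = 1 - (1 - alpha) ** (1 / m)
    # Familywise error if m independent tests each used the LSD level
    familywise_lsd = 1 - (1 - alpha) ** m
    return m, lsd, bonferroni, sidak, familywise_lsd

m, lsd, bonf, sidak, fw = pairwise_alphas(k=5)
```

With five groups there are 10 pairs; Bonferroni tests each at 0.005 and Sidak at about 0.0051, whereas unadjusted testing at 0.05 would carry a familywise error rate of roughly 0.40 under independence.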
213. see that the off-diagonal parameters are not significant, you may be able to use a simpler covariance structure.

Effects. If there are random effect blocks, then there is an Effect dropdown list for selecting the residual or random effect block to display. The residual effect is always available.

Groups. If a residual or random effect block has a group specification, then there is a Group dropdown list for selecting the group level to display.

Multinomial. If the multinomial distribution is in effect, then the Multinomial dropdown list controls which target category to display. The sort order of the values in the list is determined by the specification on the Build Options settings.

Estimated Means: Significant Effects
These are charts displayed for the 10 "most significant" fixed all-factor effects, starting with the three-way interactions, then the two-way interactions, and finally main effects. The chart displays the model-estimated value of the target on the vertical axis for each value of the main effect (or first listed effect in an interaction) on the horizontal axis; a separate line is produced for each value of the second listed effect in an interaction; a separate chart is produced for each value of the third listed effect in a three-way interaction; all other predictors are held constant. It provides a useful visualization of the effects of each predictor's coefficients on the target. Note that if no predictors are significant, no
214. sification of the grouping fields. A different set of grouping fields can be specified for each random effect block. All subjects have the same covariance type; subjects within the same covariance grouping will have the same values for the parameters.

Subject combination. This allows you to specify random effect subjects from preset combinations of subjects from the Data Structure tab. For example, if School, Class, and Student are defined as subjects on the Data Structure tab, and in that order, then the Subject combination dropdown list will have None, School, School*Class, and School*Class*Student as options.

Random effect covariance type. This specifies the covariance structure for the random effects in the block. The available structures are:
• First-order autoregressive (AR1)
• Autoregressive moving average (1,1) (ARMA11)
• Compound symmetry
• Diagonal
• Scaled identity
• Toeplitz
• Unstructured
• Variance components
For more information, see the topic Covariance Structures in Appendix B on p. 162.

Weight and Offset
Figure 8-8 Weight and Offset settings

Analysis weight. The scale parameter is an estimated model parameter related to the variance of the response. The analysis weights are known values that can vary from obse
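As a rough illustration of what some of these structures imply, the sketch below builds the covariance matrix for t correlated measurements under first-order autoregressive, compound symmetry, and scaled identity structures. The compound symmetry parameterization (a common covariance sigma1 plus sigma2 on the diagonal) is one common convention and is assumed here; see Appendix B for the procedure's own definitions. Function and parameter names are hypothetical.

```python
# Sketch only: covariance matrices for t repeated measurements under three
# of the listed structures. Names and parameterizations are assumptions.

def ar1(t, sigma2, rho):
    # First-order autoregressive: Cov(i, j) = sigma2 * rho ** |i - j|
    return [[sigma2 * rho ** abs(i - j) for j in range(t)] for i in range(t)]

def compound_symmetry(t, sigma2, sigma1):
    # Common covariance sigma1 between any two measurements;
    # variance sigma1 + sigma2 on the diagonal
    return [[sigma1 + (sigma2 if i == j else 0.0) for j in range(t)]
            for i in range(t)]

def scaled_identity(t, sigma2):
    # Equal variances and no correlation between measurements
    return [[sigma2 if i == j else 0.0 for j in range(t)] for i in range(t)]

m = ar1(3, 2.0, 0.5)
cs = compound_symmetry(3, 1.0, 0.4)
si = scaled_identity(3, 2.0)
```

Under AR1 the covariance decays geometrically with the distance between measurements, whereas compound symmetry assumes the same covariance for every pair.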
215. stimated marginal means.

• Contrast Type. This specifies the type of contrast to use for the levels of the contrast field. If None is selected, no contrasts are produced. Pairwise produces pairwise comparisons for all level combinations of the specified factors. This is the only available contrast for factor interactions. Deviation contrasts compare each level of the factor to the grand mean. Simple contrasts compare each level of the factor, except the last, to the last level. The "last" level is determined by the sort order for factors specified on the Build Options. Note that none of these contrast types are orthogonal.
• Contrast Field. This specifies a factor, the levels of which are compared using the selected contrast type. If None is selected as the contrast type, no contrast field can (or need) be selected.

Continuous Fields. The listed continuous fields are extracted from the terms in the Fixed Effects that use continuous fields. When computing estimated marginal means, covariates are fixed at the specified values. Select the mean or specify a custom value.

Display estimated means in terms of. This specifies whether to compute estimated marginal means based on the original scale of the target or based on the link function transformation. Original target scale computes estimated marginal means for the target. Note that when the target is specified using the events/trials option, this gives
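To see why these contrasts are not orthogonal, the following sketch writes out the coefficient rows for deviation and simple contrasts on a three-level factor, taking the last level as the one omitted or used as the reference (an assumption matching the description above), and takes their inner products. Exact fractions keep the results readable; the function names are illustrative.

```python
# Sketch: deviation and simple contrast rows for a k-level factor and a
# check of their inner products.

from fractions import Fraction

def deviation_contrasts(k):
    # Each level except the last minus the grand mean (1/k of each level)
    return [[Fraction(int(i == j)) - Fraction(1, k) for j in range(k)]
            for i in range(k - 1)]

def simple_contrasts(k):
    # Each level except the last minus the last level
    return [[Fraction(int(j == i)) - Fraction(int(j == k - 1)) for j in range(k)]
            for i in range(k - 1)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

dev = deviation_contrasts(3)
sim = simple_contrasts(3)
```

Both inner products are nonzero (-1/3 for the deviation rows and 1 for the simple rows), so the corresponding tests are correlated. Pairwise rows such as (1, -1, 0) and (1, 0, -1) are likewise not orthogonal.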
216. Method. You can choose one of four methods to estimate the variance components.
• MINQUE (minimum norm quadratic unbiased estimator) produces estimates that are invariant with respect to the fixed effects. If the data are normally distributed and the estimates are correct, this method produces the least variance among all unbiased estimators. You can choose a method for random-effect prior weights.
• ANOVA (analysis of variance) computes unbiased estimates using either the Type I or Type III sums of squares for each effect. The ANOVA method sometimes produces negative variance estimates, which can indicate an incorrect model, an inappropriate estimation method, or a need for more data.
• Maximum likelihood (ML) produces estimates that would be most consistent with the data actually observed, using iterations. These estimates can be biased. This method is asymptotically normal. ML and REML estimates are invariant under translation. This method does not take into account the degrees of freedom used to estimate the fixed effects.
• Restricted maximum likelihood (REML) estimates reduce to the ANOVA estimates for many (if not all) cases of balanced data. Because this method is adjusted for the fixed effects, it should have smaller standard errors than the ML me
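For the simplest case, a balanced one-way random-effects design, the ANOVA method reduces to equating mean squares to their expectations. The sketch below assumes that balanced design (it is not the procedure's general algorithm) and also shows how a negative estimate arises when the between-group mean square falls below the within-group mean square. The function name is hypothetical.

```python
# Sketch: ANOVA (method-of-moments) variance component estimates for a
# balanced one-way random-effects design y_ij = mu + a_i + e_ij, with
# E(MS_between) = sigma2_e + n * sigma2_a and E(MS_within) = sigma2_e.

def anova_variance_components(groups):
    g = len(groups)
    n = len(groups[0])  # balanced: every group has n observations
    grand = sum(sum(gr) for gr in groups) / (g * n)
    means = [sum(gr) / n for gr in groups]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((y - m) ** 2 for gr, m in zip(groups, means) for y in gr)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (n - 1))
    sigma2_e = ms_within
    sigma2_a = (ms_between - ms_within) / n  # can come out negative
    return sigma2_a, sigma2_e

sa, se = anova_variance_components([[10, 11, 12], [20, 21, 22], [30, 31, 32]])
sa2, se2 = anova_variance_components([[1, 5, 9], [2, 5, 8], [3, 5, 7]])
```

In the second call every group has the same mean, so MS_between is 0 and the random-effect variance estimate is negative, the situation the text warns about.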
217. sts. Compares categories of an independent variable with the mean of the previous categories of the variable. The general matrix form is

$$
\begin{array}{l|ccccc}
\text{mean} & 1/k & 1/k & 1/k & \cdots & 1/k \\
\text{df}(1) & -1 & 1 & 0 & \cdots & 0 \\
\text{df}(2) & -1/2 & -1/2 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\text{df}(k-1) & -1/(k-1) & -1/(k-1) & -1/(k-1) & \cdots & 1
\end{array}
$$

where k is the number of categories for the independent variable. For example, the difference contrasts for an independent variable with four categories are as follows:

$$
\begin{array}{rrrr}
1/4 & 1/4 & 1/4 & 1/4 \\
-1 & 1 & 0 & 0 \\
-1/2 & -1/2 & 1 & 0 \\
-1/3 & -1/3 & -1/3 & 1
\end{array}
$$

Polynomial. Orthogonal polynomial contrasts. The first degree of freedom contains the linear effect across all categories; the second degree of freedom, the quadratic effect; the third degree of freedom, the cubic; and so on, for the higher-order effects.

You can specify the spacing between levels of the treatment measured by the given categorical variable. Equal spacing, which is the default if you omit the metric, can be specified as consecutive integers from 1 to k, where k is the number of categories. If the variable drug has three categories, the subcommand /CONTRAST(DRUG)=POLYNOMIAL is the same as /CONTRAST(DRUG)=POLYNOMIAL(1,2,3).

Equal spacing is not always necessary, however. For example, suppose that drug represents different dosages of a drug given to three groups. If the dosage administered to the second group is twice that given to the first group and the dosage administered to the third group is three times that
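The matrices above can be generated programmatically. In the sketch below, difference_contrasts reproduces the rows shown for k categories, and polynomial_contrasts builds orthogonal polynomial contrasts for an arbitrary spacing metric by applying Gram-Schmidt to the powers of the metric values. The unit-length scaling and the unequal metric (1, 2, 4) used in the check are assumptions for illustration; the procedure's own scaling may differ. Both function names are hypothetical.

```python
# Sketch: difference (reverse Helmert) contrast rows and orthogonal
# polynomial contrasts for a given spacing metric.

from fractions import Fraction

def difference_contrasts(k):
    # Row 0 is the mean row; row d compares level d with the mean of
    # levels 0..d-1 (d = 1..k-1).
    rows = [[Fraction(1, k)] * k]
    for d in range(1, k):
        rows.append([Fraction(-1, d) if j < d else Fraction(int(j == d))
                     for j in range(k)])
    return rows

def polynomial_contrasts(metric):
    # Gram-Schmidt on 1, x, x**2, ... over the metric values, each
    # normalized to unit length; the constant vector is dropped.
    basis = []
    for power in range(len(metric)):
        v = [float(x) ** power for x in metric]
        for q in basis:
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        basis.append([a / norm for a in v])
    return basis[1:]  # linear, quadratic, ...

d = difference_contrasts(4)
lin, quad = polynomial_contrasts([1, 2, 3])
lin2, quad2 = polynomial_contrasts([1, 2, 4])
```

For the equally spaced metric (1, 2, 3) the linear and quadratic rows are proportional to (-1, 0, 1) and (1, -2, 1); for an unequal metric such as (1, 2, 4) the rows change but remain orthogonal and sum to zero.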
218. The Data Structure tab allows you to specify the structural relationships between records in your dataset when observations are correlated. If the records in the dataset represent independent observations, you do not need to specify anything on this tab.

Subjects. The combination of values of the specified categorical fields should uniquely define subjects within the dataset. For example, a single Patient ID field should be sufficient to define subjects in a single hospital, but the combination of Hospital ID and Patient ID may be necessary if patient identification numbers are not unique across hospitals. In a repeated measures setting, multiple observations are recorded for each subject, so each subject may occupy multiple records in the dataset.

A subject is an observational unit that can be considered independent of other subjects. For example, the blood pressure readings from a patient in a medical study can be considered independent of the readings from other patients. Defining subjects becomes particularly important when there are repeate
219. tal, but the combination of Hospital ID and Patient ID may be necessary if patient identification numbers are not unique across hospitals. In a repeated measures setting, multiple observations are recorded for each subject, so each subject may occupy multiple cases in the dataset.
• On the Type of Model tab, specify a distribution and link function.
• On the Response tab, select a dependent variable.
• On the Predictors tab, select factors and covariates for use in predicting the dependent variable.
• On the Model tab, specify model effects using the selected factors and covariates.

Optionally, on the Repeated tab you can specify:

Within-subject variables. The combination of values of the within-subject variables defines the ordering of measurements within subjects; thus, the combination of within-subject and subject variables uniquely defines each measurement. For example, the combination of Period, Hospital ID, and Patient ID defines, for each case, a particular office visit for a particular patient within a particular hospital.

If the dataset is already sorted so that each subject's repeated measurements occur in a contiguous block of cases and in the proper order, it is not strictly necessary to specify a within-subjects variable, and you can deselect Sort cases by subject and within-subject variables and save the processing time required to perform the (temporary) sort. Generally, it's a good idea to make use of within su
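When relying on the dataset's existing order, it can help to check it first. The sketch below sorts records by the subject fields and then the within-subject fields, and raises an error if any combination identifies more than one record. The field names hospital_id, patient_id, and period and the function sort_and_check are hypothetical stand-ins for Hospital ID, Patient ID, and Period.

```python
# Sketch: sort records by subject and within-subject fields and verify
# that each combination identifies exactly one measurement.

def sort_and_check(records, subject_keys, within_keys):
    keys = subject_keys + within_keys
    # Sort by subject fields first, then by within-subject fields
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in keys))
    seen = set()
    for r in ordered:
        key = tuple(r[k] for k in keys)
        if key in seen:
            raise ValueError(f"more than one record for {dict(zip(keys, key))}")
        seen.add(key)
    return ordered

records = [
    {"hospital_id": 2, "patient_id": 1, "period": 2},
    {"hospital_id": 1, "patient_id": 1, "period": 2},
    {"hospital_id": 1, "patient_id": 1, "period": 1},
    {"hospital_id": 2, "patient_id": 1, "period": 1},
]
ordered = sort_and_check(records, ["hospital_id", "patient_id"], ["period"])
```

Using patient_id alone as the subject raises an error with these records, because patient 1 exists in both hospitals; adding hospital_id to the subject fields resolves it.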
220. the estimated marginal means for the events/trials proportion rather than for the number of events. Link function transformation computes estimated marginal means for the linear predictor.

Adjust for multiple comparisons using. When performing hypothesis tests with multiple contrasts, the overall significance level can be adjusted from the significance levels for the included contrasts. This allows you to choose the adjustment method.
• Least significant difference. This method does not control the overall probability of rejecting the hypotheses that some linear contrasts are different from the null hypothesis values.
• Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni procedure that is much less conservative than the ordinary Bonferroni procedure in terms of rejecting individual hypotheses but maintains the same overall significance level.
• Sequential Sidak. This is a sequentially step-down rejective Sidak procedure that is much less conservative than the ordinary Sidak procedure in terms of rejecting individual hypotheses but maintains the same overall significance level.

The least significant difference method is less conservative than the sequential Sidak method, which in turn is less conservative than the sequential Bonferroni; that is, least significant difference will reject at least as many individual hypotheses as sequential Sidak, which in turn will reject at least as many individual hypotheses as sequential Bonferroni.

Save
Figure 8-11 Save settings
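The step-down logic behind the two sequential methods can be sketched in a few lines. In this illustration (not the procedure's internal code), the p-values are sorted, the smallest is compared with alpha/m for Bonferroni or 1 - (1 - alpha)^(1/m) for Sidak, the next with the same rule using m - 1, and so on, stopping at the first hypothesis that is not rejected. The function names and example p-values are hypothetical.

```python
# Sketch: unadjusted (LSD) rejections versus step-down sequential
# Bonferroni and sequential Sidak rejections for a set of p-values.

def lsd(pvalues, alpha=0.05):
    return [p <= alpha for p in pvalues]

def step_down(pvalues, alpha=0.05, method="bonferroni"):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest first
    reject = [False] * m
    for step, i in enumerate(order):
        remaining = m - step  # hypotheses not yet rejected
        if method == "bonferroni":
            threshold = alpha / remaining
        else:  # "sidak"
            threshold = 1 - (1 - alpha) ** (1 / remaining)
        if pvalues[i] > threshold:
            break  # stop at the first non-rejection
        reject[i] = True
    return reject

p = [0.2, 0.014, 0.0126, 0.03]
```

With these four p-values, LSD rejects three hypotheses, sequential Sidak two, and sequential Bonferroni none, which matches the ordering described above.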
221. thod. This method takes into account the degrees of freedom used to estimate the fixed effects.

Random Effect Priors. Uniform implies that all random effects and the residual term have an equal impact on the observations. The Zero scheme is equivalent to assuming zero random-effect variances. Available only for the MINQUE method.

Sum of Squares. Type I sums of squares are used for the hierarchical model, which is often used in variance component literature. If you choose Type III, the default in GLM, the variance estimates can be used in GLM Univariate for hypothesis testing with Type III sums of squares. Available only for the ANOVA method.

Criteria. You can specify the convergence criterion and the maximum number of iterations. Available only for the ML or REML methods.

Display. For the ANOVA method, you can choose to display sums of squares and expected mean squares. If you selected Maximum likelihood or Restricted maximum likelihood, you can display a history of the iterations.

Sum of Squares (Variance Components)
For the model, you can choose a type of sum of squares. Type III is the most commonly used and is the default.

Type I. This method is also known as the hierarchical decomposition of the sum-of-squares method. Each term is adjusted only for the terms that precede it in the model. The Type I sum-of-squares method is commonly used for:
• A balanced ANOVA model in which any main effects are specified before a
222. tics are calculated using a fixed-effects model.

Estimated Marginal Means. Select the factors and interactions for which you want estimates of the population marginal means in the cells. These means are adjusted for the covariates, if any. Interactions are available only if you have specified a custom model.
• Compare main effects. Provides uncorrected pairwise comparisons among estimated marginal means for any main effect in the model, for both between- and within-subjects factors. This item is available only if main effects are selected under the Display Means For list.
• Confidence interval adjustment. Select least significant difference (LSD), Bonferroni, or Sidak adjustment to the confidence intervals and significance. This item is available only if Compare main effects is selected.

Display. Select Descriptive statistics to produce observed means, standard deviations, and counts for all of the dependent variables in all cells. Estimates of effect size gives a partial eta-squared value for each effect and each parameter estimate. The eta-squared statistic describes the proportion of total variability attributable to a factor. Select Observed power to obtain the power of the test when the alternative hypothesis is set based on the observed value. Select Parameter estimates to produce the parameter estimates, standard errors, t tests, confidence intervals, and the observed power for each test. You can display the hypothesis and error SSCP matrices a
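Partial eta squared divides an effect's sum of squares by the sum of that effect's and the error sum of squares, rather than by the total sum of squares used for eta squared. The sketch below assumes you already have the sums of squares, or the F statistic and its degrees of freedom, for an effect; both function names are illustrative.

```python
# Sketch: partial eta squared from sums of squares, and the equivalent
# expression in terms of the F statistic and its degrees of freedom.

def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)

def partial_eta_squared_from_f(f, df_effect, df_error):
    return f * df_effect / (f * df_effect + df_error)
```

Both forms give 0.25 for an effect with SS = 30 and error SS = 90, equivalently F = 6 on 2 and 36 degrees of freedom.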
223. tor finds out that the variance in weight gain is attributable to the difference in litters much more than to the difference in pigs within a litter.

Data. The dependent variable is quantitative. Factors are categorical. They can have numeric values or string values of up to eight bytes. At least one of the factors must be random. That is, the levels of the factor must be a random sample of possible levels. Covariates are quantitative variables that are related to the dependent variable.

Assumptions. All methods assume that model parameters of a random effect have zero means and finite constant variances and are mutually uncorrelated. Model parameters from different random effects are also uncorrelated.

The residual term also has a zero mean and finite constant variance. It is uncorrelated with model parameters of any random effect. Residual terms from different observations are assumed to be uncorrelated.

Based on these assumptions, observations from the same level of a random factor are correlated. This fact distinguishes a variance component model from a general linear model.

ANOVA and MINQUE do not require normality assumptions. They are both robust to moderate departures from the normality assumption.

ML and REML require the model parameter and the residual term to be normally distributed.

Related procedures. Use the Explore procedure to examine the data before doing variance components analysis. For hypothesis testing, use GLM Univariate, GL
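The statement that observations from the same level of a random factor are correlated can be made concrete. If a random effect with variance sigma2_a and an uncorrelated residual with variance sigma2_e are added, two observations sharing a level have covariance sigma2_a, so their correlation is sigma2_a / (sigma2_a + sigma2_e). The sketch below computes that value and checks it by simulation; the function names, the normal draws, and the seed are assumptions made only for this illustration (the correlation result itself does not require normality).

```python
# Sketch: correlation implied by a random factor, checked by simulation.

import random

def within_level_correlation(sigma2_a, sigma2_e):
    # Correlation between two observations sharing a random-factor level
    return sigma2_a / (sigma2_a + sigma2_e)

def simulate_pairs(sigma2_a, sigma2_e, levels, seed=1):
    # One shared random effect per level, two observations per level
    rng = random.Random(seed)
    pairs = []
    for _ in range(levels):
        a = rng.gauss(0.0, sigma2_a ** 0.5)
        pairs.append((a + rng.gauss(0.0, sigma2_e ** 0.5),
                      a + rng.gauss(0.0, sigma2_e ** 0.5)))
    return pairs

def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

r = pearson(simulate_pairs(4.0, 1.0, levels=20000))
```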
224. tor levels for each stratum. If you do not have a stratification variable, the tests are not performed.

Pairwise for each stratum. Compares each distinct pair of factor levels for each stratum. Pairwise trend tests are not available. If you do not have a stratification variable, the tests are not performed.

Linear trend for factor levels. Allows you to test for a linear trend across levels of the factor. This option is available only for overall (rather than pairwise) comparisons of factor levels.

Kaplan-Meier Save New Variables
Figure 13-4 Kaplan-Meier Save New Variables dialog box

You can save information from your Kaplan-Meier table as new variables, which can then be used in subsequent analyses to test hypotheses or check assumptions. You can save survival, standard error of survival, hazard, and cumulative events as new variables.
• Survival. Cumulative survival probability estimate. The default variable name is the prefix sur_ with a sequential number appended to it. For example, if sur_1 already exists, Kaplan-Meier assigns the variable name sur_2.
• Standard error of survival. Standard error of the cumulative survival estimate. The default variable name is the prefix se_ with a sequential number appended to it. For example, if se_1 already exists, Kaplan-Meier assigns t
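The survival and standard-error values described above can be sketched with the product-limit estimator and Greenwood's variance formula (assumed here to correspond to the procedure's standard error). This minimal version assumes status 1 marks an event and 0 a censored case, reports one row per distinct event time (the procedure's table also lists censored cases), and sets the standard error to 0 once the estimate reaches 0. The function name kaplan_meier is hypothetical.

```python
# Sketch: Kaplan-Meier estimate, Greenwood standard error, and cumulative
# events at each distinct event time.

def kaplan_meier(times, status):
    data = sorted(zip(times, status))
    at_risk = len(data)
    surv, var_sum, cum_events = 1.0, 0.0, 0
    table = []  # (time, survival, standard error, cumulative events)
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        while i < len(data) and data[i][0] == t:  # handle tied times
            if data[i][1] == 1:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1 - d / at_risk
            if at_risk > d:
                var_sum += d / (at_risk * (at_risk - d))  # Greenwood term
            cum_events += d
            table.append((t, surv, surv * var_sum ** 0.5, cum_events))
        at_risk -= d + c  # events and censored cases leave the risk set
    return table

table = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1])
```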
225. ts, you can choose whether the reference category is the last or first category.

Contrast Types
• Deviation. Compares the mean of each level (except a reference category) to the mean of all of the levels (grand mean). The levels of the factor can be in any order.
• Simple. Compares the mean of each level to the mean of a specified level. This type of contrast is useful when there is a control group. You can choose the first or last category as the reference.
• Difference. Compares the mean of each level (except the first) to the mean of previous levels. (Sometimes called reverse Helmert contrasts.)
• Helmert. Compares the mean of each level of the factor (except the last) to the mean of subsequent levels.
• Repeated. Compares the mean of each level (except the last) to the mean of the subsequent level.
• Polynomial. Compares the linear effect, quadratic effect, cubic effect, and so on. The first degree of freedom contains the linear effect across all categories; the second degree of freedom, the quadratic effect; and so on. These contrasts are often used to estimate polynomial trends.

GLM Multivariate Profile Plots
Figure 2-4 Multivariate Profile Plots dialog box

Profile plots (interaction plots) are useful for comparing marginal means in your mod
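As a sketch of how two of the contrast types above translate into coefficients, the functions below build Helmert and repeated contrast rows for a factor with k levels. The function names are hypothetical, and the procedure may scale rows differently.

```python
# Sketch: Helmert and repeated contrast rows for a k-level factor.

from fractions import Fraction

def helmert_contrasts(k):
    # Level i versus the mean of all later levels (i = 0..k-2)
    rows = []
    for i in range(k - 1):
        later = k - i - 1
        rows.append([Fraction(0)] * i + [Fraction(1)] + [Fraction(-1, later)] * later)
    return rows

def repeated_contrasts(k):
    # Level i versus the next level (i = 0..k-2)
    return [[Fraction(int(j == i)) - Fraction(int(j == i + 1)) for j in range(k)]
            for i in range(k - 1)]

h = helmert_contrasts(4)
r = repeated_contrasts(4)
```

With equal weighting, Helmert rows are mutually orthogonal, while adjacent repeated contrasts such as (1, -1, 0, 0) and (0, 1, -1, 0) are not.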
226. u can exclude the intercept.

Sum of Squares. The method of calculating the sums of squares. For models with no missing cells, the Type III method is most commonly used.

Build Non-Nested Terms
For the selected factors and covariates:
• Factorial. Creates all possible interactions and main effects of the selected variables. This is the default.
• Interaction. Creates the highest-level interaction term of all selected variables.
• Main Effects. Creates a main-effects term for each variable selected.
• All 2-Way. Creates all possible two-way interactions of the selected variables.
• All 3-Way. Creates all possible three-way interactions of the selected variables.
• All 4-Way. Creates all possible four-way interactions of the selected variables.
• All 5-Way. Creates all possible five-way interactions of the selected variables.

Build Nested Terms
You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending of their customers at several store locations. Since each customer frequents only one of those locations, the Customer effect can be said to be nested within the Store location effect.

Additionally, you can include interaction effects or add multiple levels of nesting to the nested term.

Limitations. Nested terms have the following rest
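The non-nested Build Terms choices are combinatorial. The sketch below mimics them for a list of selected variable names, joining interaction components with "*". The kind labels ("main", "interaction", "factorial", "all2", and so on) and the function build_terms are hypothetical; they are not procedure keywords.

```python
# Sketch: expanding selected variables into model terms, mirroring the
# Factorial, Interaction, Main Effects, and All n-Way choices.

from itertools import combinations

def build_terms(variables, kind):
    if kind == "main":
        sizes = [1]
    elif kind == "interaction":
        sizes = [len(variables)]
    elif kind == "factorial":
        sizes = range(1, len(variables) + 1)
    elif kind.startswith("all") and kind[3:].isdigit():  # e.g. "all2"
        sizes = [int(kind[3:])]
    else:
        raise ValueError(f"unknown kind: {kind}")
    return ["*".join(c) for s in sizes for c in combinations(variables, s)]

v = ["A", "B", "C"]
```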
227. uded from this list. Terms can be selected directly from this list or combined into an interaction term using the By* button.

Display Means For. Estimated means are computed for the selected factors and factor interactions. The contrast determines how hypothesis tests are set up to compare the estimated means. The simple contrast requires a reference category or factor level against which the others are compared.
• Pairwise. Pairwise comparisons are computed for all level combinations of the specified or implied factors. This is the only available contrast for factor interactions.
• Simple. Compares the mean of each level to the mean of a specified level. This type of contrast is useful when there is a control group.
• Deviation. Each level of the factor is compared to the grand mean. Deviation contrasts are not orthogonal.
• Difference. Compares the mean of each level (except the first) to the mean of previous levels. They are sometimes called reverse Helmert contrasts.
• Helmert. Compares the mean of each level of the factor (except the last) to the mean of subsequent levels.
• Repeated. Compares the mean of each level (except the last) to the mean of the subsequent level.
• Polynomial. Compares the linear effect, quadratic effect, cubic effect, and so on. The first degree of freedom contains the linear effect across all categories; the second degree of freedom, the quadratic effect; and so on. These contr
228. umber of trials required to observe k successes.
• Multinomial logistic regression. Specifies a multinomial distribution, which should be used when the target is a multi-category response. It uses either a cumulative logit link (ordinal outcomes) or a generalized logit link (multi-category nominal responses).
• Binary logistic regression. Specifies a binomial distribution with a logit link, which should be used when the target is a binary response predicted by a logistic regression model.
• Binary probit. Specifies a binomial distribution with a probit link, which should be used when the target is a binary response with an underlying normal distribution.
• Interval censored survival. Specifies a binomial distribution with a complementary log-log link, which is useful in survival analysis when some observations have no termination event.

Distribution. This selection specifies the distribution of the target. The ability to specify a non-normal distribution and non-identity link function is the essential improvement of the generalized linear mixed model over the linear mixed model. There are many possible distribution-link function combinations, and several may be appropriate for any given dataset, so your choice can be guided by a priori theoretical considerations or which combination seems to fit best.
• Binomial. This distribution is appropriate only for a target that represents a binary response or number of events.
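The link functions named above differ in how they map the linear predictor eta to a probability. The sketch below gives the inverse of each link using only the Python standard library (math.erf for the standard normal CDF); the function names are illustrative.

```python
# Sketch: inverse link functions mapping a linear predictor eta to a
# probability for the binomial-based model types above.

import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    # Standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    # Complementary log-log: p = 1 - exp(-exp(eta))
    return 1.0 - math.exp(-math.exp(eta))
```

At eta = 0 the logit and probit links both give 0.5, while the complementary log-log link gives about 0.632, reflecting its asymmetry.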
229. ups. The students are each given four trials on a learning task, and the number of errors for each trial is recorded. The errors for each trial are recorded in separate variables, and a within-subjects factor (trial) is defined with four levels for the four trials. The trial effect is found to be significant, while the trial-by-anxiety interaction is not significant.

Methods. Type I, Type II, Type III, and Type IV sums of squares can be used to evaluate different hypotheses. Type III is the default.

Copyright IBM Corporation 1989, 2012.

Statistics. Post hoc range tests and multiple comparisons (for between-subjects factors): least significant difference, Bonferroni, Sidak, Scheffé, Ryan-Einot-Gabriel-Welsch multiple F, Ryan-Einot-Gabriel-Welsch multiple range, Student-Newman-Keuls, Tukey's honestly significant difference, Tukey's b, Duncan, Hochberg's GT2, Gabriel, Waller-Duncan t test, Dunnett (one-sided and two-sided), Tamhane's T2, Dunnett's T3, Games-Howell, and Dunnett's C. Descriptive statistics: observed means, standard deviations, and counts for all of the dependent variables in all cells; the Levene test for homogeneity of variance; Box's M; and Mauchly's test of sphericity.

Plots. Spread-versus-level, residual, and profile (interaction).

Data. The dependent variables should be quantitative. Between-subjects factors divide the sample into discrete subgroups, such as male and female. T
230. used for:
• Any models listed in Type I and Type II.
• Any balanced model or unbalanced model with empty cells.

GLM Multivariate Contrasts
Figure 2-3 Multivariate Contrasts dialog box

Contrasts are used to test whether the levels of an effect are significantly different from one another. You can specify a contrast for each factor in the model. Contrasts represent linear combinations of the parameters.

Hypothesis testing is based on the null hypothesis $LBM = 0$, where L is the contrast coefficients matrix, M is the identity matrix (which has dimension equal to the number of dependent variables), and B is the parameter vector. When a contrast is specified, an L matrix is created such that the columns corresponding to the factor match the contrast. The remaining columns are adjusted so that the L matrix is estimable.

In addition to the univariate test using F statistics and the Bonferroni-type simultaneous confidence intervals based on Student's t distribution for the contrast differences across all dependent variables, the multivariate tests using Pillai's trace, Wilks' lambda, Hotelling's trace, and Roy's largest root criteria are provided.

Available contrasts are deviation, simple, difference, Helmert, repeated, and polynomial. For deviation contrasts and simple contras
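A small numeric example may help with the LBM notation. The sketch below assumes a cell-means parameterization, where B has one row per group and one column per dependent variable; the procedure's actual parameter matrix includes an intercept and redundant parameters, so its L matrix would have different columns. Here L encodes simple contrasts against the first group, and M is the identity, so each dependent variable is tested separately. All numbers are hypothetical.

```python
# Sketch: evaluating L B M for simple contrasts in a cell-means
# parameterization (assumed) with three groups and two dependent variables.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# B: rows = groups (cell means), columns = dependent variables
B = [[10.0, 5.0],
     [12.0, 5.5],
     [15.0, 7.0]]
# L: each group versus the first (reference) group
L = [[-1.0, 1.0, 0.0],
     [-1.0, 0.0, 1.0]]
# M: identity, so each dependent variable keeps its own column
M = [[1.0, 0.0],
     [0.0, 1.0]]

LBM = matmul(matmul(L, B), M)
```

Each row of LBM holds one contrast's differences, one column per dependent variable; the null hypothesis is that all of these entries are zero.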
231. variable. Actuarial tables for the survival variable are generated for each category of the factor variable. You can also select a second-order by factor variable. Actuarial tables for the survival variable are generated for every combination of the first- and second-order factor variables.

Life Tables Define Events for Status Variables
Figure 12-2 Life Tables Define Event for Status Variable dialog box

Occurrences of the selected value or values for the status variable indicate that the terminal event has occurred for those cases. All other cases are considered to be censored. Enter either a single value or a range of values that identifies the event of interest.

Life Tables Define Range
Figure 12-3 Life Tables Define Range dialog box

Cases with values for the factor variable in the range you specify will be included in the analysis, and separate tables (and plots, if requested) will be generated for each unique value in the range.

Life Tables Options
Figure 12-4 Life Tables Options dialog box
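The actuarial calculation behind these tables can be sketched as follows. The code assumes fixed-width intervals starting at 0, status 1 for the terminal event and any other value for censoring, and the common convention that censored cases are exposed for half of the interval in which they are withdrawn. The row layout and the function life_table are hypothetical.

```python
# Sketch: an actuarial life table with fixed-width intervals.

def life_table(times, status, width, n_intervals):
    entering = len(times)
    cum_surv = 1.0
    rows = []  # (start, entering, withdrawn, exposed, events, q, cumulative survival)
    for k in range(n_intervals):
        lo, hi = k * width, (k + 1) * width
        in_interval = [s for t, s in zip(times, status) if lo <= t < hi]
        events = sum(1 for s in in_interval if s == 1)
        withdrawn = len(in_interval) - events  # censored in this interval
        exposed = entering - withdrawn / 2     # actuarial adjustment
        q = events / exposed if exposed > 0 else 0.0  # proportion terminating
        cum_surv *= 1 - q
        rows.append((lo, entering, withdrawn, exposed, events, q, cum_surv))
        entering -= events + withdrawn
    return rows

rows = life_table([2, 4, 6, 7, 11, 13, 14, 18], [1, 0, 1, 1, 0, 1, 0, 1],
                  width=5, n_intervals=4)
```

In the first interval, 8 cases enter, 1 is withdrawn, and 1 terminal event occurs, so 7.5 cases are counted as exposed to risk.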
232. ve 132
  GENLOG Command Additional Features 132
11 Logit Loglinear Analysis 133
  Logit Loglinear Analysis Model 135
  Build Terms 136
  Logit Loglinear Analysis Options 136
  Logit Loglinear Analysis Save 137
  GENLOG Command Additional Features 137
12 Life Tables 139
  Life Tables Define Events for Status Variables 141
  Life Tables Define Range 141
  Life Tables Options 141
  SURVIVAL Command Additional Features 142
13 Kaplan-Meier Survival Analysis 143
  Kaplan-Meier Define Event for Status Variable 144
  Kaplan-Meier Compare Factor Levels 145
  Kaplan-Meier Save New Variables 146
  Kaplan-Meier Options 146
  KM Command Additional Features 147
14 Cox Regression Analysis 148
  Cox Regression Define Categorical Variables 149
  Cox Regression Plots 151
  Cox Regression Save New Variables 152
  Cox Regression Options 153
  Cox Regression Define Event for Status Variable 153
  COXREG Comm
233. vel combinations of the between-subjects factors, for between-subjects factors only. Also, homogeneity tests include Box's M test of the homogeneity of the covariance matrices of the dependent variables across all level combinations of the between-subjects factors. The spread-versus-level and residual plots options are useful for checking assumptions about the data. This item is disabled if there are no factors. Select Residual plots to produce an observed-by-predicted-by-standardized residuals plot for each dependent variable. These plots are useful for investigating the assumption of equal variance. Select Lack of fit test to check if the relationship between the dependent variable and the independent variables can be adequately described by the model. General estimable function allows you to construct custom hypothesis tests based on the general estimable function. Rows in any contrast coefficient matrix are linear combinations of the general estimable function.

Significance level. You might want to adjust the significance level used in post hoc tests and the confidence level used for constructing confidence intervals. The specified value is also used to calculate the observed power for the test. When you specify a significance level, the associated level of the confidence intervals is displayed in the dialog box.

GLM Command Additional Features
These features may apply to univariate, multivariate, or repeated measures analysis. The command syntax
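The homogeneity tests include Levene's test of homogeneity of variance for each dependent variable, which is commonly computed as a one-way ANOVA on absolute deviations from the group means. The sketch below follows that mean-centered form (assumed here; other variants use medians), returns only the F statistic, and omits the p-value, which would come from the F distribution with g - 1 and N - g degrees of freedom. The function name is hypothetical.

```python
# Sketch: Levene's statistic as a one-way ANOVA F computed on absolute
# deviations from each group's mean.

def levene_statistic(groups):
    z = [[abs(y - sum(gr) / len(gr)) for y in gr] for gr in groups]
    g = len(z)
    n_total = sum(len(zi) for zi in z)
    grand = sum(sum(zi) for zi in z) / n_total
    means = [sum(zi) / len(zi) for zi in z]
    ss_between = sum(len(zi) * (m - grand) ** 2 for zi, m in zip(z, means))
    ss_within = sum((v - m) ** 2 for zi, m in zip(z, means) for v in zi)
    return (ss_between / (g - 1)) / (ss_within / (n_total - g))

f_unequal = levene_statistic([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
f_equal = levene_statistic([[1, 2, 3], [11, 12, 13]])
```

The second pair of groups differs only in location, so their absolute deviations are identical and the statistic is essentially zero.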
234. vel of the target restricts which distributions and link functions are appropriate.
• Use number of trials as denominator. When the target response is a number of events occurring in a set of trials, the target field contains the number of events and you can select an additional field containing the number of trials. For example, when testing a new pesticide you might expose samples of ants to different concentrations of the pesticide and then record the number of ants killed and the number of ants in each sample. In this case, the field recording the number of ants killed should be specified as the target (events) field, and the field recording the number of ants in each sample should be specified as the trials field. If the number of ants is the same for each sample, then the number of trials may be specified using a fixed value.
The number of trials should be greater than or equal to the number of events for each record. Events should be non-negative integers, and trials should be positive integers.
• Customize reference category. For a categorical target, you can choose the reference category. This can affect certain output, such as parameter estimates, but it should not change the model fit. For example, if your target takes values 0, 1, and 2, by default, the procedure makes the last (highest-valued) category, or 2, the reference category. In this situation, parameter estimates should be interpreted as relating to the likeliho
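Before fitting an events/trials model, the requirements just stated can be checked record by record. The sketch below assumes each record is an (events, trials) pair of Python ints; the function name and the example counts (ants killed out of ants exposed) are hypothetical.

```python
# Sketch: validating events/trials records against the stated rules.

def check_events_trials(records):
    problems = []
    for i, (events, trials) in enumerate(records):
        if not (isinstance(events, int) and events >= 0):
            problems.append((i, "events must be a non-negative integer"))
        if not (isinstance(trials, int) and trials > 0):
            problems.append((i, "trials must be a positive integer"))
        elif isinstance(events, int) and events > trials:
            problems.append((i, "events exceed trials"))
    return problems

recs = [(3, 20), (12, 20), (25, 20), (-1, 10), (2, 0)]
probs = check_events_trials(recs)
```

Here the third record reports more events than trials, the fourth has a negative event count, and the fifth has zero trials.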
235. ver who has been at fault in one accident in three years of experience and a driver who has been at fault in one accident in 25 years. The number of accidents can be modeled as a Poisson or negative binomial response with a log link if the natural log of the experience of the driver is included as an offset term. Other combinations of distribution and link types would require other transformations of the offset variable. Generalized Linear Models Options. Figure 6-5: Generalized Linear Models Options dialog box (User Missing Values: Exclude or Include; Category Order for Factors: Ascending, Descending, or Use data order). These options are applied to all factors specified on the Predictors tab. User Missing Values. Factors must have valid values for a case to be included in the analysis. These controls allow you to decide whether user-missing values are treated as valid among factor variables. Cases with user-missing values on the dependent variable, covariates, scale weight variable, or offset variable are always excluded. Category Order. This is relevant for determining a factor's last level, which may be associated with a redundant parameter in the estimation algo
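With a log link and the natural log of experience as an offset, the model is log(mu) = linear predictor + log(years), so the expected count equals years multiplied by exp(linear predictor). The minimal Python sketch below shows why this separates the two drivers above; the coefficient values are hypothetical and only the ratio matters.

```python
import math


def poisson_log_mean(intercept, slope, x, exposure):
    """Mean of a Poisson/negative binomial response under a log link with a log(exposure) offset.

    log(mu) = intercept + slope * x + log(exposure); the offset has no estimated coefficient.
    """
    eta = intercept + slope * x + math.log(exposure)
    return math.exp(eta)


# Hypothetical coefficients shared by both drivers; only experience differs.
driver_3y = poisson_log_mean(-2.0, 0.3, x=1.0, exposure=3)
driver_25y = poisson_log_mean(-2.0, 0.3, x=1.0, exposure=25)
print(driver_25y / driver_3y)  # about 8.33, i.e. 25/3
```

Because the expected count scales with years of experience, one accident in 25 years corresponds to a much lower accident rate than one accident in three years.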
236. viance residuals; Predicted values. Select the values you want to save as new variables in the active dataset. The suffix n in the new variable names increments to make a unique name for each saved variable. The saved values refer to the aggregated data (cells in the contingency table), even if the data are recorded in individual observations in the Data Editor. If you save residuals or predicted values for unaggregated data, the saved value for a cell in the contingency table is entered in the Data Editor for each case in that cell. To make sense of the saved values, you should aggregate the data to obtain the cell counts. Four types of residuals can be saved: raw, standardized, adjusted, and deviance. The predicted values can also be saved. Residuals. Also called the simple or raw residual, it is the difference between the observed cell count and its expected count. Standardized residuals. The residual divided by an estimate of its standard error. Standardized residuals are also known as Pearson residuals. Adjusted residuals. The standardized residual divided by its estimated standard error. Since the adjusted residuals are asymptotically standard normal when the selected model is correct, they are preferred over the standardized residuals for checking for normality. Deviance residuals. The signed square root of an individual contribution to the likelihood-ratio chi-square statistic (G squared), where the sign is the s
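To make the four residual types concrete, here is a minimal Python sketch for the simplest case only: a two-way table under the independence model, assuming NumPy. The adjusted-residual denominator uses the closed-form variance that holds for this model; for other models the standard error comes from the fitted model. The deviance contribution is written in the Poisson form 2[o ln(o/e) - (o - e)], which sums to G squared when the fitted total equals the observed total. The function name `cell_residuals` is hypothetical.

```python
import numpy as np


def cell_residuals(observed):
    """Raw, standardized (Pearson), adjusted, and deviance residuals
    for a two-way table under the independence model."""
    o = np.asarray(observed, dtype=float)
    n = o.sum()
    row = o.sum(axis=1, keepdims=True)  # row totals, shape (r, 1)
    col = o.sum(axis=0, keepdims=True)  # column totals, shape (1, c)
    e = row @ col / n                   # expected counts under independence
    raw = o - e                         # observed minus expected
    std = raw / np.sqrt(e)              # Pearson residuals
    # Adjusted: divide by the residual's own standard error (independence model only).
    adj = raw / np.sqrt(e * (1 - row / n) * (1 - col / n))
    with np.errstate(divide="ignore", invalid="ignore"):
        o_log = np.where(o > 0, o * np.log(o / e), 0.0)  # 0 * log(0) treated as 0
    contrib = np.clip(2 * (o_log - raw), 0.0, None)      # per-cell deviance contribution
    dev = np.sign(raw) * np.sqrt(contrib)
    return e, raw, std, adj, dev


e, raw, std, adj, dev = cell_residuals([[10, 20], [30, 40]])
print(np.round(adj, 3))
```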
237. with the overall test results is displayed; for interactions, there is a separate overall test for each level combination of the effects other than the contrast field. Confidence. This toggles the display of upper and lower confidence limits for the marginal means, using the confidence level specified as part of the Build Options. Layout. This toggles the layout of the pairwise contrasts diagram. The circle layout is less revealing of contrasts than the network layout but avoids overlapping lines. Chapter: Model Selection Loglinear Analysis. The Model Selection Loglinear Analysis procedure analyzes multiway crosstabulations (contingency tables). It fits hierarchical loglinear models to multidimensional crosstabulations using an iterative proportional-fitting algorithm. This procedure helps you find out which categorical variables are associated. To build models, forced entry and backward elimination methods are available. For saturated models, you can request parameter estimates and tests of partial association. A saturated model adds 0.5 to all cells. Example. In a study of user preference for one of two laundry detergents, researchers counted people in each group, combining various categories of water softness (soft, medium, or hard), previous use of one of the brands, and washing temperature (cold or hot). They found how temperature is related to water softness and also to brand preference. Statistics. Frequencies, residuals, parameter esti
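Iterative proportional fitting repeatedly rescales a fitted table so that each margin in the model matches the corresponding observed margin. The minimal sketch below, assuming NumPy, fits a three-way table to the hierarchical model with all two-way interactions and no three-way term, a model that has no closed-form fit. The names `ipf_all_two_way`, `max_iter`, `tol`, and `delta` are illustrative stand-ins for the Maximum iterations, Convergence, and Delta criteria, not SPSS's own names or defaults, and the sketch assumes all observed two-way margins are positive.

```python
import numpy as np


def ipf_all_two_way(table, max_iter=100, tol=1e-8, delta=0.0):
    """Fit the [AB][AC][BC] loglinear model to a 3-way table by IPF.

    `delta` is added to every cell first (the procedure adds 0.5 for saturated models).
    """
    obs = np.asarray(table, dtype=float) + delta
    fit = np.ones_like(obs)  # start from a table of ones
    for _ in range(max_iter):
        previous = fit.copy()
        # Rescale so the fitted AB, AC, and BC margins match the observed ones.
        for axis in (2, 1, 0):
            fit *= obs.sum(axis=axis, keepdims=True) / fit.sum(axis=axis, keepdims=True)
        if np.max(np.abs(fit - previous)) < tol:  # convergence on cell changes
            break
    return fit


counts = np.array([[[10, 20], [30, 40]], [[15, 25], [35, 5]]])
print(np.round(ipf_all_two_way(counts), 2))
```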
238. xchanged should contact IBM Software Group, Attention: Licensing, 233 S. Wacker Dr., Chicago, IL 60606, USA. Copyright IBM Corporation 1989, 2012. Notices. Such information may be available, subject to appropriate terms and conditions, including in some cases payment of a fee. The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement, or any equivalent agreement between us. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements, or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious, and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. If you are viewing this information softcopy, the photographs and color illustrations may not appear. Trademarks. IBM, the IBM logo, ibm.com, and SPSS are t
239. xed Models. The Linear Mixed Models procedure expands the general linear model so that the data are permitted to exhibit correlated and nonconstant variability. The mixed linear model, therefore, provides the flexibility of modeling not only the means of the data but their variances and covariances as well. The Linear Mixed Models procedure is also a flexible tool for fitting other models that can be formulated as mixed linear models. Such models include multilevel models, hierarchical linear models, and random coefficient models. Example. A grocery store chain is interested in the effects of various coupons on customer spending. Taking a random sample of their regular customers, they follow the spending of each customer for 10 weeks. In each week, a different coupon is mailed to the customers. Linear Mixed Models is used to estimate the effect of different coupons on spending while adjusting for correlation due to repeated observations on each subject over the 10 weeks. Methods. Maximum likelihood (ML) and restricted maximum likelihood (REML) estimation. Statistics. Descriptive statistics: sample sizes, means, and standard deviations of the dependent variable and covariates for each distinct level combination of the factors. Factor-level information: sorted values of the levels of each factor and their frequencies. Also, parameter estimates and confidence intervals for fixed effects, and Wald tests and confidence intervals for parameters of covarian
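The difference between the ML and REML methods listed above is already visible in the simplest possible model: an intercept-only model with independent normal errors. ML divides the residual sum of squares by n, while REML accounts for the estimated mean and divides by n - 1. This minimal Python sketch covers only that special case; the function name is hypothetical, and general mixed-model estimates are obtained iteratively.

```python
def ml_and_reml_variance(y):
    """Variance estimates for y_i = mu + e_i with independent e_i ~ N(0, sigma^2).

    Returns (ML estimate, REML estimate).
    """
    n = len(y)
    mean = sum(y) / n
    ss = sum((v - mean) ** 2 for v in y)  # residual sum of squares
    return ss / n, ss / (n - 1)           # ML divides by n, REML by n - 1


print(ml_and_reml_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # ML = 4.0, REML = 32/7
```

The ML estimate is biased downward because it ignores the degree of freedom used to estimate the mean; REML removes that bias, which matters most when there are few observations per estimated fixed effect.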
240. ximum modulus and is generally more powerful than Hochberg's GT2 when the cell sizes are unequal. Gabriel's test may become liberal when the cell sizes vary greatly. Dunnett's pairwise multiple comparison t test compares a set of treatments against a single control mean. The last category is the default control category. Alternatively, you can choose the first category. You can also choose a two-sided or one-sided test. To test that the mean at any level (except the control category) of the factor is not equal to that of the control category, use a two-sided test. To test whether the mean at any level of the factor is smaller than that of the control category, select < Control. Likewise, to test whether the mean at any level of the factor is larger than that of the control category, select > Control. Ryan, Einot, Gabriel, and Welsch (R-E-G-W) developed two multiple step-down range tests. Multiple step-down procedures first test whether all means are equal. If all means are not equal, subsets of means are tested for equality. R-E-G-W F is based on an F test, and R-E-G-W Q is based on the Studentized range. These tests are more powerful than Duncan's multiple range test and Student-Newman-Keuls (which are also multiple step-down procedures), but they are not recommended for unequal cell sizes. When the variances are unequal, use Tamhane's T2 (conservative pairwise comparisons test based on a t test), Dunnett's T3 (pairwise comparison test ba
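Tamhane's T2 is commonly described as pairwise Welch t tests (unequal variances, Satterthwaite degrees of freedom) evaluated at a Sidak-adjusted significance level. Treat that description as an assumption rather than a statement of SPSS's exact algorithm. The minimal Python sketch below computes those two ingredients with the standard library only; the function names are hypothetical, and it stops short of computing p-values.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic and Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


def sidak_alpha(alpha, k):
    """Per-comparison level keeping the familywise level at alpha over k comparisons."""
    return 1 - (1 - alpha) ** (1 / k)


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2), round(sidak_alpha(0.05, 3), 4))
```

With three groups there are three pairwise comparisons, so each Welch test would be judged at the per-comparison level returned by `sidak_alpha(0.05, 3)`.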
