
IBM SPSS Complex Samples 19 User's Manual

1. [Figure: Data Editor view of poll_jointprob.sav, showing County, Township, Unit_No_, and Joint_Prob_1_ through Joint_Prob_5_] The file poll_jointprob.sav contains first-stage joint probabilities for selected townships within counties. County is a first-stage stratification variable, and Township is a cluster variable. Combinations of these variables identify all first-stage PSUs uniquely. Unit_No_ labels PSUs within each stratum and is used to match up with Joint_Prob_1_, Joint_Prob_2_, Joint_Prob_3_, Joint_Prob_4_, and Joint_Prob_5_. The first two strata each have 4 PSUs. Therefore, the joint inclusion probability matrices are 4×4 for these strata, and the Joint_Prob_5_ column is left empty for these rows. Similarly, strata 3 and 5 have 3×3 joint inclusion probability matrices, and stratum 4 has a 5×5 joint inclusion probability matrix. The need for a joint probabilities file becomes clear when you examine the values of the joint inclusion probability matrices. When the sampling method is not a PPS WOR method, the selection of a PSU
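The layout described above stores each stratum's joint inclusion probability matrix row by row. Unit_No_ gives the row and Joint_Prob_k_ gives the column, so a stratum with n PSUs uses only the first n Joint_Prob_ columns. The following minimal Python sketch reads that layout back into square matrices. The rows and probability values are hypothetical, not the contents of poll_jointprob.sav, and None stands in for an empty cell.

```python
from collections import defaultdict

# Hypothetical rows in the poll_jointprob.sav layout:
# (County, Unit_No_, [Joint_Prob_1_, ..., Joint_Prob_5_]); None marks an empty cell.
# The probabilities are made up for illustration.
ROWS = [
    (1, 1, [0.30, 0.10, 0.12, None, None]),
    (1, 2, [0.10, 0.40, 0.15, None, None]),
    (1, 3, [0.12, 0.15, 0.45, None, None]),
    (2, 1, [0.50, 0.20, None, None, None]),
    (2, 2, [0.20, 0.35, None, None, None]),
]


def joint_matrices(rows):
    """Return {stratum: n x n joint inclusion probability matrix}."""
    by_stratum = defaultdict(dict)
    for stratum, unit_no, probs in rows:
        by_stratum[stratum][unit_no] = probs
    matrices = {}
    for stratum, units in by_stratum.items():
        n = len(units)
        # Unit_No_ gives the row; only the first n Joint_Prob_ columns belong to the matrix.
        matrix = [units[u][:n] for u in sorted(units)]
        if any(p is not None for u in units for p in units[u][n:]):
            raise ValueError(f"stratum {stratum}: columns beyond {n} should be empty")
        matrices[stratum] = matrix
    return matrices


for stratum, m in joint_matrices(ROWS).items():
    # The diagonal holds each PSU's own first-stage inclusion probability.
    print(stratum, f"{len(m)}x{len(m)}", [m[i][i] for i in range(len(m))])
```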
2. Table notes: a. Set to zero because this parameter is redundant. b. Model: Amount spent = (Intercept) + shopfor + usecoup + shopfor * usecoup. The parameter estimates show the effect of each predictor on Amount spent. The value of 518.249 for the intercept term indicates that the grocery chain can expect a shopper with a family who uses coupons from the newspaper and targeted mailings to spend $518.25, on average. You can tell that the intercept is associated with these factor levels because those are the factor levels whose parameters are redundant. The shopfor coefficients suggest that among customers who use both mailed coupons and newspaper coupons, those without family tend to spend less than those with spouses, who in turn spend less than those with dependents at home. Since the tests of model effects showed that this term contributes to the model, these differences are unlikely to be due to chance alone. The usecoup coefficients suggest that spending among customers with dependents at home decreases with decreased coupon usage. There is a moderate amount of uncertainty in the estimates, but the confidence intervals do not include 0. The interaction coefficients suggest that customers who do not use coupons, or who only clip from the newspaper, and who do not have dependents tend to spend more than you would otherwise expect. If any portion of an interaction parameter is redundant, the interaction parameter is redundant. The deviation in the values of the design effect
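A short sketch of the prediction step shows why the intercept alone gives the expected spending for the reference levels. Parameters for redundant (reference) levels are fixed at zero, so for a shopper in those categories only the intercept contributes. Only the intercept value 518.249 comes from the text above. The level names and all other coefficients are hypothetical placeholders, chosen only to follow the directions described above, not the manual's estimates.

```python
# Intercept from the parameter estimates; every other value below is hypothetical.
INTERCEPT = 518.249

# Main effects; the redundant (reference) levels are fixed at zero.
SHOPFOR = {"self": -40.0, "self_and_spouse": -20.0, "self_and_family": 0.0}
USECOUP = {"no": -30.0, "newspaper": -15.0, "mailings_and_newspaper": 0.0}
# Interaction cells; any cell involving a redundant level is also zero (omitted).
INTERACTION = {
    ("self", "no"): 12.0,
    ("self", "newspaper"): 8.0,
    ("self_and_spouse", "no"): 10.0,
    ("self_and_spouse", "newspaper"): 5.0,
}


def predicted_spending(shopfor: str, usecoup: str) -> float:
    """Intercept + main effects + interaction (0.0 when the cell is redundant)."""
    return (INTERCEPT
            + SHOPFOR[shopfor]
            + USECOUP[usecoup]
            + INTERACTION.get((shopfor, usecoup), 0.0))


# Reference levels on both factors: the prediction is the intercept itself.
print(round(predicted_spending("self_and_family", "mailings_and_newspaper"), 2))  # 518.25
```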
3. Cases with invalid data for any categorical design variables are excluded from the analysis. Tables. This group determines which cases are used in the analysis. Use all available data: missing values are determined on a table-by-table basis, so the cases used to compute statistics may vary across frequency or crosstabulation tables. Use consistent case base: missing values are determined across all variables, so the cases used to compute statistics are consistent across tables. Categorical Design Variables. This group determines whether user-missing values are valid or invalid. Complex Samples Options. [Figure 7-4: Options dialog box] Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables. Complex Samples Ratios. The Complex Samples Ratios procedure displays univariate summary statistics for ratios of variables. Optionally, you can request statistics by subgroups defined by one or more categorical variables. Example. Using the Complex Samples Ratios procedure, you can obtain descriptive statistics for the ratio of current property valu
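The two Tables options above differ only in which cases each table counts. The following minimal sketch uses hypothetical variables a, b, and c, hypothetical cases (None marks a missing value), and hypothetical table requests. It ignores weights and design variables.

```python
# Hypothetical cases; None marks a missing value.
CASES = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 1, "b": None, "c": 2},
    {"a": None, "b": 1, "c": 1},
    {"a": 2, "b": 2, "c": 1},
]
TABLES = [("a", "b"), ("a", "c")]  # hypothetical crosstabulation requests


def cases_per_table(cases, tables, consistent_base):
    """Count the cases each table would use under either missing-value option."""
    if consistent_base:
        # Consistent case base: drop a case missing on ANY variable used in ANY table.
        used = {v for t in tables for v in t}
        base = [c for c in cases if all(c[v] is not None for v in used)]
        return {t: len(base) for t in tables}
    # All available data: each table keeps the cases complete on its own variables.
    return {t: sum(all(c[v] is not None for v in t) for c in cases) for t in tables}


print(cases_per_table(CASES, TABLES, consistent_base=False))
print(cases_per_table(CASES, TABLES, consistent_base=True))
```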
4. You can see the sampling results in the newly created dataset. Five new variables were saved to the working file, representing the inclusion probabilities and cumulative sampling weights for each stage, plus the final sampling weights. Voters who were not selected to the sample are excluded from this dataset. The final sampling weights are identical for voters within the same neighborhood because they are selected according to a simple random sampling method within neighborhoods. However, they are different across neighborhoods within the same township because the sampled proportions are not exactly 20% in all neighborhoods. [Figure 13-47: Data Editor with sample results, showing voteid, nbrhood, town, county, InclusionProbability_1_, SampleWeightCumulative_1_, InclusionProbability_2_, SampleWeightCumulative_2_, and SampleWeight_Final_] Unlike the weights for voters in the second stage, the first-stage sampling weights are not identical for townships within the same county, because townships are selected with probability proportional to size. [Figure 13-48: Joint probabilities file]
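The final weight of a sampled voter is the product of the inverse inclusion probabilities of all stages. Within a neighborhood sampled by simple random sampling, every selected voter has inclusion probability n/N, so the weights are identical there. They differ across neighborhoods when the realized fraction n/N is not exactly 0.20, for example because n must be a whole number. The minimal sketch below assumes a hypothetical township inclusion probability and hypothetical neighborhood sizes. It also assumes that n is 20% of N rounded to the nearest integer; the wizard's own rounding rule may differ.

```python
# Hypothetical stage-1 inclusion probability of the township containing both neighborhoods.
STAGE1_PROB = 0.4
PROPORTION = 0.20  # stage-2 sampling fraction used in the example

# Hypothetical neighborhood sizes (number of listed voters).
NEIGHBORHOODS = {"N1": 50, "N2": 37}


def final_weights(stage1_prob, sizes, proportion):
    """Final weight = 1 / (stage-1 probability * stage-2 probability)."""
    weights = {}
    for nbr, size in sizes.items():
        n = max(1, round(proportion * size))  # assumed rounding rule
        stage2_prob = n / size                # SRS: every selected voter has probability n/N
        weights[nbr] = 1 / (stage1_prob * stage2_prob)
    return weights


print(final_weights(STAGE1_PROB, NEIGHBORHOODS, PROPORTION))
```

For N1 the realized fraction is exactly 0.20 (10 of 50). For N2 it is 7 of 37, which gives a slightly larger weight.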
5. Select The legislature should enact a gas tax as the dependent variable. Select Age category through Driving frequency as factors. Click Statistics. [Figure 21-3: Ordinal Regression Statistics dialog box] Select Classification table in the Model Fit group. Select Estimate, Exponentiated estimate, Standard error, Confidence interval, and Design effect in the Parameters group. Select Wald test of equal slopes and Parameter estimates for generalized unequal slopes model in the Parallel Lines group. Click Continue. Click Hypothesis Tests in the Complex Samples Ordinal Regression dialog box.
6. Model Fit. Controls the display of statistics that measure the overall model performance. Pseudo R-square: the R² statistic from linear regression does not have an exact counterpart among ordinal regression models. There are, instead, multiple measures that attempt to mimic the properties of the R² statistic. Classification table: displays the tabulated cross-classifications of the observed category by the model-predicted category on the dependent variable. Parameters. This group allows you to control the display of statistics related to the model parameters. Estimate: displays estimates of the coefficients. Exponentiated estimate: displays the base of the natural logarithm raised to the power of the estimates of the coefficients. While the estimate has convenient properties for statistical testing, the exponentiated estimate, or exp(B), is easier to interpret. Standard error: displays the standard error for each coefficient estimate. Confidence interval: displays a confidence interval for each coefficient estimate. The confidence level for the interval is set in the Options dialog box. T test: displays a test of each coefficient estimat
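The R² analogues mentioned above are usually computed from the log-likelihoods of the thresholds-only model and the fitted model. The minimal sketch below shows three textbook measures (Cox and Snell, Nagelkerke, McFadden) and exp(B). The log-likelihood values are hypothetical. The Complex Samples procedure works with weighted data, so its own computation may differ in details such as which sample size is used.

```python
import math


def pseudo_r_squares(ll_null: float, ll_model: float, n: float) -> dict:
    """Textbook pseudo R-square measures from two log-likelihoods.

    ll_null: log-likelihood of the thresholds-only model
    ll_model: log-likelihood of the fitted model
    n: sample size used in the Cox and Snell formula
    """
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    cox_snell_max = 1 - math.exp(2 * ll_null / n)  # largest attainable Cox and Snell value
    return {
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / cox_snell_max,    # rescaled so that 1 is attainable
        "mcfadden": 1 - ll_model / ll_null,
    }


def exponentiated_estimate(b: float) -> float:
    """exp(B): e raised to the power of a coefficient estimate."""
    return math.exp(b)


print(pseudo_r_squares(ll_null=-100.0, ll_model=-80.0, n=100))  # hypothetical inputs
print(exponentiated_estimate(0.5))
```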
7. [Figure: Sampling Wizard, Draw Sample Selection Options step. Options include the type of random seed (a randomly chosen number or a custom value), whether to include in the sample frame cases with user-missing values of stratification or clustering variables, and whether the working data are sorted by stratification variables (presorted data may speed processing).] Select Custom value for the type of random seed to use, and type 592004 as the value. Using a custom value allows you to replicate the results of this example exactly. Click Next. [Figure 13-41: Sampling Wizard, Draw Sample Output Files step] In this panel, you can choose where to save sample output data. You must save sampled cases to an external file if sampling is done with replacement. The selected c
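A custom seed makes a sample reproducible because a pseudo-random generator started from the same seed produces the same sequence of draws. The minimal Python sketch below illustrates only that principle. Python's generator is not the one SPSS uses, so seed 592004 here does not reproduce the sample drawn in this example, and the frame of unit IDs is hypothetical.

```python
import random


def draw_sample(frame, n, seed):
    """Simple random sample without replacement, reproducible for a fixed seed."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return rng.sample(frame, n)


frame = list(range(1, 101))  # hypothetical sampling frame of unit IDs
first = draw_sample(frame, 10, 592004)
second = draw_sample(frame, 10, 592004)
print(first == second)  # True: same seed, same sample
```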
8. heart attack. Each case corresponds to a separate patient and records many variables related to their hospital stay. poll_cs.sav. This is a hypothetical data file that concerns pollsters' efforts to determine the level of public support for a bill before the legislature. The cases correspond to registered voters. Each case records the county, township, and neighborhood in which the voter lives. poll_cs_sample.sav. This hypothetical data file contains a sample of the voters listed in poll_cs.sav. The sample was taken according to the design specified in the poll.csplan plan file, and this data file records the inclusion probabilities and sample weights. Note, however, that because the sampling plan makes use of a probability-proportional-to-size (PPS) method, there is also a file containing the joint selection probabilities (poll_jointprob.sav). The additional variables corresponding to voter demographics and their opinion on the proposed bill were collected and added to the data file after the sample was taken. property_assess.sav. This is a hypothetical data file that concerns a county assessor's efforts to keep property value assessments up to date on limited resources. The cases correspond to properties sold in the county in the past year. Each case in the data file records the township in which the property lies, the assessor who last visited the property, the time since that assessment, the valuation made at that time, and the sale value of
9. [Figure: Export Model as XML File options: Parameter estimates and covariance matrix; Parameter estimates only] Save Variables. This group allows you to save the model-predicted category and predicted probabilities as new variables in the active dataset. Export model as SPSS Statistics data. Writes a dataset in IBM® SPSS® Statistics format containing the parameter correlation or covariance matrix with parameter estimates, standard errors, significance values, and degrees of freedom. The order of variables in the matrix file is as follows. rowtype_: takes values (and value labels) COV (Covariances), CORR (Correlations), EST (Parameter estimates), SE (Standard errors), SIG (Significance levels), and DF (Sampling design degrees of freedom). There is a separate case with row type COV (or CORR) for each model parameter, plus a separate case for each of the other row types. varname_: takes values P1, P2, ..., corresponding to an ordered list of all model parameters, for row types COV or CORR, with value labels corresponding to the parameter strings shown in the parameter estimates table. The cells are blank for other row types. P1, P2, ...: these variables correspond to an ordered list of all model parameters, with variable labels corresponding to the parameter strings shown in the parameter estimates table, and take values according to the row type. For redundant parameters, all covariances are set to zero; correla
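The matrix-file layout described above can be mimicked with plain dictionaries to show how rowtype_, varname_, and P1, P2, ... relate. This minimal sketch rests on three assumptions. The parameter values are hypothetical. The EST, SE, SIG, and DF rows are assumed to follow the COV rows in that order. A single design degrees-of-freedom value is assumed to repeat across the parameter columns. Check an actual exported file before relying on any of this.

```python
def matrix_file_rows(estimates, cov, se, sig, design_df):
    """Build rows shaped like rowtype_, varname_, P1..Pk (covariance variant)."""
    names = [f"P{i + 1}" for i in range(len(estimates))]
    rows = []
    for i, name in enumerate(names):
        # One COV row per parameter; varname_ says which parameter the row belongs to.
        rows.append({"rowtype_": "COV", "varname_": name, **dict(zip(names, cov[i]))})
    for rowtype, values in (("EST", estimates), ("SE", se), ("SIG", sig)):
        # varname_ is blank for the non-matrix row types.
        rows.append({"rowtype_": rowtype, "varname_": "", **dict(zip(names, values))})
    rows.append({"rowtype_": "DF", "varname_": "", **{n: design_df for n in names}})
    return rows


# Hypothetical two-parameter model.
ROWS = matrix_file_rows(
    estimates=[1.0, 2.0],
    cov=[[0.09, 0.01], [0.01, 0.16]],
    se=[0.3, 0.4],
    sig=[0.001, 0.0001],
    design_df=30,
)
for row in ROWS:
    print(row)
```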
10. Chapter 13: Complex Samples Sampling Wizard. The Sampling Wizard guides you through the steps for creating, modifying, or executing a sampling plan file. Before using the wizard, you should have a well-defined target population, a list of sampling units, and an appropriate sample design in mind. Obtaining a Sample from a Full Sampling Frame. A state agency is charged with ensuring fair property taxes from county to county. Taxes are based on the appraised value of the property, so the agency wants to survey a sample of properties by county to be sure that each county's records are equally up to date. However, resources for obtaining current appraisals are limited, so it is important that they are used wisely. The agency decides to employ complex sampling methodology to select a sample of properties. A listing of properties is collected in property_assess_cs.sav. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. Use the Complex Samples Sampling Wizard to select a sample. Using the Wizard. To run the Complex Samples Sampling Wizard, from the menus choose: Analyze > Complex Samples > Select a Sample... © Copyright SPSS Inc. 1989, 2010. [Figure 13-1: Sampling Wizard, Welcome step] Welcome to the Sampling Wizard. The Sampling Wizard helps you design and select a complex sample. Your selections will be saved to a plan file that you can use at analysis time to
11. You can see the sampling results in the Data Editor. Five new variables were saved to the working file, representing the inclusion probabilities and cumulative sampling weights for each stage, plus the final sampling weights. Cases with values for these variables were selected to the sample. Cases with system-missing values for the variables were not selected. The agency will now use its resources to collect current valuations for the properties selected in the sample. Once those valuations are available, you can process the sample with Complex Samples analysis procedures, using the sampling plan property_assess.csplan to provide the sampling specifications. Obtaining a Sample from a Partial Sampling Frame. A company is interested in compiling and selling a database of high-quality survey information. The survey sample should be representative but efficiently carried out, so complex sampling methods are used. The full sampling design calls for the following structure: Stage Strata C
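Unselected cases carry system-missing weights, so a non-missing final weight identifies the selected cases. The sum of those weights estimates the number of units in the population. In the minimal sketch below, NaN stands in for system-missing. The rows, the weight values, and the variable name weight_final (standing in for the saved final-weight variable) are hypothetical.

```python
import math

# Hypothetical working-file rows after sampling; NaN stands in for system-missing.
CASES = [
    {"propid": 578, "weight_final": float("nan")},
    {"propid": 579, "weight_final": 12.0},
    {"propid": 580, "weight_final": float("nan")},
    {"propid": 581, "weight_final": 8.0},
]


def selected_cases(cases):
    """Cases with a non-missing final weight were selected to the sample."""
    return [c for c in cases if not math.isnan(c["weight_final"])]


sample = selected_cases(CASES)
estimated_population = sum(c["weight_final"] for c in sample)
print([c["propid"] for c in sample], estimated_population)
```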
12. The data are obtained using a complex multistage sample. However, for end users, the original NHIS design variables were transformed to a simplified set of design and weight variables whose results approximate those of the original design structures. Select Stratum for variance estimation as a strata variable. Select PSU for variance estimation as a cluster variable. Select Weight Final Annual as the sample weight variable. Click Finish. Summary. [Figure 14-3: Summary] Stage 1 design variables: Stratification 1 = Stratum for variance estimation; Cluster 1 = PSU for variance estimation. Analysis information: Estimator assumption = Sampling with replacement; Plan file = c:\nhis2000_subset.csaplan; Weight variable = Weight Final Annual; SRS estimator = Sampling without replacement. The summary table reviews your analysis plan. The plan consists of one stage with a design of one stratification variable and one cluster variable. With replacement (WR) estimation is used, and the plan is saved to c:\nhis2000_subset.csaplan. You can now use this plan file to process nhis2000_subset.sav with Complex Samples analysis procedures. Preparing for Analysis When Sampling Weights Are Not in the Data File. A loan officer has a collection of customer records taken a
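With replacement (WR) estimation treats first-stage PSUs as if they were drawn with replacement within strata, so the variance comes only from the spread of weighted PSU totals within each stratum. The minimal sketch below implements that textbook "ultimate cluster" estimator for a population total, using hypothetical records rather than the NHIS file. It illustrates the idea and is not SPSS's implementation.

```python
from collections import defaultdict


def wr_total_and_variance(records):
    """records: iterable of (stratum, psu, weight, y). Returns (total, variance)."""
    psu_totals = defaultdict(float)  # weighted total z_hi for each (stratum, PSU)
    for stratum, psu, w, y in records:
        psu_totals[(stratum, psu)] += w * y
    by_stratum = defaultdict(list)
    for (stratum, _), z in psu_totals.items():
        by_stratum[stratum].append(z)
    total = sum(psu_totals.values())
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean = sum(zs) / n_h
        # n_h / (n_h - 1) times the sum of squared deviations of PSU totals.
        variance += n_h / (n_h - 1) * sum((z - mean) ** 2 for z in zs)
    return total, variance


RECORDS = [  # hypothetical (stratum, PSU, weight, y)
    ("A", 1, 2.0, 5.0),
    ("A", 2, 2.0, 3.0), ("A", 2, 2.0, 4.0),
    ("B", 1, 1.0, 5.0),
    ("B", 2, 1.0, 7.0),
    ("B", 3, 3.0, 3.0),
]
total, var = wr_total_and_variance(RECORDS)
print(total, var)
```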
13. Optionally, in further steps you can: choose output variables to save; add a second or third stage to the design; set various selection options, including which stages to draw samples from, the random number seed, and whether to treat user-missing values as valid values of design variables; choose where to save output data; and paste your selections as command syntax. Sampling Wizard: Design Variables. [Figure 2-2: Sampling Wizard, Design Variables step] This step allows you to select stratification and clustering variables and to define input sample weights. You can also specify a label for the stage. Stratify By. The cross-classification of stratification variables defines distinct subpopulations, or strata. Separate samples are obtained for each stratum. To improve the precision of
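Stratification as described above can be sketched in a few lines: each distinct combination of the stratification variables is a stratum, and each stratum is sampled separately. The frame, the variables county and size_class, and the per-stratum sample size are hypothetical. Simple random sampling within strata is assumed, although the wizard supports other methods.

```python
import random
from collections import defaultdict


def stratified_srs(frame, strat_vars, n_per_stratum, seed):
    """Draw a separate simple random sample within each stratum.

    Strata are the cross-classification of the stratification variables."""
    strata = defaultdict(list)
    for unit in frame:
        strata[tuple(unit[v] for v in strat_vars)].append(unit)
    rng = random.Random(seed)
    return {
        key: rng.sample(units, min(n_per_stratum, len(units)))
        for key, units in strata.items()
    }


# Hypothetical frame: 30 properties, 3 counties x 2 size classes = 6 strata of 5.
FRAME = [
    {"propid": i, "county": i % 3, "size_class": "small" if i % 2 else "large"}
    for i in range(30)
]
SAMPLE = stratified_srs(FRAME, ["county", "size_class"], n_per_stratum=2, seed=1)
for key, units in sorted(SAMPLE.items()):
    print(key, [u["propid"] for u in units])
```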
14. 21 Complex Samples Ordinal Regression
      Using Complex Samples Ordinal Regression to Analyze Survey Results
      Running the Analysis
      Pseudo R-Squares
      Tests of Model Effects
      Parameter Estimates
      Classification
      Odds Ratios
      Generalized Cumulative Model
      Dropping Non-Significant Predictors
      Warnings
      Comparing Models
      Summary
      Related Procedures
    22 Complex Samples Cox Regression
      Using a Time-Dependent Predictor in Complex Samples Cox Regression
      Preparing the Data
      Running the Analysis
      Sample Design Information
      Tests of Model Effects
      Test of Proportional Hazards
      Adding a Time-Dependent Predictor
      Multiple Cases per Subject in Complex Samples Cox Regression
      Preparing the Data for Analysis
15. Chapter 17: Complex Samples Crosstabs. The Complex Samples Crosstabs procedure produces crosstabulation tables for pairs of selected variables and displays two-way statistics. Optionally, you can request statistics by subgroups defined by one or more categorical variables. Using Complex Samples Crosstabs to Measure the Relative Risk of an Event. A company that sells magazine subscriptions traditionally sends monthly mailings to a purchased database of names. The response rate is typically low, so you need to find a way to better target prospective customers. One suggestion is to focus mailings on people with newspaper subscriptions, on the assumption that people who read newspapers are more likely to subscribe to magazines. Use the Complex Samples Crosstabs procedure to test this theory by constructing a two-by-two table of Newspaper subscription by Response and computing the relative risk that a person with a newspaper subscription will respond to the mailing. This information is collected in demo_cs.sav and should be analyzed using the sampling plan file demo.csplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. Running the Analysis. To run a Complex Samples Crosstabs analysis, from the menus choose: Analyze > Complex Samples > Crosstabs... [Figure 17-1: Complex Samples Plan dialog box] If you do not have a plan file for your co
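The relative risk compares the estimated response rate among newspaper subscribers with the rate among non-subscribers. The minimal sketch below computes the point estimate from a weighted two-by-two table. The weighted counts are hypothetical, and no design-based confidence interval is computed here.

```python
def relative_risk(table):
    """table[(subscriber, responded)] = weighted count (estimated population count)."""
    risk_subscribers = table[(True, True)] / (table[(True, True)] + table[(True, False)])
    risk_others = table[(False, True)] / (table[(False, True)] + table[(False, False)])
    return risk_subscribers / risk_others


# Hypothetical weighted counts for Newspaper subscription by Response.
TABLE = {
    (True, True): 30.0, (True, False): 70.0,
    (False, True): 10.0, (False, False): 90.0,
}
rr = relative_risk(TABLE)
print(f"relative risk = {rr:.2f}")
```

A value above 1 means subscribers are estimated to respond at a higher rate than non-subscribers.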
16. Each combination of categories defines a subpopulation. Select Freq vigorous activity (times per wk) through Freq strength activity (times per wk) as measure variables. Select Age category as a subpopulation variable. Click Statistics. [Figure 16-3: Descriptives Statistics dialog box] Select Confidence interval in the Statistics group. Click Continue. Click OK in the Complex Samples Descriptives dialog box. Univariate Statistics. [Figure 16-4: Univariate statistics, with the estimate, standard error, and 95% confidence interval for Freq vigorous activity, Freq moderate activity, and Freq strength activity (times per wk)] Each selected statistic is computed for each measure variable. The first column contains estimates of the average number of times per week that a person engages in a particular type of activity. The confidence intervals for the means are non-overlapping. Thus, you can conclude that, overall, Americans engage in strength activities less often than vigorous activities, and they engage in vigorous activities less often than moderate
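The comparison above rests on design-based confidence intervals for weighted means. The minimal sketch below makes three simplifying assumptions. The mean is the ratio sum(w*y)/sum(w). Its variance uses Taylor linearization with the with-replacement formula for stratified PSUs. The interval uses a normal critical value of 1.96 instead of a t value with design degrees of freedom. The records are hypothetical. Non-overlapping 95% intervals are sufficient, but not necessary, evidence of a difference between two means.

```python
import math
from collections import defaultdict


def weighted_mean_ci(records, z=1.96):
    """records: (stratum, psu, weight, y). Returns (mean, (lower, upper))."""
    sw = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / sw
    # Linearized values u = w * (y - mean) / sum(w), totalled within each PSU.
    psu = defaultdict(float)
    for s, p, w, y in records:
        psu[(s, p)] += w * (y - mean) / sw
    by_stratum = defaultdict(list)
    for (s, _), u in psu.items():
        by_stratum[s].append(u)
    var = 0.0
    for us in by_stratum.values():
        n_h = len(us)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        ubar = sum(us) / n_h
        var += n_h / (n_h - 1) * sum((u - ubar) ** 2 for u in us)
    se = math.sqrt(var)
    return mean, (mean - z * se, mean + z * se)


def intervals_overlap(ci1, ci2):
    """True when two closed intervals share at least one point."""
    return ci1[0] <= ci2[1] and ci2[0] <= ci1[1]


RECORDS = [("A", 1, 1.0, 2.0), ("A", 2, 1.0, 4.0), ("B", 1, 2.0, 3.0), ("B", 2, 2.0, 5.0)]
mean, ci = weighted_mean_ci(RECORDS)
print(mean, ci)
```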
17. IBM SPSS Complex Samples 19. Note: Before using this information and the product it supports, read the general information under Notices on p. 267. This document contains proprietary information of SPSS Inc., an IBM Company. It is provided under a license agreement and is protected by copyright law. The information contained in this publication does not include any product warranties, and any statements provided in this manual should not be interpreted as such. When you send information to IBM or SPSS, you grant IBM and SPSS a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you. © Copyright SPSS Inc. 1989, 2010. Preface. IBM SPSS Statistics is a comprehensive system for analyzing data. The Complex Samples optional add-on module provides the additional analytic techniques described in this manual. The Complex Samples add-on module must be used with the SPSS Statistics Core system and is completely integrated into that system. About SPSS Inc., an IBM Company. SPSS Inc., an IBM Company, is a leading global provider of predictive analytic software and solutions. The company's complete portfolio of products (data collection, statistics, modeling, and deployment) captures people's attitudes and opinions, predicts outcomes of future customer interactions, and then acts on these insights by embedding analytics into business processes. SPSS
18. The model correctly classifies 9.9% more, or 37.2%, of the cases. In particular, the model does considerably better at classifying those who Agree or Strongly disagree, and slightly worse with those who Disagree. Odds Ratios. Cumulative odds are defined as the ratio of the probability that the dependent variable takes a value less than or equal to a given response category to the probability that it takes a value greater than that response category. The cumulative odds ratio is the ratio of cumulative odds for different predictor values and is closely related to the exponentiated parameter estimates. Because this model assumes parallel lines (the same slopes for every response category), the cumulative odds ratio itself does not depend upon the response category.
    Figure 21-11 Cumulative odds ratios for Age category
    Age category     Cumulative Odds Ratio   95% Confidence Interval   Design Effect   Square Root of Design Effect
    18-30 vs. >60    1.383                   (not recovered)           1.793           1.339
    31-45 vs. >60    1.148                   (not recovered)           1.158           1.076
    46-60 vs. >60    1.100                   (not recovered)           2.206           1.485
    Dependent Variable: The legislature should enact a gas tax (Ascending). Model: (Threshold), agecat, gender, votelast, drivefreq. Link function: Logit.
    a. Factors and covariates used in the computation are fixed at the following values: Age category = >60; Gender = Female; Voted in last election = Yes; Driving frequency = >30,000 miles/year.
    This table displays cumulative odds ratios for the factor levels of Age category. The reported values are the ratios of the cumulative odds for 18-30 throu
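The claim that the cumulative odds ratio does not depend on the response category can be checked numerically. The minimal sketch below assumes the parameterization logit P(Y <= j) = theta_j - beta * x for an indicator predictor x. The thresholds and beta are hypothetical. Check the sign convention against the procedure's documentation before interpreting real output.

```python
import math


def cumulative_prob(theta_j, beta, x):
    """P(Y <= j) under the assumed model logit P(Y <= j) = theta_j - beta * x."""
    return 1 / (1 + math.exp(-(theta_j - beta * x)))


def cumulative_odds(theta_j, beta, x):
    """P(Y <= j) / P(Y > j)."""
    p = cumulative_prob(theta_j, beta, x)
    return p / (1 - p)


THRESHOLDS = [-1.0, 0.5, 2.0]  # hypothetical thresholds for a 4-category response
BETA = -0.3                    # hypothetical coefficient for the indicator x

# Cumulative odds ratio for x = 1 versus x = 0, one value per threshold.
RATIOS = [cumulative_odds(t, BETA, 1) / cumulative_odds(t, BETA, 0) for t in THRESHOLDS]
print(RATIOS)  # each entry is exp(-BETA), up to floating-point rounding
```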
19. (continued from Chapter 18)
      Running the Analysis
      [illegible entry]
      Pivoted Ratios Table
      Summary
      Related Procedures
    19 Complex Samples General Linear Model
      Using Complex Samples General Linear Model to Fit a Two-Factor ANOVA
      Running the Analysis
      Model Summary
      Tests of Model Effects
      Parameter Estimates
      Estimated Marginal Means
      Summary
      Related Procedures
    20 Complex Samples Logistic Regression
      Using Complex Samples Logistic Regression to Assess Credit Risk
      Running the Analysis
      Pseudo R-Squares
      Classification
      Tests of Model Effects
      Parameter Estimates
      Odds Ratios
      Summary
      Related Procedures
20. is independent of the selection of another PSU, and their joint inclusion probability is simply the product of their inclusion probabilities. In contrast, the joint inclusion probability for Townships 9 and 10 of County 1 is approximately 0.11 (see the first case of Joint_Prob_3_ or the third case of Joint_Prob_1_), which is less than the product of their individual inclusion probabilities (the product of the first case of Joint_Prob_1_ and the third case of Joint_Prob_3_ is 0.31 × 0.44 = 0.1364). The pollsters will now conduct interviews for the selected sample. Once the results are available, you can process the sample with Complex Samples analysis procedures, using the sampling plan poll.csplan to provide the sampling specifications and poll_jointprob.sav to provide the needed joint inclusion probabilities. Related Procedures. The Complex Samples Sampling Wizard procedure is a useful tool for creating a sampling plan file and drawing a sample. To ready a sample for analysis when you do not have access to the sampling plan file, use the Analysis Preparation Wizard. Chapter 14: Complex Samples Analysis Preparation Wizard. The Analysis Preparation Wizard guides you through the steps for creating or modifying an analysis plan for use with the various Complex Samples analysis procedures. It is most useful when you do not have access to the sampling plan file used to draw the sample. Using the Complex Samples Analysis Preparation Wizard to Ready NHIS
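The comparison above is easy to reproduce, and it also shows where joint probabilities enter a variance calculation. In the minimal sketch below, the inclusion probabilities 0.31 and 0.44 and the joint probability 0.11 come from the text, while the y values are hypothetical. The variance function is the textbook Yates-Grundy-Sen estimator for a Horvitz-Thompson total under fixed-size without-replacement sampling. It illustrates why joint probabilities are needed, but it is not presented as the exact formula the Complex Samples procedures use.

```python
def yates_grundy_sen_variance(y, pi, pi_joint):
    """Variance estimate for a Horvitz-Thompson total; pi_joint[i][j] is P(i and j sampled)."""
    v = 0.0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            factor = (pi[i] * pi[j] - pi_joint[i][j]) / pi_joint[i][j]
            v += factor * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return v


# Values quoted in the text for the two townships of County 1.
pi_a, pi_b, joint = 0.31, 0.44, 0.11
product = pi_a * pi_b
print(f"product = {product:.4f}, joint = {joint}")  # product = 0.1364, joint = 0.11

# Hypothetical y values (e.g., township totals) for the same two PSUs.
Y = [10.0, 20.0]
PI = [pi_a, pi_b]
PI_JOINT = [[pi_a, joint], [joint, pi_b]]
print(yates_grundy_sen_variance(Y, PI, PI_JOINT))
```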
21. from Disagree to Agree, more than half of whom were observed to respond Disagree or Strongly disagree. This is a very important distinction that deserves careful consideration before choosing the reduced model. Summary. Using the Complex Samples Ordinal Regression procedure, you have constructed competing models for the level of support for the proposed bill based on voter demographics. The test of parallel lines shows that a generalized cumulative model is not necessary. The tests of model effects suggest that Gender and Voted in last election could be dropped from the model, and the reduced model performs well in terms of pseudo-R² and overall classification rate compared to the original model. However, the reduced model misclassifies more voters across the Agree/Disagree split, so the legislators prefer to keep the original model for now. Related Procedures. The Complex Samples Ordinal Regression procedure is a useful tool for modeling an ordinal variable when the cases have been drawn according to a complex sampling scheme. The Complex Samples Sampling Wizard is used to specify complex sampling design specifications and obtain a sample. The sampling plan file created by the Sampling Wizard contains a default analysis plan and can be specified in the Plan dialog box when you are analyzing the sample obtained according to that plan. The Complex Samples Analysis Preparation Wizard is used to specify anal
22. Nonrandom sampling. When selection at random is difficult to obtain, units can be sampled systematically (at a fixed interval) or sequentially. Unequal selection probabilities. When sampling clusters that contain unequal numbers of units, you can use probability-proportional-to-size (PPS) sampling to make a cluster's selection probability equal to the proportion of units it contains. PPS sampling can also use more general weighting schemes to select units. Unrestricted sampling. Unrestricted sampling selects units with replacement (WR). Thus, an individual unit can be selected for the sample more than once. Sampling weights. Sampling weights are automatically computed while drawing a complex sample and ideally correspond to the "frequency" that each sampling unit represents in the target population. Therefore, the sum of the weights over the sample should estimate the population size. Complex Samples analysis procedures require sampling weights in order to properly analyze a complex sample. Note that these weights should be used entirely within the Complex Samples option and should not be used with other analytical procedures via the Weight Cases procedure, which treats weights as case replications. Usage of Complex Samples Procedures. Your usage of Complex Samples procedures depends on your particular needs. The primary types of users are those who: plan and carry out surveys ac
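PPS sampling with replacement combines the unequal selection probabilities and unrestricted sampling described above. Each draw picks cluster i with probability p_i = M_i / M, where M_i is its number of units, and each unit in a drawn cluster carries weight 1 / (n * p_i). The minimal sketch below uses hypothetical cluster sizes and Python's random.choices for the weighted draws. Every draw contributes M / n to the sum of unit weights, so the weights recover the population size (up to floating-point rounding) whichever clusters are drawn. This is a concrete case of the sum of the weights estimating the population size.

```python
import random


def pps_wr_draws(sizes, n, seed):
    """Draw n cluster indices with replacement; each draw picks i with probability M_i / M."""
    rng = random.Random(seed)
    return rng.choices(range(len(sizes)), weights=sizes, k=n)


def unit_weight(i, sizes, n):
    """Weight 1 / (n * p_i) for every unit in drawn cluster i, with p_i = M_i / M."""
    return sum(sizes) / (n * sizes[i])


SIZES = [120, 45, 300, 80, 55]  # hypothetical numbers of units per cluster (M = 600)
N_DRAWS = 3
DRAWS = pps_wr_draws(SIZES, N_DRAWS, seed=7)
# Sum of unit weights over all sampled units (a cluster drawn twice counts twice).
WEIGHT_SUM = sum(SIZES[i] * unit_weight(i, SIZES, N_DRAWS) for i in DRAWS)
print(DRAWS, WEIGHT_SUM)  # WEIGHT_SUM equals 600 up to rounding
```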
23. Obtaining a Complex Samples General Linear Model. From the menus choose: Analyze > Complex Samples > General Linear Model... Select a plan file. Optionally, select a custom joint probabilities file. Click Continue. [Figure 9-1: General Linear Model dialog box] Select a dependent variable. Optionally, you can select variables for factors and covariates, as appropriate for your data, and specify a variable to define a subpopulation. The analysis is performed only for the selected category of the subpopulation variable. [Figure 9-2: Model dialog box] Specify Model Effects. By default, the procedure builds a main effe
24. [Figure: Sampling Wizard, Plan Summary step (stage 1)] Stage 1 Plan Summary. This panel summarizes the sampling plan so far. You can add another stage to the design. If you choose not to add a next stage, the next step is to set options for drawing your sample. Summary: stage 1, label None, strata county, clusters town, size 0.3 per stratum, method PPS WOR; plan file c:\temp\poll.csplan. Do you want to add stage 2? Yes, add stage 2 now (choose this option if the working data file contains data for stage 2). No, do not add another stage now (choose this option if stage 2 data are not available yet or your design has only one stage). Select Yes, add stage 2 now. Click Next. [Figure 13-37: Sampling Wizard, Design Variables step (stage 2)] Stage 2 Design Variables. In this panel you can stratify your sample or define clusters. You can also provide a label for the stage that will be used in the output. If sampling weights exist from a prior stage of the sample design, you can use them as input to the current stage.
25. Review the sampling plan in the Plan Summary step, and then click Next. Subsequent steps are largely the same as for a new design. See the Help for individual steps for more information. Navigate to the Finish step, and specify a new name for the edited plan file or choose to overwrite the existing plan file. Optionally, you can specify stages that have already been sampled or remove stages from the plan.
Sampling Wizard: Plan Summary
Figure 2-11: Sampling Wizard, Plan Summary step.
This panel summarizes the sampling plan. Indicate any stages that have already been sampled. These stages will be locked in the Wizard to prevent accidental changes; they cannot be resampled unless you unlock them. You can also delete existing stages from the plan. Summary: stage 1, label None, strata county, clusters town, size 4 per stratum; stage 2, label None, uses nbrhood. File: property_assess.csplan. Which stages have already been sampled? Stages: None. Remove stages from the plan: Stages: 2.
This step allows you to review the sampling plan and indicate stages that have alr…
26. (Define Event dialog: Individual value(s), specifying one or more values, or Range of values, specifying a minimum and maximum.) Select 1 (Yes) as the value indicating that the event of interest, rearrest, has occurred. Click Continue. Click the Predictors tab.
Figure 22-10: Cox Regression dialog box, Predictors tab.
Select Age in years (age) as a covariate. Click the Statistics tab.
Figure 22-11: Cox Regression dialog box, Statistics tab.
27. …The crosstabulation shows that, overall, few people responded to the mailing. However, a greater proportion of newspaper subscribers responded.
Risk Estimate
Figure 17-5: Risk estimate for newspaper subscription by response.
Odds ratio: 1.812. Relative risk, for cohort Response = Yes: 1.673. Relative risk, for cohort Response = No: .923. Statistics are computed only for 2-by-2 tables with all cells observed.
The relative risk is a ratio of event probabilities. The relative risk of a response to the mailing is the ratio of the probability that a newspaper subscriber responds to the probability that a nonsubscriber responds. Thus, the estimate of the relative risk is simply 17.2%/10.3%, which gives 1.673 when computed from the unrounded percentages. Likewise, the relative risk of nonresponse is the ratio of the probability that a subscriber does not respond to the probability that a nonsubscriber does not respond. Your estimate of this relative risk is 0.923. Given these results, you can estimate that a newspaper subscriber is 1.673 times as likely to respond to the mailing as a nonsubscriber, or 0.923 times as likely as a nonsubscriber not to respond.
The odds ratio is a ratio of event odds. The odds of an event is the ratio of the probability that the event occurs to the probability that the event does not occur. Thus, the estimate of the odds that a newspaper subscriber responds to th…
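As a quick arithmetic check of the ratios described above, the following Python sketch computes the two relative risks and the odds ratio from a pair of response probabilities. It is a minimal illustration, not the Complex Samples Crosstabs computation: it takes the rounded percentages quoted in the text (17.2% for subscribers, 10.3% for nonsubscribers) as inputs, ignores sampling weights, and produces no design-based standard errors. The function name risk_summaries is hypothetical.

```python
def risk_summaries(p_exposed: float, p_unexposed: float) -> dict:
    """Point estimates of relative risk and odds ratio for a 2x2 table.

    p_exposed:   estimated probability of responding among subscribers
    p_unexposed: estimated probability of responding among nonsubscribers
    """
    rr_yes = p_exposed / p_unexposed                  # relative risk, cohort Response = Yes
    rr_no = (1 - p_exposed) / (1 - p_unexposed)       # relative risk, cohort Response = No
    odds_exposed = p_exposed / (1 - p_exposed)        # odds that a subscriber responds
    odds_unexposed = p_unexposed / (1 - p_unexposed)  # odds that a nonsubscriber responds
    return {
        "rr_yes": rr_yes,
        "rr_no": rr_no,
        "odds_ratio": odds_exposed / odds_unexposed,
    }


# Rounded response rates from the example (17.2% vs. 10.3%)
estimates = risk_summaries(0.172, 0.103)
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

Because the inputs are rounded, the printed values (about 1.670, 0.923, and 1.809) differ slightly from the table's 1.673, .923, and 1.812.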
28. (Individual test results table, Figure 19-13, continued: From newspaper vs. No, std. error 6.537, df1 = 1, df2 = 13, Wald F = 105.35, Sig. = .000; From both vs. No, estimate 97.203, hypothesized value 0, difference 97.203, std. error 5.603, df1 = 1, df2 = 13, Wald F = 300.92, Sig. = .000. a. Reference category: No.)
The individual tests table displays three simple contrasts comparing the spending of customers who do not use coupons to those who do. Since the significance values of the tests are less than 0.05, you can conclude that customers who use coupons tend to spend more than those who don't.
Figure 19-14: Overall test results for estimated marginal means of shopping style.
The overall test table reports the results of a test of all the contrasts in the individual test table. Its significance value of less than 0.05 confirms that there is a difference in spending among the levels of Use coupons. Note that the overall tests for Use coupons and Who shopping for are equivalent to the tests of model effects because the hypothesized contrast values are equal to 0.
Figure 19-15: Estimated marginal means by levels of gender by shopping style (mean, standard error, and 95% confidence interval):
Self, No: 244.3471 (std. error 6.00949; 95% CI 231.3644 to 257.3298)
Self, From newspaper: 324.9708 (5.94134; 312.1353 to 337.8063)
Self, From mailings: 321.3207 (4.11028; 312.4410 to 330.2005)
Self, From both: 343.4916 (6.57845; 329.2797 to 357.7034)
Self and spouse, No: 337.1783 (7.12181; 321.7925 to 352.5640)
Self and spouse, From newspaper: 380.0468 (7.91038; 362.9574 to 397.1361)
Self and spouse, From mailings: 375.3141 (6.22468; …)
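A simple contrast compares each level of Use coupons with the reference level No. The Python sketch below, a minimal illustration rather than the procedure's own computation, rebuilds the contrast estimates from the estimated marginal means in Figure 19-12 and forms the one-degree-of-freedom Wald F as the squared ratio of the difference to its standard error. The contrast standard errors are copied from the individual tests table (6.537, 5.875, 5.603), because they depend on the covariance between marginal means and cannot be derived from the marginal standard errors alone. The function name simple_contrasts is hypothetical.

```python
# Estimated marginal means of Amount spent by Use coupons (Figure 19-12)
means = {
    "No": 319.6455,
    "From newspaper": 386.7469,
    "From mailings": 394.5028,
    "From both": 416.8486,
}

# Standard errors of each contrast, copied from the individual tests table
# (Figure 19-13); they cannot be rebuilt from the marginal standard errors
# because the marginal means are correlated.
contrast_se = {"From newspaper": 6.537, "From mailings": 5.875, "From both": 5.603}


def simple_contrasts(means, reference, se, hypothesized=0.0):
    """Compare every level with the reference level.

    Returns {level: (contrast estimate, Wald F)}, where the 1-df Wald F is
    the squared ratio of (estimate - hypothesized value) to its standard error.
    """
    results = {}
    for level, level_mean in means.items():
        if level == reference:
            continue
        estimate = level_mean - means[reference]
        wald_f = ((estimate - hypothesized) / se[level]) ** 2
        results[level] = (round(estimate, 3), round(wald_f, 2))
    return results


for level, (estimate, wald_f) in simple_contrasts(means, "No", contrast_se).items():
    print(f"{level} vs. No: estimate {estimate}, Wald F {wald_f}")
```

The estimates match the table (67.101, 74.857, 97.203); the recomputed F values differ from 105.35, 162.33, and 300.92 by less than 0.1 because the inputs are rounded.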
29. Figure 19-11: Overall test results for estimated marginal means of gender: df1 = 2.000, df2 = 12.000, Wald F = 643.593, Sig. = .000.
The overall test table reports the results of a test of all of the contrasts in the individual test table. Its significance value of less than 0.05 confirms that there is a difference in spending among the levels of Who shopping for.
Figure 19-12: Estimated marginal means by levels of shopping style (mean, standard error, and 95% confidence interval):
No: 319.6455 (std. error 6.51429; 95% CI 305.5722 to 333.7188)
From newspaper: 386.7469 (4.32295; 377.4077 to 396.0861)
From mailings: 394.5028 (5.54218; 382.5297 to 406.4760)
From both: 416.8486 (6.51260; 402.7790 to 430.9182)
This table displays the model-estimated marginal means and standard errors of Amount spent at the factor levels of Use coupons. This table is useful for exploring the differences between the levels of this factor. In this example, a customer who does not use coupons is expected to spend about $319.65, and those who do use coupons are expected to spend considerably more.
Figure 19-13: Individual test results for estimated marginal means of shopping style (simple contrasts; columns: contrast estimate, hypothesized value, difference, std. error, df1, df2, Wald F, Sig.):
From newspaper vs. No: 67.101, …
From mailings vs. No: 74.857, .000, 74.857, 5.875, 1.000, 13.000, 162.33, .000
From both vs. No: …
30. Click Continue.
Figure 19-2: General Linear Model dialog box, Variables.
Select Amount spent as the dependent variable. Select Who shopping for and Use coupons as factors. Click Model.
Figure 19-3: Model dialog box.
Choose to build a Custom model. Select Main effects as the type of term to build, and select shopfor and usecoup as model terms. Select Interaction as the type of term to build, and add the shopfor*usecoup interaction as a model term. Click Continue. Click Statistics in the General Linear Model dialog box.
Figure 19-4: General Linear Model Statistics dialog box (Model Parameters: Estimate, Standard error, Covariances of parameter estimates, Correlations, …).
31. … CSLOGISTIC Command Additional Features
11 Complex Samples Ordinal Regression
    Complex Samples Ordinal Regression Response Probabilities
    Complex Samples Ordinal Regression Model
    Complex Samples Ordinal Regression Statistics
    Complex Samples Hypothesis Tests
    Complex Samples Ordinal Regression Odds Ratios
    Complex Samples Ordinal Regression Save
    Complex Samples Ordinal Regression Options
    CSORDINAL Command Additional Features
12 Complex Samples Cox Regression
    …
    Predictors
    Define Time-Dependent Predictor
    Subgroups
    Model
    Statistics
    Plots
    Hypothesis Tests
    Save
    Export
    Options
    CSCOXREG Command Additional Features
Part II: Examples
13 Complex Samples Sampling Wizard
32. … in Complex Samples, 49, 59, 69; in Complex Samples Cox Regression, 85
sequential sampling: in Sampling Wizard, 8
sequential Sidak correction: in Complex Samples, 49, 59, 69; in Complex Samples Cox Regression, 85
Sidak correction: in Complex Samples, 49, 59, 69; in Complex Samples Cox Regression, 85
simple contrasts: in Complex Samples General Linear Model, 50
simple random sampling: in Sampling Wizard, 8
square root of design effect: in Complex Samples Cox Regression, 82; in Complex Samples Crosstabs, 39; in Complex Samples Descriptives, 34; in Complex Samples Frequencies, 30; in Complex Samples General Linear Model, 48; in Complex Samples Logistic Regression, 57; in Complex Samples Ordinal Regression, 68; in Complex Samples Ratios, 43
standard error: in Complex Samples Crosstabs, 39; in Complex Samples Descriptives, 34, 163; in Complex Samples Frequencies, 30, 158; in Complex Samples General Linear Model, 48; in Complex Samples Logistic Regression, 57; in Complex Samples Ordinal Regression, 68; in Complex Samples Ratios, 43
step halving: in Complex Samples Logistic Regression, 62; in Complex Samples Ordinal Regression, 72
stratification: in Analysis Preparation Wizard, 20; in Sampling Wizard, 6
subpopulation: in Complex Samples Cox Regression, 80
sum: in Complex Samples Descriptives, 34
summary: in Analysis Preparation Wizard, 143, 154; in Sampling Wizard, 103, 135
systematic sampling: in Sampling Wizard, 8
t test: in Complex Samples General Linear Model …
33. Complex Samples Hypothesis Tests (Complex Samples Logistic Regression)
Figure 10-5: Hypothesis Tests dialog box.
Test Statistic. This group allows you to select the type of statistic used for testing hypotheses. You can choose between F, adjusted F, chi-square, and adjusted chi-square.
Sampling Degrees of Freedom. This group gives you control over the sampling design degrees of freedom used to compute p values for all test statistics. If based on the sampling design, the value is the difference between the number of primary sampling units and the number of strata in the first stage of sampling. Alternatively, you can set a custom degrees of freedom by specifying a positive integer.
Adjustment for Multiple Comparisons. When performing hypothesis tests with multiple contrasts, the overall significance level can be adjusted from the significance levels for the included contrasts. This group allows you to choose the adjustment method: least significant difference, Bonferroni, sequential Bonferroni, Sidak, or sequential Sidak.
Least significant difference. This method does not control the overall probability of rejecting the hypotheses that some linear…
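The design-based degrees of freedom and the multiple-comparison adjustments described above can be sketched in Python. This is a minimal illustration using textbook formulas (Bonferroni, Sidak, and the step-down "sequential" Bonferroni usually credited to Holm); SPSS's internal computations may differ in detail, and all function names and data here are hypothetical. Note that PSU labels are counted within strata, since a label such as a township number may identify first-stage units uniquely only in combination with the stratum.

```python
from collections import defaultdict


def sampling_df(first_stage_units):
    """Design-based df: (# first-stage PSUs) - (# first-stage strata).

    first_stage_units: iterable of (stratum, psu) pairs, one per sampled case.
    A PSU label reused in two strata counts as two PSUs.
    """
    psus_by_stratum = defaultdict(set)
    for stratum, psu in first_stage_units:
        psus_by_stratum[stratum].add(psu)
    n_psus = sum(len(psus) for psus in psus_by_stratum.values())
    return n_psus - len(psus_by_stratum)


def bonferroni(pvalues):
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def sidak(pvalues):
    m = len(pvalues)
    return [1.0 - (1.0 - p) ** m for p in pvalues]


def sequential_bonferroni(pvalues):
    """Step-down version: the k-th smallest p value is multiplied by m - k + 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank is 0-based, so the multiplier is m - rank
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max     # keep adjusted p values monotone
    return adjusted


# Hypothetical sample: stratum A has PSUs 1 and 2, stratum B has PSUs 1, 2, 3
cases = [("A", 1), ("A", 1), ("A", 2), ("B", 1), ("B", 2), ("B", 3)]
print(sampling_df(cases))
p = [0.01, 0.04, 0.03]
print(bonferroni(p), sidak(p), sequential_bonferroni(p))
```

Five distinct PSUs in two strata give 3 degrees of freedom for the hypothetical sample.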
34. (Dialog options shown: Coefficient of variation; Expected values; Summaries for 2-by-2 Tables: Odds ratio, Risk difference, Relative risk; Test of independence of rows and columns.)
Cells. This group allows you to request estimates of the cell population size and row, column, and table percentages.
Statistics. This group produces statistics associated with the population size and row, column, and table percentages.
Standard error. The standard error of the estimate.
Confidence interval. A confidence interval for the estimate, using the specified level.
Coefficient of variation. The ratio of the standard error of the estimate to the estimate.
Expected values. The expected value of the estimate under the hypothesis of independence of the row and column variable.
Unweighted count. The number of units used to compute the estimate.
Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Residuals. The expected value is the number of cases that you would expect in the cell if there were no relationship between the two variables. A positive residual indi…
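To make the definitions of coefficient of variation and design effect concrete, the Python sketch below computes them for a sample mean. It rests on strong simplifying assumptions that the Complex Samples procedures do not make: a single stage of equal-size clusters, equal weights, with-replacement variance estimation, and no FPC. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def cluster_mean_summary(clusters):
    """Estimate, standard error, CV, and design effect for a mean.

    Simplifying assumptions (not the SPSS estimator): one stage of equal-size
    clusters, equal weights, with-replacement variance estimation, no FPC.
    """
    values = [y for cluster in clusters for y in cluster]
    n, k = len(values), len(clusters)
    estimate = mean(values)
    # Design-based variance: spread of the cluster means, divided by the number of clusters
    var_design = variance([mean(cluster) for cluster in clusters]) / k
    # Variance the same n values would have if they were a simple random sample
    var_srs = variance(values) / n
    se = math.sqrt(var_design)
    deff = var_design / var_srs
    return {
        "estimate": estimate,
        "std_error": se,
        "cv": se / estimate,       # coefficient of variation
        "deff": deff,              # design effect
        "deft": math.sqrt(deff),   # square root of design effect
    }


# Hypothetical data: three clusters whose members resemble each other
clusters = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(cluster_mean_summary(clusters))
```

Because members of each hypothetical cluster resemble one another, the design effect is 3.6: under these assumptions the clustered design has 3.6 times the variance of a simple random sample of the same size.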
35. adjusted chi-square: in Complex Samples, 49, 59, 69; in Complex Samples Cox Regression, 85
adjusted F statistic: in Complex Samples, 49, 59, 69; in Complex Samples Cox Regression, 85
adjusted residuals: in Complex Samples Crosstabs, 39
aggregated residuals: in Complex Samples Cox Regression, 86
analysis plan, 19
baseline strata: in Complex Samples Cox Regression, 80
Bonferroni: in Complex Samples, 49, 59, 69; in Complex Samples Cox Regression, 85
Breslow estimation method: in Complex Samples Cox Regression, 90
Brewer's sampling method: in Sampling Wizard, 8
chi-square: in Complex Samples, 49, 59, 69; in Complex Samples Cox Regression, 85
classification tables: in Complex Samples Logistic Regression, 57, 191; in Complex Samples Ordinal Regression, 68, 202
clusters: in Analysis Preparation Wizard, 20; in Sampling Wizard, 6
coefficient of variation (COV): in Complex Sample…
36. …Next in the Output Variables step.
Figure 13-4: Sampling Wizard, Plan Summary step (stage 1).
Stage 1 Plan Summary. This panel summarizes the sampling plan so far. You can add another stage to the design. If you choose not to add a next stage, the next step is to set options for drawing your sample. Summary: stage 1, label None, strata county, clusters town, size 4 per stratum, method Simple Random Sampling WOR. File: c:\temp\property_assess.csplan. Do you want to add stage 2? Yes, add stage 2 now (choose this option if the working data file contains data for stage 2), or No, do not add another stage now (choose this option if stage 2 data are not available yet or your design has only one stage).
Select Yes, add stage 2 now. Click Next.
Figure 13-5: Sampling Wizard, Design Variables step (stage 2).
Stage 2 Design Variables. In this panel you can stratify your sample or define clusters. You can also provide a label for the stage that will be used in the output. If sampling weights exist from a prior stage of the sample design, you can use them as input to the current stage.
37. Optionally, you can select a subject identifier.
Define Event
Figure 12-2: Define Event dialog box.
Specify the values that indicate a terminal event has occurred.
Individual value(s). Specify one or more values by entering them into the grid or selecting them from a list of values with defined value labels.
Range of values. Specify a range of values by entering the minimum and maximum values or selecting values from a list with defined value labels.
Predictors
Figure 12-3: Cox Regression dialog box, Predictors tab.
38. This step allows you to choose where to direct sampled cases, weight variables, joint probabilities, and case selection rules.
Sample data. These options let you determine where sample output is written. It can be added to the active dataset, written to a new dataset, or saved to an external IBM SPSS Statistics data file. Datasets are available during the current session but are not available in subsequent sessions unless you explicitly save them as data files. Dataset names must adhere to variable naming rules. If an external file or new dataset is specified, the sampling output variables and variables in the active dataset for the selected cases are written.
Joint probabilities. These options let you determine where joint probabilities are written. They are saved to an external SPSS Statistics data file. Joint probabilities are produced if the PPS WOR, PPS Brewer, PPS Sampford, or PPS Murthy method is selected and WR estimation is not specified.
Case selection rules. If you are constructing your sample one stage at a time, you may want to save the case selection rules to a text file. They are useful for constructing the subframe for subsequent stages.
Sampling Wizard: Finish
Figure 2-10: Sampling Wizard, Finish step.
Completing the Sampling Wizard. You have provided all of the information needed to create a sample design and draw a sample.
39. …Parameters group. Click Continue. Click Odds Ratios in the Logistic Regression dialog box.
Figure 20-5: Logistic Regression Odds Ratios dialog box. (One set of odds ratios is produced for each variable in the Odds Ratios grids. For each set, all other factors in the model are evaluated at their highest levels; all other covariates are evaluated at their means.)
Choose to create odds ratios for the factor ed and the covariates employ and debtinc. Click Continue. Click OK in the Logistic Regression dialog box.
Pseudo R-Squares
Figure 20-6: Pseudo R-square statistics: Cox and Snell .330; Nagelkerke .451; McFadden .304. Dependent variable: Previously defaulted (reference category = No). Model: (Intercept), ed, age, employ, address, income, debtinc, creddebt, othdebt.
In the linear regression model, the coefficient of determination, R², summarizes the proportion of variance in the dependent variable associated with the predictor (independent) variables, with larger R² values indicating that more of the variation is explained by the model, to a maximum of 1. For regression models…
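The three pseudo R-square statistics in Figure 20-6 are commonly defined from the log-likelihoods of the intercept-only model and the fitted model. The Python sketch below uses those standard textbook definitions; it assumes that SPSS applies these forms to its (weighted) pseudo-log-likelihoods, and the inputs are hypothetical values rather than the example's output. The function name is also hypothetical.

```python
import math


def pseudo_r_squares(ll_null: float, ll_model: float, n: float) -> dict:
    """Cox and Snell, Nagelkerke, and McFadden pseudo R-squares.

    ll_null:  log-likelihood of the intercept-only model
    ll_model: log-likelihood of the fitted model
    n:        number of observations
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # largest value Cox and Snell can reach
    return {
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / max_cox_snell,  # rescaled so the maximum is 1
        "mcfadden": 1.0 - ll_model / ll_null,
    }


# Hypothetical log-likelihoods, not taken from the example output
print(pseudo_r_squares(ll_null=-100.0, ll_model=-80.0, n=200))
```

Because Cox and Snell's statistic cannot reach 1 for a binary outcome, Nagelkerke's version divides by its maximum, which is why it is larger than Cox and Snell's value in the example table.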
40. Predictors (Time-Dependent Predictors buttons: New, Edit, Delete)
The Predictors tab allows you to specify the factors and covariates used to build model effects.
Factors. Factors are categorical predictors; they can be numeric or string.
Covariates. Covariates are scale predictors; they must be numeric.
Time-Dependent Predictors. There are certain situations in which the proportional hazards assumption does not hold. That is, hazard ratios change across time; the values of one (or more) of your predictors are different at different time points. In such cases, you need to specify time-dependent predictors. For more information, see the topic Define Time-Dependent Predictor on p. 79. Time-dependent predictors can be selected as factors or covariates.
Define Time-Dependent Predictor
Figure 12-4: Cox Regression Define Time-Dependent Predictor dialog box.
41. …Regression dialog box.
Warnings
Figure 21-17: Warnings for reduced model: "The log-likelihood value cannot be increased after the maximum number of steps in the step-halving method. The CSORDINAL procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain. The following message applies to the generalized cumulative model: The log-likelihood value cannot be increased after the maximum number of steps in the step-halving method."
The warnings note that estimation of the reduced model ended before the parameter estimates reached convergence because the log-likelihood could not be increased with any change, or "step," in the current values of the parameter estimates.
Figure 21-18: Warnings for reduced model (iteration history). The likelihood criterion across the listed iterations is 326640.3, 303567.5, 303336.3, 303335.9, 303335.9, and 303335.9. Redundant parameters are not displayed; their values are always zero in all iterations. Dependent variable: The legislature should enact a gas tax (ascending). Model: (Threshold), agecat, drivefreq. Link function: Logit. a. The log-likelihood value cannot be increased after the maximum number of steps in the step-halving method.
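Step halving is easiest to see on a one-parameter problem. The Python sketch below is a generic Newton-Raphson maximizer, not the CSORDINAL algorithm: whenever a full step lowers the log-likelihood, the step is halved, up to a maximum number of halvings (the parameter max_halvings is an assumption standing in for the procedure's own setting). If no halved step increases the log-likelihood, the function stops and reports non-convergence, which is the situation the warning describes. The toy log-likelihood (3 events in 10 Bernoulli trials, on the logit scale) and all names are hypothetical.

```python
import math

# Toy problem (hypothetical): K = 3 events in N = 10 Bernoulli trials,
# with the event probability modeled on the logit scale by one parameter b.
K, N = 3, 10


def sigmoid(b):
    """Numerically stable logistic function."""
    if b >= 0:
        return 1.0 / (1.0 + math.exp(-b))
    z = math.exp(b)
    return z / (1.0 + z)


def loglik(b):
    # K*b - N*log(1 + exp(b)), with log(1 + exp(b)) computed stably
    return K * b - N * (max(b, 0.0) + math.log1p(math.exp(-abs(b))))


def grad(b):
    return K - N * sigmoid(b)


def hess(b):
    p = sigmoid(b)
    return -N * p * (1.0 - p)


def newton_with_step_halving(b, max_iter=100, max_halvings=5, tol=1e-6):
    """Maximize loglik by Newton-Raphson, halving steps that lower it.

    Returns (estimate, log-likelihood, converged).
    """
    ll = loglik(b)
    for _ in range(max_iter):
        step = -grad(b) / hess(b)  # full Newton-Raphson step
        if abs(step) < tol:
            return b, ll, True
        for _ in range(max_halvings + 1):  # full step, then up to max_halvings halvings
            candidate = b + step
            ll_new = loglik(candidate)
            # Accept unless the log-likelihood drops (a tiny allowance absorbs rounding)
            if ll_new >= ll - 1e-12 * (1.0 + abs(ll)):
                break
            step /= 2.0
        else:
            # Every tried step lowered the log-likelihood: stop, keep the last
            # accepted estimate, and report non-convergence.
            return b, ll, False
        b, ll = candidate, ll_new
    return b, ll, False


print(newton_with_step_halving(0.0))
print(newton_with_step_halving(10.0, max_halvings=5))
```

Started at 0, the iterations converge to the maximum at log(3/7). Started at 10 with at most five halvings, even the smallest halved step still lowers the log-likelihood, so the function stops at its starting value and reports that it did not converge.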
42. …(T_ >= 1 & T_ < 2) * BP2 + (T_ >= 2 & T_ < 3) * BP3 + (T_ >= 3 & T_ < 4) * BP4
Notice that exactly one of the terms in parentheses will be equal to 1 for any given case, and the rest will all equal 0. In other words, this function means that if time is less than one week, use BP1; if it is at least one week but less than two weeks, use BP2; and so on.
Note: If your segmented time-dependent predictor is constant within segments, as in the blood pressure example given above, it may be easier for you to specify the piecewise-constant time-dependent predictor by splitting subjects across multiple cases. See the discussion on Subject Identifiers in Complex Samples Cox Regression on p. 74 for more information.
In the Define Time-Dependent Predictor dialog box, you can use the function-building controls to build the expression for the time-dependent covariate, or you can enter it directly in the Numeric Expression text area. Note that string constants must be enclosed in quotation marks or apostrophes, and numeric constants must be typed in American format, with the dot as the decimal delimiter. The resulting variable is given the name you specify and should be included as a factor or covariate on the Predictors tab.
Subgroups
Figure 12-5: Cox Regression dialog box, Subgroups tab.
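To make the indicator logic concrete, the Python sketch below evaluates the segmented expression at a given time. It assumes left-closed week boundaries (T_ >= 1 and so on), the leading term (T_ < 1) * BP1 implied by the text, and hypothetical readings; it only mimics the arithmetic of the SPSS expression and is not how the procedure evaluates it. The function name is hypothetical.

```python
def blood_pressure_at(t, bp):
    """Evaluate the segmented predictor
    (T_ < 1)*BP1 + (T_ >= 1 & T_ < 2)*BP2 + (T_ >= 2 & T_ < 3)*BP3 + (T_ >= 3 & T_ < 4)*BP4
    at time t (in weeks). bp holds the four weekly readings BP1..BP4.
    """
    indicators = [t < 1, 1 <= t < 2, 2 <= t < 3, 3 <= t < 4]
    # Booleans act as 0/1, so for 0 <= t < 4 exactly one reading survives;
    # for t >= 4 every indicator is 0 and the expression evaluates to 0.
    return sum(int(flag) * reading for flag, reading in zip(indicators, bp))


readings = [128, 135, 131, 140]  # hypothetical weekly blood pressure values
print([blood_pressure_at(t, readings) for t in (0.5, 1.0, 2.7, 3.99)])
```

At exactly t = 1 the second indicator, not the first, equals 1, which is why the boundaries must be closed on one side only for "exactly one term equals 1" to hold.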
43. You can return to the Sampling Wizard later if you need to add or modify stages. After all the stages have been sampled, you can use the plan file in any Complex Samples analysis procedure to indicate how the sample was drawn.
(Finish step options: Save the design to a plan file and draw the sample, or Paste the syntax generated by the Wizard into a syntax window. To close this wizard, click Finish.)
This is the final step. You can save the plan file and draw the sample now or paste your selections into a syntax window. When making changes to stages in the existing plan file, you can save the edited plan to a new file or overwrite the existing file. When adding stages without making changes to existing stages, the Wizard automatically overwrites the existing plan file. If you want to save the plan to a new file, select Paste the syntax generated by the Wizard into a syntax window, and change the filename in the syntax commands.
Modifying an Existing Sample Plan
From the menus choose: Analyze > Complex Samples > Select a Sample. Select Edit a sample design, and choose a plan file to edit. Click Next to continue through the Wizard.
44. …activity.
Univariate Statistics by Subpopulation
Figure 16-5: Univariate statistics by subpopulation (mean times per week, standard error, and 95% confidence interval):
18-24, Freq. vigorous activity: 3.92 (std. error .087; 95% CI 3.75 to 4.09)
18-24, Freq. moderate activity: 5.18 (.137; 4.91 to 5.45)
18-24, Freq. strength activity: 3.45 (.085; lower bound 3.28)
25-44, Freq. vigorous activity: 3.55 (.048; 3.46 to 3.65)
25-44, Freq. moderate activity: 4.73 (.056; 4.62 to 4.84)
25-44, Freq. strength activity: 3.28 (.052; upper bound 3.38)
(Rows for the remaining age categories are not recoverable.)
Each selected statistic is computed for each measure variable by values of Age category. The first column contains estimates of the average number of times per week that people of each category engage in a particular type of activity. The confidence intervals for the means allow you to make some interesting conclusions. In terms of vigorous and moderate activities, 25-44 year olds are less active than those 18-24 and 45-64, and 45-64 year olds are less active than those 65 or older. In terms of strength activity, 25-44 year olds are less active than those 45-64 and 18-24, and 45-64 year…
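The conclusions above rest on comparing confidence intervals: when two 95% intervals do not overlap, the corresponding means can be judged different. The Python sketch below applies that check to the recoverable 18-24 and 25-44 rows. It is a minimal illustration with a hypothetical function name; non-overlap is a conservative criterion, and overlapping intervals do not by themselves rule out a difference (a formal test of the difference would account for the covariance between the estimates).

```python
def intervals_overlap(a, b):
    """True if two (lower, upper) confidence intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% confidence limits (times per week) read from Figure 16-5
ci = {
    ("18-24", "vigorous"): (3.75, 4.09),
    ("25-44", "vigorous"): (3.46, 3.65),
    ("18-24", "moderate"): (4.91, 5.45),
    ("25-44", "moderate"): (4.62, 4.84),
}

for activity in ("vigorous", "moderate"):
    younger = ci[("18-24", activity)]
    older = ci[("25-44", activity)]
    if intervals_overlap(younger, older):
        print(f"{activity}: intervals overlap; no conclusion from this check alone")
    elif older[1] < younger[0]:
        print(f"{activity}: 25-44 year olds are less active than 18-24 year olds")
    else:
        print(f"{activity}: 25-44 year olds are more active than 18-24 year olds")
```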
45. …and want to draw a sample. (File: demo.csplan.)
Select Draw a sample, browse to where you saved the plan file, and select the demo.csplan plan file that you created. Click Next.
Figure 13-28: Sampling Wizard, Plan Summary step (stage 3).
Plan Summary. This panel summarizes the sampling plan. Indicate any stages that have already been drawn and should not be resampled. Summary: stage 1, label None, strata region, clusters province, size 3 per stratum, Simple Random Sampling WOR; stage 2, label None, strata district, clusters city, size 4 per stratum, Simple Random Sampling WOR; stage 3, label None, uses subdivision, Simple Random Sampling WOR. File: demo.csplan. Which stages have already been sampled? Stages: 1, 2.
Select 1, 2 as stages already sampled. Click Next.
Figure 13-29: Sampling Wizard, Draw Sample Selection Options step.
Draw Sample Selection Options. In this panel you can choose which stages to extract and set other sampling options, such as the seed used for random number generation. Which stages do you want to sample? Stages: 3. What type of seed value do you want to use? A randomly chosen number, or Custom value: 4231946.
46. …assumed for estimation.
(Estimation Method options: WR (sampling with replacement): if you choose this option, you will not be able to add additional stages, and any sample stages after the current stage will be ignored when the data are analyzed. Equal WOR (equal probability sampling without replacement): the next panel will ask you to specify inclusion probabilities or population sizes. Unequal WOR (unequal probability sampling without replacement): joint probabilities will be required to analyze sample data; this option is available in stage 1 only.)
This step allows you to specify an estimation method for the stage.
WR (sampling with replacement). WR estimation does not include a correction for sampling from a finite population (FPC) when estimating the variance under the complex sampling design. You can choose to include or exclude the FPC when estimating the variance under simple random sampling (SRS). Choosing not to include the FPC for SRS variance estimation is recommended when the analysis weights have been scaled so that they do not add up to the population size. The SRS variance estimate is used in computing statistics like the design effect. WR estimation can be specified only in the final stage of a design; the Wizard will not allow you to add another stage if you select WR estimation.
Equal WOR (equal probability sampling without replacement). Equal WOR…
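The effect of the finite population correction on the SRS variance estimate can be seen in a few lines of Python. This minimal sketch, with a hypothetical helper name and data, computes the variance of a sample mean as s²/n and, when a population size is supplied, multiplies it by the FPC factor 1 - n/N. It covers the textbook SRS formula only; it does not reproduce the complex-design variance estimator.

```python
from statistics import variance


def srs_variance_of_mean(sample, population_size=None):
    """Variance of a sample mean under simple random sampling.

    With population_size=None, no finite population correction (FPC) is applied.
    Otherwise the variance is multiplied by the FPC factor (1 - n/N).
    """
    n = len(sample)
    var = variance(sample) / n              # s^2 / n
    if population_size is not None:
        var *= 1.0 - n / population_size    # FPC for sampling without replacement
    return var


sample = [2.0, 4.0, 6.0, 8.0]  # hypothetical values
print(srs_variance_of_mean(sample))                     # without FPC
print(srs_variance_of_mean(sample, population_size=8))  # with FPC: half the population sampled
```

Sampling half of the population halves the variance estimate here, which is why omitting the FPC is appropriate when the weights no longer sum to the true population size: the factor 1 - n/N would then be computed from the wrong N.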
47. …click Next in the Output Variables step.
Figure 13-20: Sampling Wizard, Plan Summary step (stage 2).
Stage 2 Plan Summary. This panel summarizes the sampling plan so far. You can add another stage to the design. If you choose not to add a next stage, the next step is to set options for drawing your sample. Summary: stage 1, label None, strata region, clusters province, size 3 per stratum, Simple Random Sampling WOR; stage 2, label None, strata district, clusters city, size 4 per stratum, Simple Random Sampling WOR. File: c:\temp\demo.csplan. Do you want to add stage 3? Yes, add stage 3 now (choose this option if the working data file contains data for stage 3), or No, do not add another stage now (choose this option if stage 3 data are not available yet or your design has only two stages).
Select Yes, add stage 3 now. Click Next.
Figure 13-21: Sampling Wizard, Design Variables step (stage 3).
48. …indicate how the data were sampled. You can also use the wizard to modify a sampling plan or draw a sample according to an existing plan. What would you like to do?
Design a sample. Choose this option if you have not created a plan file. You will have the option to draw the sample.
Edit a sample design. Choose this option if you want to add, remove, or modify stages of an existing plan. You will have the option to draw the sample.
Draw the sample. Choose this option if you already have a plan file and want to draw a sample.
Select Design a sample, browse to where you want to save the file, and type property_assess.csplan as the name of the plan file. Click Next.
Figure 13-2: Sampling Wizard, Design Variables step (stage 1).
Stage 1 Design Variables. In this panel you can stratify your sample or define clusters. You can also provide a label for the stage that will be used in the output. If sampling weights exist from a prior stage of the sample design, you can use them as input to the current stage. Available variables include Property ID, Neighborhood (nbrhood), Years since last appraisal (time), and Value at last appraisal (lastval).
49. may experience major medical events that alter their medical history. In this dataset, the occurrence of myocardial infarction, ischemic stroke, or hemorrhagic stroke is noted and the time of the event is recorded. You could create computable time-dependent covariates within the procedure to include this information in the model, but it is more convenient to use multiple cases per subject. Note that the variables were originally coded so that the patient history is recorded across variables, so you will need to restructure the dataset.
Left truncation. The onset of risk starts at the time of the ischemic stroke. However, the sample only includes patients who have survived the rehabilitation program. The sample is therefore left-truncated, in the sense that the observed survival times are inflated by the length of rehabilitation. You can account for this by specifying the time at which each patient exited rehabilitation as the time of entry into the study.
No sampling plan. The dataset was not collected via a complex sampling plan and is considered to be a simple random sample. You will need to create an analysis plan to use Complex Samples Cox Regression.
The dataset is collected in stroke_survival.sav. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. Use the Restructure Data Wizard to prepare the data for analysis, then the Analysis Preparation Wizard to create a simple random sampling plan, and finally Compl
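Left truncation changes who is in the risk set. The sketch below computes a product-limit (Kaplan-Meier) survival estimate with delayed entry: a subject counts as at risk at time t only if entry < t <= exit. It is a minimal, unweighted illustration in plain Python and is not the Complex Samples implementation. The function name `km_delayed_entry` and the toy data are hypothetical, and design weights are ignored.

```python
def km_delayed_entry(subjects):
    """Product-limit survival estimate that allows left truncation.

    subjects: list of (entry, exit, died) tuples. A subject is in the risk
    set at time t only when entry < t <= exit, so time spent before entry
    (for example, in rehabilitation) never counts as time at risk.
    Returns a list of (event_time, survival_probability) pairs.
    """
    event_times = sorted({exit_ for _, exit_, died in subjects if died})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for entry, exit_, _ in subjects if entry < t <= exit_)
        deaths = sum(1 for _, exit_, died in subjects if died and exit_ == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: (entry time, exit time, died?)
toy = [(0, 5, True), (0, 8, False), (4, 10, True), (6, 10, False)]
for t, s in km_delayed_entry(toy):
    print(t, round(s, 4))
```

At t = 5 the fourth subject has not yet entered, so the risk set has 3 members rather than 4. The estimate there is 2/3. Ignoring entry times would give 3/4 instead, which shows how survival is overstated when truncation is not taken into account.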
50. medical history. The following table shows how to structure such a dataset. Patient ID is the subject identifier, End time defines the observed intervals, and Status records major medical events. Prior history of heart attack and Prior history of hemorrhaging are piecewise-constant time-dependent predictors.
Patient ID | End time | Status | Prior history of heart attack | Prior history of hemorrhaging
1 | 5 | Heart Attack | No | No
1 | (not recovered) | Hemorrhaging | Yes | No
1 | 8 | Died | Yes | Yes
2 | 24 | Died | No | No
3 | 8 | Heart Attack | No | No
3 | 15 | Died | Yes | No
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box. Typically, Cox regression models assume proportional hazards; that is, the ratio of hazards from one case to another should not vary over time. If this assumption does not hold, you may need to add time-dependent predictors to the model.
Kaplan-Meier Analysis. If you do not select any predictors (or do not enter any selected predictors into the model) and you choose the product-limit method for computing the baseline survival curve on the Options tab, the procedure performs a Kaplan-Meier type of survival analysis.
To Obtain Complex Samples Cox Regression
From the menus choose: Analyze > Complex Samples > Cox Regression
Select a plan file. Optionally, select a custom joint probabilities
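The table above follows a counting-process layout: each row ends at an event, and the history flags record only what happened before that row. The sketch below expands one patient's ordered events into rows of that form. It assumes follow-up starts at time 0, and it adds a `start` column that the table does not show. The function name and field names are hypothetical.

```python
def to_counting_process(patient_id, events):
    """Expand one patient's events into (start, end] rows.

    events: list of (end_time, status) pairs. The two history flags are
    piecewise constant: they describe events that happened before the
    start of each row and change only between rows.
    """
    rows = []
    start = 0  # assumption: follow-up begins at time 0
    prior_heart_attack = False
    prior_hemorrhaging = False
    for end_time, status in sorted(events):
        rows.append({
            "patient_id": patient_id,
            "start": start,
            "end": end_time,
            "status": status,
            "prior_heart_attack": "Yes" if prior_heart_attack else "No",
            "prior_hemorrhaging": "Yes" if prior_hemorrhaging else "No",
        })
        # Update the history only after writing the row that this event ends.
        if status == "Heart Attack":
            prior_heart_attack = True
        elif status == "Hemorrhaging":
            prior_hemorrhaging = True
        start = end_time
    return rows


for row in to_counting_process(3, [(8, "Heart Attack"), (15, "Died")]):
    print(row)
```

For patient 3 this reproduces the table's two rows: a heart attack at time 8 with no prior history, then death at time 15 with a prior heart attack.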
51. modify a sampling plan or draw a sample according to an existing plan.
What would you like to do?
Design a sample: Choose this option if you have not created a plan file. You will have the option to draw the sample.
Edit a sample design: Choose this option if you want to add, remove, or modify stages of an existing plan. You will have the option to draw the sample.
Draw the sample: Choose this option if you already have a plan file and want to draw a sample.
Select Design a sample, browse to where you want to save the file, and type demo.csplan as the name of the plan file.
Click Next.
Figure 13-15: Sampling Wizard, Design Variables step (stage 1)
Stage 1: Design Variables. In this panel you can stratify your sample or define clusters. You can also provide a label for the stage that will be used in the output. If sampling weights exist from a prior stage of the sample design, you can use them as input to the current stage. Variables: District [district], City [city], Subdivision [subdivision].
Select Region as a stratification variable.
Select Province a
52. Select Table percent in the Cells group.
Select Confidence interval in the Statistics group.
Click Continue.
Click OK in the Frequencies dialog box.
Frequency Table
Figure 15-4: Frequency table for variable/situation
Population Size | Estimate | Standard Error | 95% CI lower | 95% CI upper
Yes | 102767095 | 1185126.709 | 100435967 | 105098223
No | 90794234 | 1094401.949 | 88641560 | 92946908
Total | 193561329 | 1789098.713 | 190042196 | 197080462
Each selected statistic is computed for each selected cell measure. The first column contains estimates of the number and percentage of the population that do or do not take vitamin/mineral supplements. The confidence intervals do not overlap, so you can conclude that, overall, more Americans take vitamin/mineral supplements than not.
Frequency by Subpopulation
Figure 15-5: Frequency table by subpopulation
Age category | Population Size | Estimate | Standard Error | 95% CI lower | 95% CI upper
18-24 | Yes | 10018312 | 350602.352 | 9328681 | 10707942
18-24 | No | 15472368 | 499182.391 | 14490483 | 16454253
18-24 | Total | 25490680 | 680732.812 | 24151688 | 26829672
(second category) | Yes | 39163840 | 660855.719 | 37863946 | 40463734
(second category) | No | 39503150 | 645934.187 | 38232606 | 40773694
(second category) | Total | 78666990 | 961114.325 | 76776491 | 80557489
(third category) | Yes | 34154952 | 598603.728 | 32977507 | 35332397
(third category) | No | 24005512 | 497723.833 | 23026496 | 24984528
(third category) | Total | 58160464 | 814680.415 | 56557999 |
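The intervals in Figure 15-4 fit the usual form: estimate ± c × standard error. The sketch below recovers the multiplier c implied by the Yes and No rows, which comes out at about 1.967. That is slightly above the normal 1.96, as you would expect from a t quantile with finite design degrees of freedom. The degrees of freedom are not stated here, so that reading is an assumption. The sketch also checks the non-overlap claim. All numbers come from the figure; the variable names are hypothetical.

```python
# Population Size rows from Figure 15-4.
YES_ESTIMATE, YES_SE = 102767095, 1185126.709
YES_LOWER, YES_UPPER = 100435967, 105098223
NO_ESTIMATE, NO_SE, NO_UPPER = 90794234, 1094401.949, 92946908

# Multiplier implied by each interval: half-width divided by standard error.
c_yes = (YES_UPPER - YES_LOWER) / 2 / YES_SE
c_no = (NO_UPPER - NO_ESTIMATE) / NO_SE  # upper half-width of the No interval

# Non-overlap: the lowest plausible Yes count exceeds the highest plausible No count.
non_overlapping = YES_LOWER > NO_UPPER

print(f"implied multipliers: {c_yes:.4f} (Yes), {c_no:.4f} (No)")
print("intervals overlap:", not non_overlapping)
```

Both rows give essentially the same multiplier, which is consistent with a single critical value being applied across the table.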
53. of hemorrhagic stroke [hs]
Select Varies by subject, and select Length of stay for rehabilitation [los_rehab] as the start variable. Note that the restructured variable has taken the variable label from the first variable used to construct it, though the label is not necessarily appropriate for the constructed variable.
Select Time to first event post-attack [time_to_event] as the end variable.
Select First event post-attack [event] as the status variable.
Click Define Event.
Figure 22-43: Define Event dialog box
Values Indicating that Event Has Occurred: Individual value(s), specify one or more values (0 No event observed, 1 Myocardial infarction, 2 Ischemic stroke, 3 Hemorrhagic stroke); or Range of values (Minimum, Maximum).
Select 4 Death as the value indicating the terminal event has occurred.
Click Continue.
Figure 22-44: Cox Regression dialog box, Time and Event tab
Tabs: Time and Event, Predictors, Subgroups, Model, Statistics, Plots, Hypothesis Tests, Save, Export, Options. Survival Time: Start of Interval (Onset of Risk) is either Time 0 or Varies by subject (Start Variable). Variables: Hospital ID [hospid], Hospital size [hospsize], Attending physician ID [physid], Age in years [age], Age category [agecat], Gender [gender], Physically active [active], Obesity
54. olds are less active than those 65 or older.
Using the Complex Samples Descriptives procedure, you have obtained statistics for the activity levels of U.S. citizens. Overall, Americans spend varying amounts of time at different types of activities. When broken down by age, it roughly appears that post-collegiate Americans are initially less active than they were while in school but become more conscientious about exercising as they age.
Related Procedures
The Complex Samples Descriptives procedure is a useful tool for obtaining univariate descriptive statistics of scale measures for observations obtained via a complex sampling design.
The Complex Samples Sampling Wizard is used to specify complex sampling design specifications and obtain a sample. The sampling plan file created by the Sampling Wizard contains a default analysis plan and can be specified in the Plan dialog box when you are analyzing the sample obtained according to that plan.
The Complex Samples Analysis Preparation Wizard is used to set analysis specifications for an existing complex sample. The analysis plan file created by the Analysis Preparation Wizard can be specified in the Plan dialog box when you are analyzing the sample corresponding to that plan.
The Complex Samples Ratios procedure provides descriptive statistics for ratios of scale measures.
The Complex Samples Frequencies procedure provides univariate descriptive statistics of categorical variables.
Chapter 17
55. sample design specifications through the current stage. From here, you can either proceed to the next stage (creating it, if necessary) or set options for drawing the sample.
Sampling Wizard: Draw Sample Selection Options
Figure 2-8: Sampling Wizard, Draw Sample Selection Options step
In this panel you can choose whether to draw a sample. You can pick which stages to extract and set other sampling options, such as the seed used for random number generation. Do you want to draw a sample? Yes (Stages) or No. What type of seed value do you want to use? A randomly chosen number, or Custom value. Enter a custom seed value if you want to reproduce the sample later. Further options: Include in the sample frame cases with user-missing values of stratification or clustering variables; Working data are sorted by stratification variables (presorted data may speed processing).
This step allows you to choose whether to draw a sample. You can also control other sampling options, such as the random seed and missing-value handling.
Draw sample. In addition to choosing whether to draw a sample, you can also choose to
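A custom seed makes a draw repeatable. The sketch below shows the idea with Python's own generator. It draws a simple random sample without replacement from a private `random.Random` instance seeded by the caller. This is only an analogy: Python's generator is not SPSS's, so a seed used here will not reproduce a sample drawn by the Sampling Wizard. The helper `draw_srs` and the frame are hypothetical.

```python
import random


def draw_srs(frame, n, seed):
    """Simple random sample without replacement, reproducible from `seed`.

    A private random.Random instance is used, so the global generator is
    left untouched. The same seed reproduces the same sample only if the
    frame is presented in the same order.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


frame = [f"unit{i:03d}" for i in range(200)]
first = draw_srs(frame, 10, seed=20110101)
again = draw_srs(frame, 10, seed=20110101)
print(first == again)  # True: same seed and frame order give the same sample
```

Frame order matters as much as the seed. This is one reason to keep the working data in a fixed sort order when you plan to reproduce a sample.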
56. ... 146
Summary 154
Related Procedures 154
15 Complex Samples Frequencies 155
Using Complex Samples Frequencies to Analyze Nutritional Supplement Usage 155
Running the Analysis 155
Frequency Table 158
Frequency by Subpopulation 158
Summary 159
Related Procedures 159
16 Complex Samples Descriptives 160
Using Complex Samples Descriptives to Analyze Activity Levels 160
Running the Analysis 160
Univariate Statistics 163
Univariate Statistics by Subpopulation 163
Summary 164
Related Procedures 164
17 Complex Samples Crosstabs 165
Using Complex Samples Crosstabs to Measure the Relative Risk of an Event 165
Running the Analysis 165
Crosstabulation 168
Risk Estimate 169
Risk Estimate by Subpopulation 170
Summary 170
Related Procedures 170
18 Complex Samples Ratios 171
Using Complex Samples Ratios to Aid Property Value Assessment
57. stage. You can also provide a label for the stage that will be used in the output. The panel lists the dataset variables (Taking anti-clotting drugs, History of transient ischemic attack, Time to hospital [time], Initial Rankin score [rankin0], CAT scan result [catscan], Clot-dissolving drugs, Treatment result [result], Post-event preventative surgery, Post-event rehabilitation, Total treatment and rehabilitation costs, Event Index [event_index], First event post-attack, Time to first event post-attack, History of myocardial infarction, History of ischemic stroke, History of hemorrhagic stroke) with boxes for Strata, Clusters, Sample Weight, and Stage Label.
Select sampleweight as the sample weight variable.
Click Next.
Figure 22-40: Analysis Preparation Wizard, Estimation Method step (stage 1)
Stage 1: Estimation Method. In this panel you select a method for estimating standard errors. The estimation method depends on assumptions about how the sample was drawn.
Which of the following sample designs should be assumed for estimation?
WR (sampling with replacement): If you choose this option, you will not be able to add additional stages. Any sample stages Complet
58. steps of the Wizard. In other words, variables removed from the source list in a particular step are removed from the list in all steps. Variables returned to the source list appear in the list in all steps.
Tree Controls for Navigating the Sampling Wizard
On the left side of each step in the Sampling Wizard is an outline of all the steps. You can navigate the Wizard by clicking on the name of an enabled step in the outline. Steps are enabled as long as all previous steps are valid, that is, if each previous step has been given the minimum required specifications for that step. See the Help for individual steps for more information on why a given step may be invalid.
Sampling Wizard: Sampling Method
Figure 2-3: Sampling Wizard, Sampling Method step
In this panel you can choose how to select items from the working data file. If you choose a PPS (probability proportional to size) sampling method, you must also specify a measure of size (MOS). Options: Without replacement (WOR); With replacement (WR); Use WR estimation for analysis.
This step allows you to specify how to select cases from the active dataset.
Method. Controls in this group are used to choose a selection method. Some sampling types allow you to choose whether to sample with replacement (WR) or without replace
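The practical difference between WR and WOR selection shows up in a few lines of standard-library Python. `random.Random.sample` never returns a unit twice and refuses n > N. `random.Random.choices` draws from the full frame every time, so units can repeat. This covers only the selection step; in SPSS the WR/WOR choice also affects variance estimation, which is not shown. The helper names are hypothetical.

```python
import random


def draw_wor(frame, n, rng):
    """Without replacement: each unit can appear at most once, so n <= len(frame)."""
    return rng.sample(frame, n)


def draw_wr(frame, n, rng):
    """With replacement: every draw is from the full frame, so units can repeat."""
    return rng.choices(frame, k=n)


rng = random.Random(2011)
frame = ["A", "B", "C", "D", "E"]
print("WOR:", draw_wor(frame, 3, rng))
print("WR: ", draw_wr(frame, 8, rng))  # 8 draws from 5 units must repeat some unit
try:
    draw_wor(frame, 8, rng)
except ValueError as err:
    print("WOR with n > N fails:", err)
```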
59. Type mi as the target variable.
Select History of myocardial infarction [mi], History of myocardial infarction [mi1], and History of myocardial infarction [mi2] as variables to be transposed.
Select trans5 from the target variable list.
Figure 22-29: Restructure Data Wizard, Variables to Cases: Select Variables step
Variables to Cases: Select Variables. For each variable group you have in the current data, the restructured file will have one target variable. In this step, choose how to identify case groups in the restructured data and choose which variables belong with each target variable. Optionally, you can also choose variables to copy to the new file as fixed variables. Variables in the Current File ... Case Group
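A Variables to Cases restructure is a wide-to-long transpose. Each listed source variable becomes its own case, and its value is stored under one target variable. The sketch below shows the transformation in plain Python. The column names `hist_a`, `hist_b`, `hist_c` and the id `patid` are hypothetical stand-ins, and the integer `index` plays the role of the index variable that the wizard can create.

```python
def variables_to_cases(records, id_var, target, sources):
    """Wide-to-long restructuring, one output case per source variable.

    Each output case keeps the subject id, an index (1, 2, ...) saying which
    source variable it came from, and that variable's value under `target`.
    """
    long_cases = []
    for record in records:
        for index, source in enumerate(sources, start=1):
            long_cases.append({id_var: record[id_var], "index": index, target: record[source]})
    return long_cases


# Hypothetical wide record: one patient with three history columns.
wide = [{"patid": 7, "hist_a": 0, "hist_b": 1, "hist_c": 1}]
for case in variables_to_cases(wide, "patid", "mi", ["hist_a", "hist_b", "hist_c"]):
    print(case)
```

Each additional variable group (for example, the ischemic and hemorrhagic stroke histories) would be transposed the same way, and the groups line up by subject and index.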
60. surgery [surgery], Post-event rehabilitation [rehab], Total treatment and rehabilitation costs in th..., Event Index [event_index]; Time-Dependent Predictors.
Select History of myocardial infarction [mi] through History of hemorrhagic stroke [hs] as factors.
Click the Statistics tab.
Figure 22-46: Cox Regression dialog box, Statistics tab
Sample design information: Event and censoring summary; Risk set at event times. Parameters: Estimate; Exponentiated estimate; Standard error; Confidence interval; t test; Covariances of parameter estimates; Correlations of parameter estimates; Design effect; Square root of design effect. Model Assumptions: Test of proportional hazards (Time Function: Kaplan-Meier); Parameter estimates for alternative model; Covariance matrix for alternative model. Baseline survival and cumulative hazard functions.
Select Estimate, Exponentiated estimate, Standard error, and Confidence interval in the Parameters group.
Click the Plots tab.
Figure 22-47: Cox Regression dialog box, Plots tab
Survival func
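The Exponentiated estimate is exp(B), the hazard ratio for a coefficient B. Because exp is increasing, the confidence interval for exp(B) can be obtained by exponentiating the bounds of the interval for B, and the result is not symmetric around exp(B). The coefficient values below are hypothetical. This is a sketch of the transformation, not output from the procedure.

```python
import math


def exponentiated(estimate, lower, upper):
    """Map a Cox coefficient B and its confidence bounds to the hazard-ratio scale.

    exp() is increasing, so exponentiating the endpoints gives the interval
    for exp(B); the result is not symmetric around exp(B).
    """
    return math.exp(estimate), math.exp(lower), math.exp(upper)


# Hypothetical coefficient for a factor level and its 95% bounds.
hr, hr_lower, hr_upper = exponentiated(0.40, 0.10, 0.70)
print(f"hazard ratio {hr:.3f}, 95% CI {hr_lower:.3f} to {hr_upper:.3f}")
```

A coefficient interval that excludes 0 corresponds to a hazard-ratio interval that excludes 1.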
61. the property.
property_assess_cs.sav. This is a hypothetical data file that concerns a state assessor's efforts to keep property value assessments up to date on limited resources. The cases correspond to properties in the state. Each case in the data file records the county, township, and neighborhood in which the property lies, the time since the last assessment, and the valuation made at that time.
property_assess_cs_sample.sav. This hypothetical data file contains a sample of the properties listed in property_assess_cs.sav. The sample was taken according to the design specified in the property_assess.csplan plan file, and this data file records the inclusion probabilities and sample weights. The additional variable Current value was collected and added to the data file after the sample was taken.
recidivism.sav. This is a hypothetical data file that concerns a government law enforcement agency's efforts to understand recidivism rates in their area of jurisdiction. Each case corresponds to a previous offender and records their demographic information, some details of their first crime, and then the time until their second arrest, if it occurred within two years of the first arrest.
recidivism_cs_sample.sav. This is a hypothetical data file that concerns a government law enforcement agency's efforts to understand recidivism rates in their area of jurisdiction. Each case corresponds to a previous offender released f
62. time and predictor values for each case.
Lower bound of confidence interval for survival function. Saves the lower bound of the confidence interval for the survival function at the observed time and predictor values for each case.
Upper bound of confidence interval for survival function. Saves the upper bound of the confidence interval for the survival function at the observed time and predictor values for each case.
Cumulative hazard function. Saves the cumulative hazard, or -ln(survival), at the observed time and predictor values for each case.
Lower bound of confidence interval for cumulative hazard function. Saves the lower bound of the confidence interval for the cumulative hazard function at the observed time and predictor values for each case.
Upper bound of confidence interval for cumulative hazard function. Saves the upper bound of the confidence interval for the cumulative hazard function at the observed time and predictor values for each case.
Predicted value of linear predictor. Saves the linear combination of reference-value-corrected predictors times regression coefficients. The exponentiated linear predictor is the ratio of the hazard function to the baseline hazard. Under the proportional hazards model, this value is constant across time.
Schoenfeld residual. For each uncensored case and each nonredundant parameter in the model, the Schoenfeld residual is the difference between the
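Two of the saved quantities follow directly from standard Cox-model identities. The cumulative hazard is H(t) = -ln S(t). The linear predictor is the sum of B_j × (x_j - reference_j), and its exponential is the ratio h(t | x) / h0(t), which does not depend on t. The sketch below computes both. The reference values and coefficients are hypothetical inputs, because this passage does not say how SPSS chooses the reference values.

```python
import math


def cumulative_hazard(survival):
    """H(t) = -ln S(t); defined for 0 < survival <= 1."""
    return -math.log(survival)


def linear_predictor(x, reference, beta):
    """Sum of beta_j * (x_j - reference_j) over the predictors."""
    return sum(b * (xi - ri) for b, xi, ri in zip(beta, x, reference))


# Hypothetical case, reference values, and coefficients.
lp = linear_predictor([1.0, 2.0], [0.0, 1.0], [0.5, -0.25])
relative_hazard = math.exp(lp)  # h(t | x) / h0(t), the same at every t
print(lp, relative_hazard, cumulative_hazard(0.8))
```

A case whose predictors equal the reference values has a linear predictor of 0 and a relative hazard of 1. That is why the baseline hazard describes such a case.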
63. with a categorical dependent variable, it is not possible to compute a single R² statistic that has all of the characteristics of R² in the linear regression model, so these approximations are computed instead. The following methods are used to estimate the coefficient of determination.
■ Cox and Snell's R² (Cox and Snell, 1989) is based on the log likelihood for the model compared to the log likelihood for a baseline model. However, with categorical outcomes, it has a theoretical maximum value of less than 1, even for a perfect model.
■ Nagelkerke's R² (Nagelkerke, 1991) is an adjusted version of the Cox & Snell R-square that adjusts the scale of the statistic to cover the full range from 0 to 1.
■ McFadden's R² (McFadden, 1974) is another version, based on the log-likelihood kernels for the intercept-only model and the full estimated model.
What constitutes a "good" R² value varies between different areas of application. While these statistics can be suggestive on their own, they are most useful when comparing competing models for the same data. The model with the largest R² statistic is "best" according to this measure.
Classification
Figure 20-7: Classification table
Observed | Predicted No | Predicted Yes | Percent Correct
No | 188289.667 | 31871.267 | 85.5%
Yes | 49970.600 | 77675.133 | 60.9%
Overall Percent | 68.5% | 31.5% | 76.5%
Dependent Variable: Previously defaulted (reference categor
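The three measures can be written in terms of the intercept-only log-likelihood ℓ0, the fitted-model log-likelihood ℓ1, and the number of cases n. Cox & Snell is 1 - exp(2(ℓ0 - ℓ1)/n). Nagelkerke divides that by its maximum, 1 - exp(2ℓ0/n). McFadden is 1 - ℓ1/ℓ0. The sketch below uses these common textbook forms. How the procedure weights n and the log-likelihoods under a complex design is not described here, so treat the inputs as an assumption.

```python
import math


def pseudo_r2(ll_null, ll_model, n):
    """Cox & Snell, Nagelkerke, and McFadden pseudo-R² from log-likelihoods.

    ll_null: log-likelihood of the intercept-only (baseline) model
    ll_model: log-likelihood of the fitted model
    n: number of cases (survey weighting is not handled in this sketch)
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    cox_snell_max = 1.0 - math.exp(2.0 * ll_null / n)  # value at a perfect fit (ll_model = 0)
    nagelkerke = cox_snell / cox_snell_max
    mcfadden = 1.0 - ll_model / ll_null
    return cox_snell, nagelkerke, mcfadden


# Hypothetical log-likelihoods for 100 cases.
print(pseudo_r2(-100.0, -60.0, 100))
```

With a perfect fit (ll_model = 0), Cox & Snell stops at 1 - exp(2ℓ0/n), which is below 1, while Nagelkerke and McFadden both reach 1. This matches the description above.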
64. 1 Wharton School MBA students and their spouses were asked to rank 15 breakfast items in order of preference, with 1 = "most preferred" to 15 = "least preferred." Their preferences were recorded under six different scenarios, from "Overall preference" to "Snack, with beverage only."
breakfast-overall.sav. This data file contains the breakfast item preferences for the first scenario, "Overall preference," only.
broadband_1.sav. This is a hypothetical data file containing the number of subscribers, by region, to a national broadband service. The data file contains monthly subscriber numbers for 85 regions over a four-year period.
broadband_2.sav. This data file is identical to broadband_1.sav but contains data for three additional months.
car_insurance_claims.sav. A dataset presented and analyzed elsewhere (McCullagh and Nelder, 1989) concerns damage claims for cars. The average claim amount can be modeled as having a gamma distribution, using an inverse link function to relate the mean of the dependent variable to a linear combination of the policyholder age, vehicle type, and vehicle age. The number of claims filed can be used as a scaling weight.
car_sales.sav. This data file contains hypothetical sales estimates, list prices, and physical specifications for various makes and models of vehicles. The list prices and physical specifications were obtained alternately from edmunds.com and manufacturer sites.
car_sales_uprepared.sav. T
65. Variables: Property ID [propid], Neighborhood [nbrhood], Years since last appraisal [time], Value at last appraisal [lastval].
Select Neighborhood as a stratification variable.
Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each neighborhood of the townships drawn in stage 1. In this stage, properties are the sampling units and are drawn using simple random sampling.
Figure 13-6: Sampling Wizard, Sample Size step (stage 2)
Stage 2: Sample Size. In this panel you can specify the number or proportion of units to be sampled in the current stage. The sample size can be fixed across strata, or it can vary for different strata. If you specify sample sizes as proportions, you can also set the minimum or maximum number of units to draw. Options: Value (the size value applies to each stratum); Unequal values for strata (Define); Read values from variab
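Drawing independent samples within each neighborhood is stratified simple random sampling. Within stratum h, each unit's stagewise inclusion probability is n_h / N_h. The sketch below shows a single stage in plain Python. It ignores the conditioning on the townships selected in stage 1, and it assumes that a stratum smaller than the requested size is taken whole. The function name, the `stratum_of` callback, and the property tuples are hypothetical.

```python
import random
from collections import defaultdict


def stratified_srs(frame, stratum_of, n_per_stratum, seed):
    """Draw an independent simple random sample (WOR) within each stratum.

    Returns {stratum: (sampled_units, inclusion_probability)}, where the
    stagewise inclusion probability in stratum h is n_h / N_h.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    result = {}
    for stratum in sorted(strata):  # fixed order keeps the draw reproducible
        units = strata[stratum]
        n = min(n_per_stratum, len(units))  # take every unit if the stratum is small
        result[stratum] = (rng.sample(units, n), n / len(units))
    return result


# Hypothetical properties: (property id, neighborhood)
properties = [(1, "north"), (2, "north"), (3, "north"), (4, "south"), (5, "south")]
print(stratified_srs(properties, lambda p: p[1], n_per_stratum=2, seed=42))
```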
66. 140; in Complex Samples Descriptives, 160
R² statistic: in Complex Samples General Linear Model, 48, 181
ratios: in Complex Samples Ratios, 174
reference category: in Complex Samples General Linear Model, 50; in Complex Samples Logistic Regression, 55
relative risk: in Complex Samples Crosstabs, 39, 165, 169, 170
repeated contrasts: in Complex Samples General Linear Model, 50
residuals: in Complex Samples Crosstabs, 39; in Complex Samples General Linear Model, 51
response probabilities: in Complex Samples Ordinal Regression, 66
risk difference: in Complex Samples Crosstabs, 39
row percentages: in Complex Samples Crosstabs, 39
Sampford's sampling method: in Sampling Wizard, 8
sample design information: in Complex Samples Cox Regression, 82, 221, 254
sample files: location, 258
sample plan, 4
sample proportion: in Sampling Wizard, 12
sample size: in Sampling Wizard, 10, 12
sample weights: in Analysis Preparation Wizard, 20; in Sampling Wizard, 12
sampling: complex design, 4
sampling estimation: in Analysis Preparation Wizard, 22
sampling frame, full: in Sampling Wizard, 93
sampling frame, partial: in Sampling Wizard, 105
sampling method: in Sampling Wizard, 8
Schoenfeld's partial residuals: in Complex Samples Cox Regression, 86
score residuals: in Complex Samples Cox Regression, 86
separation: in Complex Samples Logistic Regression, 62; in Complex Samples Ordinal Regression, 72
sequential Bonferroni correction: in
67. To close this wizard, click Finish.
Click Finish.
These selections produce the sampling plan file poll.csplan and draw a sample according to that plan. They also save the sample results to the new dataset poll_cs_sample and save the joint probabilities file to the external data file poll_jointprob.sav.
Plan Summary
Figure 13-43: Plan summary
Plan file: c:\poll.csplan. Weight variable: SampleWeight_Final_.
Stage 1. Design: stratification County; cluster Township. Sample selection: PPS sampling without replacement; measure of size obtained from data. Variables created or modified: stagewise inclusion probability InclusionProbability_1_; stagewise cumulative sample weight SampleWeightCumulative_1_. Analysis: estimator assumption, unequal probability sampling without replacement using joint inclusion probabilities; inclusion probability obtained from variable InclusionProbability_1_.
Stage 2. Design: stratification Neighborhood. Sample selection: simple random sampling without replacement. Variables created or modified: InclusionProbability_2_; SampleWeightCumulative_2_. Analysis: estimator assumption, equal probability sampling without replacement; inclusion probability obtained from variable InclusionProbability_2_.
The summary
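The created variables follow standard survey-weighting ideas. Under PPS sampling of n units, a unit's first-order inclusion probability is commonly n × MOS_i / ΣMOS. A stagewise cumulative weight multiplies the previous cumulative weight by 1 / (stagewise inclusion probability), so the last stage's value conceptually corresponds to the final sample weight. The sketch below implements those textbook formulas. The exact PPS WOR algorithm the wizard uses and its handling of very large units are not described in this passage, and the function names are hypothetical.

```python
def pps_inclusion_probabilities(sizes, n):
    """First-order inclusion probabilities pi_i = n * MOS_i / sum(MOS).

    Assumes no unit is so large that pi_i would exceed 1.
    """
    total = sum(sizes)
    probs = [n * size / total for size in sizes]
    if any(p > 1 for p in probs):
        raise ValueError("a unit's size exceeds total / n; select it with certainty first")
    return probs


def cumulative_weights(stagewise_probabilities):
    """Stagewise cumulative sample weights: running product of 1 / pi over stages."""
    weights, weight = [], 1.0
    for p in stagewise_probabilities:
        weight /= p
        weights.append(weight)
    return weights


# Hypothetical township sizes, 2 townships drawn; then 1 in 4 properties in stage 2.
probs = pps_inclusion_probabilities([10, 20, 30, 40], n=2)
print(probs)
print(cumulative_weights([probs[1], 0.25]))
```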
68. (Who shopping for | Use coupons | Mean | Std. Error | 95% CI lower | 95% CI upper)
... | 361.8665 | 388.7617
| From both | 388.8054 | 7.12101 | 373.4214 | 404.1894
Self and family | No | 377.4111 | 11.58215 | 352.3894 | 402.4328
Self and family | From newspaper | 455.2232 | 6.14420 | 441.9494 | 468.4969
Self and family | From mailings | 486.8736 | 10.76529 | 463.6166 | 510.1306
Self and family | From both | 518.2488 | 11.73120 | 492.9050 | 543.5925
This table displays the model-estimated marginal means, standard errors, and confidence intervals of Amount spent at the factor combinations of Who shopping for and Use coupons. This table is useful for exploring the interaction effect between these two factors that was found in the tests of model effects.
Summary
In this example, the estimated marginal means revealed differences in spending between customers at varying levels of Who shopping for and Use coupons. The tests of model effects confirmed this, as well as the fact that there appears to be a Who shopping for × Use coupons interaction effect. The model summary table revealed that the present model explains somewhat more than half of the variation in the data and could likely be improved by adding more predictors.
Related Procedures
The Complex Samples General Linear Model procedure is a useful tool for modeling a scale variable when the cases have been drawn according to a complex sampling scheme.
■ The Complex Samples Sampling Wizard is used to specify complex sampling design specifications and obtain a sample. The sampling plan file created by the Sampling Wizard contains a default analysis plan and can be specifie
69. The restructured data contains three cases for every patient. However, many patients experienced fewer than three events, so there are many cases with (negative) missing values for event. You can simply filter these from the dataset.
To filter these cases, from the menus choose: Data > Select Cases...
Figure 22-34: Select Cases dialog box
Select: All cases; If condition is satisfied; Random sample of cases; Based on time or case range; Use filter variable. Output: Filter out unselected cases; Copy selected cases to a new dataset (Dataset name); Delete unselected cases. Current Status: Do not filter cases.
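Filtering the padded cases means keeping only rows whose event code is present. The sketch below does that on a list of dictionaries. It assumes, as an illustration, that a missing event is stored as None or as a negative code. The field names and the helper `drop_missing_events` are hypothetical.

```python
def drop_missing_events(cases):
    """Keep only cases whose event code is present and non-negative.

    Treating None or negative codes as missing is an assumption of this sketch.
    """
    return [case for case in cases if case.get("event") is not None and case["event"] >= 0]


# Hypothetical restructured cases: patient 1 had one event, patient 2 none observed.
cases = [
    {"patid": 1, "event": 1},
    {"patid": 1, "event": None},
    {"patid": 1, "event": -1},
    {"patid": 2, "event": 0},
]
print(drop_missing_events(cases))
```

Code 0 ("No event observed") is kept, because it records a censored interval rather than a padding row.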
70. Figure 21-4: Hypothesis Tests dialog box
Complex Samples Ordinal Regression: Hypothesis Tests. Test Statistic: F; Adjusted F; Chi-square; Adjusted chi-square. Sampling Degrees of Freedom: Based on sample design; Fixed value. Adjustment for Multiple Comparisons: Least significant difference; Sequential Sidak; Sequential Bonferroni; Sidak; Bonferroni.
Even for a moderate number of predictors and response categories, the Wald F test statistic can be inestimable for the test of parallel lines.
Select Adjusted F in the Test Statistic group.
Select Sequential Sidak as the adjustment method for multiple comparisons.
Click Continue.
Click Odds Ratios in the Complex Samples Ordinal Regression dialog box.
Figure 21-5: Ordinal Regression Odds Ratios dialog box
Factors: Cumulative Odds Ratios for Comparing Factor Levels (Factor, Reference Category), listing Voted in last election, Age category [agecat] (Highest value), and Driving frequency. Covariates: Cumulative Odds Ratios for Change in Covariate Values (Covariate, Units of Change). One set of cumulative odds ratios is produced for each variable in the Odds Ratios grids. For each set, all other factors in the model are evaluated at their highest levels, and all other covariates are evaluated at their means.
Choose to produce cumulative odds ratios for Age cate
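A sequential Sidak adjustment is a step-down procedure. The i-th smallest of m p-values is adjusted to 1 - (1 - p)^(m - i + 1), and the adjusted values are then forced to be non-decreasing in that order. The sketch below implements this standard Holm-type Sidak form. It is assumed, not confirmed here, that this matches the procedure's computation exactly. The function name and p-values are hypothetical.

```python
def sequential_sidak(p_values):
    """Step-down (Holm-type) Sidak adjustment of a list of p-values.

    The i-th smallest p-value (i = 1..m) is adjusted to 1 - (1 - p)^(m - i + 1),
    and adjusted values are made non-decreasing in that order.
    Results are returned in the original order of the inputs.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):  # rank is 0-based
        candidate = 1.0 - (1.0 - p_values[index]) ** (m - rank)
        running_max = max(running_max, candidate)
        adjusted[index] = min(running_max, 1.0)
    return adjusted


print(sequential_sidak([0.01, 0.04, 0.03]))
```

Only the smallest p-value receives the full m-test Sidak correction. Later ones are corrected for fewer remaining tests, which makes the sequential version less conservative than the plain Sidak adjustment.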
71. ... | 59762929
(fourth category) | Yes | 19429991 | 439459.793 | 18565580 | 20294402
(fourth category) | No | 11813204 | 314238.078 | 11195102 | 12431306
(fourth category) | Total | 31243195 | 587623.439 | 30087348 | 32399042
When computing statistics by subpopulation, each selected statistic is computed for each selected cell measure by value of Age category. The first column contains estimates of the number and percentage of the population of each category that do or do not take vitamin/mineral supplements. The confidence intervals for the table percentages are all non-overlapping, so you can conclude that the use of vitamin/mineral supplements increases with age.
Summary
Using the Complex Samples Frequencies procedure, you have obtained statistics for the use of nutritional supplements among U.S. citizens.
■ Overall, more Americans take vitamin/mineral supplements than not.
■ When broken down by age category, greater proportions of Americans take vitamin/mineral supplements with increasing age.
Related Procedures
The Complex Samples Frequencies procedure is a useful tool for obtaining univariate descriptive statistics of categorical variables for observations obtained via a complex sampling design.
■ The Complex Samples Sampling Wizard is used to specify complex sampling design specifications and obtain a sample. The sampling plan file created by the Sampling Wizard contains a default analysis plan and can be specified in the Plan dialog box when
72. COMPANY PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. SPSS Inc. may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-SPSS and non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this SPSS Inc. product, and use of those Web sites is at your own risk.
When you send information to IBM or SPSS, you grant IBM and SPSS a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.
Information concerning non-SPSS products was obtained from the suppliers of those products, their published announcements, or other publicly available sources. SPSS has not tested those products and cannot confirm the accuracy of perform
73. Complex Samples Ratios, 43
confidence level
  in Complex Samples Logistic Regression, 62
  in Complex Samples Ordinal Regression, 72
contrasts
  in Complex Samples General Linear Model, 50
correlations of parameter estimates
  in Complex Samples General Linear Model, 48
  in Complex Samples Logistic Regression, 57
  in Complex Samples Ordinal Regression, 68
covariances of parameter estimates
  in Complex Samples General Linear Model, 48
  in Complex Samples Logistic Regression, 57
  in Complex Samples Ordinal Regression, 68
Cox-Snell residuals
  in Complex Samples Cox Regression, 86
crosstabulation table
  in Complex Samples Crosstabs, 168
cumulative probabilities
  in Complex Samples Ordinal Regression, 71
cumulative values
  in Complex Samples Frequencies, 30
degrees of freedom
  in Complex Samples, 49, 59, 69
  in Complex Samples Cox Regression, 85
design effect
  in Complex Samples Cox Regression, 82
  in Complex Samples Crosstabs, 39
  in Complex Samples Descriptives, 34
  in Complex Samples Frequencies, 30
  in Complex Samples General Linear Model, 48
  in Complex Samples Logistic Regression, 57
  in Complex Samples Ordinal Regression, 68
  in Complex Samples Ratios, 43
deviance residuals
  in Complex Samples Cox Regression, 86
deviation contrasts
  in Complex Samples General Linear Model, 50
difference contrasts
  in Complex Samples General Linear Model, 50
Efron estimation method
  in Complex Samples Cox Regression, 90
estimated marginal means
  in Complex
74. Tables. This group determines which cases are used in the analysis.
- Use all available data. Missing values are determined on a table-by-table basis. Thus, the cases used to compute statistics may vary across frequency or crosstabulation tables.
- Use consistent case base. Missing values are determined across all variables. Thus, the cases used to compute statistics are consistent across tables.
Categorical Design Variables. This group determines whether user-missing values are valid or invalid. Cases with invalid data for any categorical design variables are excluded from the analysis.
Complex Samples Options
Figure 5-4 Options dialog box
Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.
Chapter 6
Complex Samples Descriptives
The Complex Samples Descriptives procedure displays univariate summary statistics for several variables. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.
Example. Using the Complex Samples Descriptives procedure, you can obtain univariate descriptive statistics for the activity levels of U.S. citizens, based on the re
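The difference between the two Tables options can be sketched in Python. This is a simplified illustration, not the procedure's implementation: None stands in for a missing value (the procedure itself distinguishes system-missing and user-missing values), and the records, variable names, and table names are hypothetical.

```python
def cases_used(records, tables, consistent_case_base):
    """Return, for each table, the indices of the cases it would use.

    records: list of dicts; None marks a missing value.
    tables: table name -> list of variables the table uses.
    consistent_case_base=False mimics "Use all available data"
    (missingness judged table by table); True mimics "Use consistent
    case base" (missingness judged across every analysis variable).
    """
    all_vars = {v for vars_ in tables.values() for v in vars_}
    used = {}
    for name, vars_ in tables.items():
        needed = all_vars if consistent_case_base else set(vars_)
        used[name] = [
            i for i, rec in enumerate(records)
            if all(rec[v] is not None for v in needed)
        ]
    return used


# Hypothetical data: each record is missing a different variable.
records = [
    {"age": 34, "income": 52.0, "region": "North"},
    {"age": None, "income": 48.0, "region": "South"},
    {"age": 51, "income": None, "region": "North"},
    {"age": 29, "income": 61.0, "region": None},
]
tables = {"age": ["age"], "income_by_region": ["income", "region"]}
print(cases_used(records, tables, consistent_case_base=False))
# {'age': [0, 2, 3], 'income_by_region': [0, 1]}
print(cases_used(records, tables, consistent_case_base=True))
# {'age': [0], 'income_by_region': [0]}
```

With all available data, each table keeps every case that is complete for its own variables, so the two tables rest on different cases; with a consistent case base, only the case that is complete on every variable is used, and both tables share it.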
75. ▶ Type is as the target variable.
▶ Select History of ischemic stroke [is], History of ischemic stroke [is1], and History of ischemic stroke [is2] as variables to be transposed.
▶ Select trans6 from the Target Variable list.
Figure 22-30 Restructure Data Wizard Variables to Cases: Select Variables step
Variables to Cases: Select Variables. For each variable group you have in the current data, the restructured file will have one target variable. In this step, choose how to identify case groups in the restructured data
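The variables-to-cases restructuring can be sketched in Python to show what happens to each subject's row: every group of source variables (is, is1, is2 and hs, hs1, hs2) becomes one target variable, and each subject contributes one new case per position in the group. This is an illustrative sketch, not the Restructure Data Wizard's implementation; the function name, the subject ID variable patid, and the index variable name index1 are assumptions.

```python
def variables_to_cases(rows, groups, fixed, index_name="index1"):
    """Restructure wide rows into long rows ("variables to cases").

    rows:   list of dicts, one per subject (wide format)
    groups: target variable -> list of source variables to transpose
    fixed:  variables copied unchanged onto every new case
    index_name: hypothetical name for the index variable (1, 2, ...)
    """
    n = len(next(iter(groups.values())))
    if any(len(src) != n for src in groups.values()):
        raise ValueError("every target variable needs the same number of sources")
    out = []
    for row in rows:
        for k in range(n):
            case = {v: row[v] for v in fixed}
            case[index_name] = k + 1
            for target, sources in groups.items():
                case[target] = row[sources[k]]
            out.append(case)
    return out


rows = [{"patid": 1, "is": 0, "is1": 1, "is2": 1, "hs": 0, "hs1": 0, "hs2": 1}]
groups = {"is": ["is", "is1", "is2"], "hs": ["hs", "hs1", "hs2"]}
for case in variables_to_cases(rows, groups, fixed=["patid"]):
    print(case)
# {'patid': 1, 'index1': 1, 'is': 0, 'hs': 0}
# {'patid': 1, 'index1': 2, 'is': 1, 'hs': 0}
# {'patid': 1, 'index1': 3, 'is': 1, 'hs': 1}
```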
76. Inc. solutions address interconnected business objectives across an entire organization by focusing on the convergence of analytics, IT architecture, and business processes. Commercial, government, and academic customers worldwide rely on SPSS Inc. technology as a competitive advantage in attracting, retaining, and growing customers, while reducing fraud and mitigating risk. SPSS Inc. was acquired by IBM in October 2009. For more information, visit http://www.spss.com.
Technical support
Technical support is available to maintenance customers. Customers may contact Technical Support for assistance in using SPSS Inc. products or for installation help for one of the supported hardware environments. To reach Technical Support, see the SPSS Inc. web site at http://support.spss.com or find your local office via the web site at http://support.spss.com/default.asp?refpage=contactus.asp. Be prepared to identify yourself, your organization, and your support agreement when requesting assistance.
Customer Service
If you have any questions concerning your shipment or account, contact your local office, listed on the Web site at http://www.spss.com/worldwide. Please have your serial number ready for identification.
Training Seminars
SPSS Inc. provides both public and onsite training seminars. All seminars feature hands-on workshops. Seminars will be offered in major cities on a regular basis. For more information on these seminars, contact your local off
77. Estimation Method. You can select a parameter estimation method; choose between Newton-Raphson, Fisher scoring, or a hybrid method in which Fisher scoring iterations are performed before switching to the Newton-Raphson method. If convergence is achieved during the Fisher scoring phase of the hybrid method before the maximum number of Fisher iterations is reached, the algorithm continues with the Newton-Raphson method.
Estimation. This group gives you control of various criteria used in the model estimation.
- Maximum Iterations. The maximum number of iterations the algorithm will execute. Specify a non-negative integer.
- Maximum Step Halving. At each iteration, the step size is reduced by a factor of 0.5 until the log-likelihood increases or maximum step halving is reached. Specify a positive integer.
- Limit iterations based on change in parameter estimates. When selected, the algorithm stops after an iteration in which the abs
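The interplay of Newton-Raphson steps, step halving, and a parameter-change stopping rule can be seen on a deliberately tiny problem. The sketch below is not the Ordinal Regression algorithm: it fits a hypothetical intercept-only logistic model, for which observed and expected information coincide, so Newton-Raphson and Fisher scoring take identical steps here. The function and argument names are assumptions chosen to mirror the dialog settings.

```python
import math


def loglik(b, y):
    """Log-likelihood of an intercept-only logistic model, logit(p) = b."""
    return sum(yi * b - math.log1p(math.exp(b)) for yi in y)


def newton_with_step_halving(y, b=0.0, max_iter=100, max_step_halving=5,
                             min_change=1e-10):
    """Newton-Raphson for the intercept b, with step halving.

    Each proposed step is multiplied by 0.5 until the log-likelihood
    increases or max_step_halving halvings have been applied. Iteration
    stops when the absolute change in b is below min_change.
    """
    n = len(y)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-b))
        score = sum(y) - n * p           # first derivative of loglik
        information = n * p * (1.0 - p)  # minus the second derivative
        step = score / information
        current = loglik(b, y)
        for _ in range(max_step_halving):
            if loglik(b + step, y) > current:
                break
            step *= 0.5
        if abs(step) < min_change:       # change in parameter estimate
            return b + step
        b += step
    return b


y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(round(newton_with_step_halving(y), 6))  # -0.847298, i.e. log(0.3 / 0.7)
```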
78. ▶ Type t_age as the name of the time-dependent predictor you want to define.
▶ Type ln(T_)*age as the numeric expression.
▶ Click Continue.
Figure 22-17 Cox Regression dialog box, Predictors tab
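To see what the time-dependent predictor does, note that with both age and t_age in the model the log hazard ratio per year of age at time t is B(age) + B(t_age)·ln(t), so a nonzero coefficient for t_age means the age effect changes over time rather than being proportional. The short Python sketch below only evaluates ln(T_)*age for a given time; the function name is hypothetical, and T_ is represented by the argument t.

```python
import math


def t_age(t, age):
    """Value of the time-dependent predictor ln(T_) * age at time t > 0."""
    if t <= 0:
        raise ValueError("ln(T_) requires a positive time")
    return math.log(t) * age


# The same 30-year-old subject contributes a different covariate value
# at each event time being evaluated.
for t in (1, 2, 10):
    print(t, round(t_age(t, 30), 3))
```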
79. NAMEVARS
/PRINT SELECTION.
Printing the sampling summary in this case produces a cumbersome table that causes problems in the Output Viewer. To turn off display of the sampling summary, replace SELECTION with CPS in the PRINT subcommand.
▶ Then run the syntax within the syntax window.
These selections draw a sample according to the third stage of the demo.csplan sampling plan.
Sample Results
Figure 13-31 Data Editor with sample results
You can see the sampling results in the Data Editor. Three new variables were saved to the working file, representing the inclusion pro
80. ▶ Select County as a stratification variable.
▶ Select Township as a cluster variable.
▶ Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each county. In this stage, townships are drawn as the primary sampling unit using the default method, simple random sampling.
Figure 13-3 Sampling Wizard, Sample Size step (stage 1)
Stage 1: Sample Size. In this panel you can specify the number or proportion of units to be sampled in the current stage. The sample size can be fixed across strata, or it can vary for different strata. If you specify sample sizes as proportions, you can also set the minimum or maximum number of units to draw.
▶ Select Counts from the Units drop-down list.
▶ Type 4 as the value for the number of units to select in this stage.
▶ Click Next, and then click
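The stage-1 design above (stratify by County, cluster by Township, simple random sampling of 4 townships per county) can be sketched in Python. This illustrates the design rather than how the Sampling Wizard is implemented; the function name, the frame layout, and the county and township labels in the example are hypothetical. Under simple random sampling without replacement, each township in a county with N townships has inclusion probability 4/N, which the sketch returns alongside the selection.

```python
import random
from collections import defaultdict


def draw_stage1(frame, n_per_stratum, seed=None):
    """Stratify by county; within each county, draw n_per_stratum townships
    (clusters) by simple random sampling without replacement.

    frame: iterable of (county, township) pairs, e.g. one per property.
    Returns county -> (sorted selected townships, inclusion probability).
    """
    rng = random.Random(seed)
    townships = defaultdict(set)
    for county, township in frame:
        townships[county].add(township)
    result = {}
    for county in sorted(townships):
        units = sorted(townships[county])  # fixed order so a seed is reproducible
        if n_per_stratum > len(units):
            raise ValueError(f"{county!r} has only {len(units)} townships")
        chosen = rng.sample(units, n_per_stratum)
        result[county] = (sorted(chosen), n_per_stratum / len(units))
    return result


# Hypothetical frame: county A has 6 townships, county B has 5.
frame = [("A", f"A{i}") for i in range(1, 7)] + [("B", f"B{i}") for i in range(1, 6)]
for county, (chosen, prob) in draw_stage1(frame, 4, seed=1).items():
    print(county, chosen, round(prob, 3))
```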
81. Preparation option.
stroke_invalid.sav. This hypothetical data file contains the initial state of a medical database and contains several data entry errors.
stroke_survival.sav. This hypothetical data file concerns survival times for patients exiting a rehabilitation program post-ischemic stroke. Post-stroke, the occurrence of myocardial infarction, ischemic stroke, or hemorrhagic stroke is noted and the time of the event recorded. The sample is left-truncated because it only includes patients who survived through the end of the rehabilitation program administered post-stroke.
stroke_valid.sav. This hypothetical data file contains the state of a medical database after the values have been checked using the Validate Data procedure. It still contains potentially anomalous cases.
survey_sample.sav. This data file contains survey data, including demographic data and various attitude measures. It is based on a subset of variables from the 1998 NORC General Social Survey, although some data values have been modified and additional fictitious variables have been added for demonstration purposes.
telco.sav. This is a hypothetical data file that concerns a telecommunications company's efforts to reduce churn in their customer base. Each case corresponds to a separate customer and records various demographic and service usage information.
telco_extra.sav. This data file is similar to the telco.sav data file, but the tenure a
82. Public Data
The National Health Interview Survey (NHIS) is a large, population-based survey of the U.S. civilian population. Interviews are carried out face-to-face in a nationally representative sample of households. Demographic information and observations about health behavior and status are obtained for members of each household. A subset of the 2000 survey is collected in nhis2000_subset.sav. For more information, see the topic "Sample Files" in Appendix A in IBM SPSS Complex Samples 19.
Use the Complex Samples Analysis Preparation Wizard to create an analysis plan for this data file so that it can be processed by Complex Samples analysis procedures.
Using the Wizard
▶ To prepare a sample using the Complex Samples Analysis Preparation Wizard, from the menus choose:
Analyze > Complex Samples > Prepare for Analysis
Copyright SPSS Inc. 1989, 2010
Figure 14-1 Analysis Preparation Wizard, Welcome step
Welcome to the Analysis Preparation Wizard. The Analysis Preparation Wizard helps you describe your complex sample and choose an estimation method. You will be asked to provide sample weights and other information needed for accurate estimation of standard errors. Your selections will be saved to a plan file that you can use in any of the analysis procedures in the Complex Samples option.
What would you like to do?
Create a plan file. Choose this option if y
83. Samples General Linear Model, 50
expected values
  in Complex Samples Crosstabs, 39
F statistic
  in Complex Samples, 49, 59, 69
  in Complex Samples Cox Regression, 85
Fisher scoring
  in Complex Samples Ordinal Regression, 72
generalized cumulative model
  in Complex Samples Ordinal Regression, 204
Helmert contrasts
  in Complex Samples General Linear Model, 50
inclusion probabilities
  in Sampling Wizard, 12
input sample weights
  in Sampling Wizard, 6
iteration history
  in Complex Samples Logistic Regression, 62
  in Complex Samples Ordinal Regression, 72
iterations
  in Complex Samples Logistic Regression, 62
  in Complex Samples Ordinal Regression, 72
least significant difference
  in Complex Samples, 49, 59, 69
  in Complex Samples Cox Regression, 85
legal notices, 267
likelihood convergence
  in Complex Samples Logistic Regression, 62
  in Complex Samples Ordinal Regression, 72
log-minus-log plot
  in Complex Samples Cox Regression, 257
marginal means
  in GLM Univariate, 183
martingale residuals
  in Complex Samples Cox Regression, 86
mean
  in Complex Samples Descriptives, 34, 163
measure of size
  in Sampling Wizard, 8
missing values
  in Complex Samples, 31, 40
  in Complex Samples Descriptives, 35
  in Complex Samples General Linear Model, 52
  in Complex Samples Logistic Regression, 62
  in Complex Samples Ordinal Regression, 72
  in Complex Samples Ratios, 44
Murthy's sampling method
  in Sampling Wizard, 8
Newton-Raphson method
  in Com
84. Sampling Method ... 8
Sampling Wizard: Sample Size ... 10
Define Unequal Sizes ... 11
Sampling Wizard: Output Variables ... 12
Sampling Wizard: Plan Summary ... 13
Sampling Wizard: Draw Sample Selection Options ... 14
Sampling Wizard: Draw Sample Output Files ... 15
Sampling Wizard: Finish ... 16
Modifying an Existing Sample Plan ... 16
Sampling Wizard: Plan Summary ... 17
Running an Existing Sample Plan ... 18
CSPLAN and CSSELECT Commands Additional Features ... 18
3 Preparing a Complex Sample for Analysis ... 19
Creating a New Analysis Plan ... 20
Analysis Preparation Wizard: Design Variables ... 20
Tree Controls for Navigating the Analysis Wizard ... 21
Analysis Preparation Wizard: Estimation Method
Analysis Preparation Wizard: Size
Define Unequal Sizes
Analysis Preparation Wizard: Plan Summary
Analysis Preparation Wizard: Finish
Modifying an Existing Analysis Plan
Analysis Preparation Wizard: Plan Summary
Complex Samples Plan
Complex Samples Frequencies
Complex Samples Frequencies Statist
85. Variables
In this panel you can stratify your sample or define clusters. You can also provide a label for the stage that will be used in the output. If sampling weights exist from a prior stage of the sample design, you can use them as input to the current stage.
▶ Select Subdivision as a stratification variable.
▶ Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each subdivision. In this stage, household units are drawn as the primary sampling unit using the default method, simple random sampling.
Figure 13-22 Sampling Wizard, Sample Size step (stage 3)
Stage 3: Sample Size. In this panel you can specify the number or proportion of units to be sampled in the current stage. The sample size can be fixed across strata, or it can vary for different strata. If you specify sample sizes as proportions, you can also set the minimum or maximum number of units to draw.
86. You can provide a size that is fixed across strata or specify sizes on a per-stratum basis.
▶ Select Read values from variable, and select inclprob_s2 as the variable containing the second-stage inclusion probabilities.
Summary
Figure 14-14 Summary table
Stage 1. Design Variables: Cluster: Branch. Analysis Information: Estimator Assumption: Equal probability sampling without replacement; Inclusion Probability: Obtained from variable inclprob_s1.
Stage 2. Analysis Information: Estimator Assumption: Equal probability sampling without replacement; Inclusion Probability: Obtained from variable inclprob_s2.
Plan File: c:\bankloan.csaplan
Weight Variable: finalweight
SRS Estimator: Sampling without replacement
The summary table reviews your analysis plan. The plan consists of two stages with a design of one cluster variable. Equal probability without replacement (WOR) estimation is used, and the plan is saved to c:\bankloan.csaplan. You can now use this plan file to process bankloan_noweights.sav, with the inclusion probabilities and sampling weights you've computed, with Complex Samples analysis procedures.
Related Procedures
The Complex Samples Analysis Preparation Wizard procedure is a useful tool for readying a sample for analysis when you do not have access to the sampling plan file.
- To create a sampling plan file and draw a sample, use the Sampling Wizard.
Chapter 15
Complex S
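The stage inclusion probabilities and the final weight in this plan are related in a simple way for multistage designs: the overall inclusion probability is the product of the stage probabilities (each conditional on the units selected earlier), and the sampling weight is its reciprocal. The sketch below assumes finalweight was built that way; the source does not show how finalweight was computed, and the function name and probabilities in the example are hypothetical.

```python
def final_weight(stage_probs):
    """Sampling weight = 1 / overall inclusion probability.

    stage_probs: inclusion probability at each stage, each conditional on
    the units selected at earlier stages (e.g. inclprob_s1, inclprob_s2),
    so the overall inclusion probability is their product.
    """
    overall = 1.0
    for p in stage_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"invalid inclusion probability: {p}")
        overall *= p
    return 1.0 / overall


# Hypothetical: a branch selected with probability 0.25 and a customer
# within that branch selected with probability 0.1 gives a weight of 40.
print(final_weight([0.25, 0.1]))
```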
87. a cluster variable.
Sample Weight. You must provide sample weights in the first stage. Sample weights are computed automatically for subsequent stages of the current design.
Stage Label. You can specify an optional string label for each stage. This is used in the output to help identify stagewise information.
Note: The source variable list has the same contents across steps of the Wizard. In other words, variables removed from the source list in a particular step are removed from the list in all steps. Variables returned to the source list show up in all steps.
Tree Controls for Navigating the Analysis Wizard
At the left side of each step of the Analysis Wizard is an outline of all the steps. You can navigate the Wizard by clicking on the name of an enabled step in the outline. Steps are enabled as long as all previous steps are valid, that is, as long as each previous step has been given the minimum required specifications for that step. For more information on why a given step may be invalid, see the Help for individual steps.
Analysis Preparation Wizard: Estimation Method
Figure 3-3 Analysis Preparation Wizard, Estimation Method step
Stage 1: Estimation Method. In this panel you select a method for estimating standard errors. The estimation method depends on assumptions about how the sample was drawn.
Which of the following sample designs should be
88. ▶ Select Breslow as the tie-breaking method in the Estimation group.
▶ Click OK.
Sample Design Information
Figure 22-49 Sample design information
Unweighted Count: Invalid Cases 0; Total Cases 3,310
Population Subject Size: 2,421.000
Strata: 1; Units: 2,421
Sampling Design Degrees of Freedom: 2,420
This table contains information on the sample design pertinent to the estimation of the model.
- There are multiple cases for some subjects, and all 3,310 cases are used in the analysis.
- The design has a single stratum and 2,421 units (one for each subject). The sampling design degrees of freedom are estimated by 2,421 - 1 = 2,420.
Tests of Model Effects
Figure 22-50 Tests of model effects
Time Variable: Length of stay for rehabilitation, Time to first event post attack. Event Status Var
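The degrees-of-freedom arithmetic above follows the usual rule for design-based variance estimation: the number of first-stage units (PSUs) minus the number of strata. The source only shows the single-stratum case; the multi-stratum example and the function name below are illustrative assumptions.

```python
def design_df(units_per_stratum):
    """Sampling design degrees of freedom: number of first-stage units
    (PSUs) minus number of strata."""
    return sum(units_per_stratum) - len(units_per_stratum)


print(design_df([2421]))       # 2420, as in the sample design information table
print(design_df([10, 12, 8]))  # hypothetical 3-stratum design: 30 - 3 = 27
```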
89. age. With the addition of the time-dependent predictor, the significance value for age is 0.91, indicating that its contribution to the model is superseded by that of t_age.
Parameter Estimates
Figure 22-20 Parameter estimates
t_age: B = -.012; Std. Error = .002; 95% Confidence Interval -.017 to -.008. Design Effect: age .702, t_age .666.
Survival Time Variable: Time to second arrest. Event Status Variable: Second arrest = 1. Model: age, t_age.
Looking at the parameter estimates and standard errors, you can see that you have replicated the alternative model from the test of proportional hazards. By explicitly specifying the model, you can request additional parameter statistics and plots. Here we have requested the design effect; the value for t_age of less than 1 indicates that the standard error for t_age is smaller than what you would obtain if you assumed that the dataset was a simple random sample. Under that simple random sampling assumption, the effect of t_age would still be statistically significant, but the confidence intervals would be wider.
Multiple Cases per Subject in Complex Samples Cox Regression
Researchers investigating survival times for patients exiting a rehabilitation program post-ischemic stroke face a number of challenges.
- Multiple cases per subject. Variables representing patient medical history should be useful as predictors. Over time, patients
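The design effect is the ratio of the complex-sample variance of an estimate to its variance under simple random sampling, so the standard error you would get under an SRS assumption is the complex-sample standard error divided by the square root of the design effect. The sketch below applies that relationship to the rounded t_age values from the table; the function name is hypothetical, and because the inputs are rounded the result is only approximate.

```python
import math


def srs_standard_error(se_complex, deff):
    """Standard error expected under simple random sampling, given the
    complex-sample standard error and the design effect (ratio of the
    complex-sample variance to the simple-random-sampling variance)."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return se_complex / math.sqrt(deff)


# Rounded t_age values from the parameter estimates table.
se_srs = srs_standard_error(0.002, 0.666)
print(round(se_srs, 5))  # 0.00245
```

Because the design effect is below 1, the SRS-based standard error is larger than the complex-sample one, which is why intervals computed under an SRS assumption would be wider.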
90. amples Frequencies
The Complex Samples Frequencies procedure produces frequency tables for selected variables and displays univariate statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.
Using Complex Samples Frequencies to Analyze Nutritional Supplement Usage
A researcher wants to study the use of nutritional supplements among U.S. citizens, using the results of the National Health Interview Survey (NHIS) and a previously created analysis plan. For more information, see the topic "Using the Complex Samples Analysis Preparation Wizard to Ready NHIS Public Data" in Chapter 14 on p. 140. A subset of the 2000 survey is collected in nhis2000_subset.sav. The analysis plan is stored in nhis2000_subset.csaplan. For more information, see the topic "Sample Files" in Appendix A in IBM SPSS Complex Samples 19. Use Complex Samples Frequencies to produce statistics for nutritional supplement usage.
Running the Analysis
▶ To run a Complex Samples Frequencies analysis, from the menus choose:
Analyze > Complex Samples > Frequencies
Figure 15-1 Complex Samples Plan dialog box
Plan File: nhis2000_subset.csaplan
If you do not have a plan file for your complex sample, you can use the Analysis Preparation Wizard to create one. Choose Prepare for Analysis from the Complex Samples menu to access the wizard.
Joint Probabi
91. ance compatibility, or any other claims related to non-SPSS products. Questions on the capabilities of non-SPSS products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious, and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to SPSS Inc., for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. SPSS Inc., therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. SPSS Inc. shall not be liable for any damages arising out of your use of the sample programs.
Trademarks
IBM, the IBM logo, and ibm.com are
92. and choose which variables belong with each target variable. Optionally, you can also choose variables to copy to the new file as Fixed Variables.
▶ Type hs as the target variable.
▶ Select History of hemorrhagic stroke [hs], History of hemorrhagic stroke [hs1], and History of hemorrhagic stroke [hs2] as variables to be transposed.
▶ Click Next, then click Next in the Create Index Variables step.
Figure 22-31 Restructure Data Wizard Variables to Cases, Create One Index Variable step
93. and residuals as new variables in the working file.
Export model as SPSS Statistics data. Writes a dataset in IBM SPSS Statistics format containing the parameter correlation or covariance matrix with parameter estimates, standard errors, significance values, and degrees of freedom. The order of variables in the matrix file is as follows.
- rowtype_. Takes values (and value labels) COV (Covariances), CORR (Correlations), EST (Parameter estimates), SE (Standard errors), SIG (Significance levels), and DF (Sampling design degrees of freedom). There is a separate case with row type COV (or CORR) for each model parameter, plus a separate case for each of the other row types.
- varname_. Takes values P1, P2, ..., corresponding to an ordered list of all model parameters, for row types COV or CORR, with value labels corresponding to the parameter strings shown in the parameter estimates table. The cells are blank for other row types.
- P1, P2, ... These variables correspond to an ordered list of all model parameters, with variable labels corresponding to the parameter strings shown in the parameter estimates table, and take values according to the row type. For redundant parameters, all covariances are set to zero; correlations are set to the system-missing value; all parameter estimates are set at zero; and all standard errors, significance levels, and residual degrees of freedom are set to the system-missing value.
Note: This fil
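The row layout described above can be pictured with a small Python sketch that builds one row per COV case plus one row each for EST, SE, SIG, and DF. It is an illustration of the layout, not the export routine: the function name is hypothetical, only the covariance variant is shown, redundant-parameter handling is omitted, and the order of the non-COV rows and the repetition of the degrees of freedom in every P column are assumptions.

```python
def export_matrix_rows(names, cov, est, se, sig, df):
    """Arrange model results in the rowtype_/varname_/P1, P2, ... layout.

    names: parameter strings in parameter-estimates-table order.
    cov:   square covariance matrix (list of lists) for those parameters.
    est, se, sig: one value per parameter; df: design degrees of freedom.
    Returns (rows, labels): rows are dicts, and labels maps each
    P column to its parameter string (used as the variable label).
    """
    cols = [f"P{i + 1}" for i in range(len(names))]
    rows = []
    for i, col in enumerate(cols):               # one COV case per parameter
        rows.append({"rowtype_": "COV", "varname_": col,
                     **dict(zip(cols, cov[i]))})
    for rowtype, values in (("EST", est), ("SE", se), ("SIG", sig)):
        rows.append({"rowtype_": rowtype, "varname_": "",  # blank for these rows
                     **dict(zip(cols, values))})
    rows.append({"rowtype_": "DF", "varname_": "", **{c: df for c in cols}})
    return rows, dict(zip(cols, names))


# Hypothetical two-parameter model.
rows, labels = export_matrix_rows(
    ["Intercept", "x"], [[0.04, 0.01], [0.01, 0.09]],
    est=[1.2, -0.5], se=[0.2, 0.3], sig=[0.001, 0.1], df=20)
for row in rows:
    print(row)
print(labels)
```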
94. and select a complex sample. Your selections will be saved to a plan file that you can use at analysis time to indicate how the data were sampled. You can also use the wizard to modify a sampling plan or draw a sample according to an existing plan.
What would you like to do?
- Design a sample. Choose this option if you have not created a plan file. You will have the option to draw the sample.
- Edit a sample design. Choose this option if you want to add, remove, or modify stages of an existing plan. You will have the option to draw the sample.
- Draw the sample. Choose this option if you already have a plan file and want to draw a sample.
▶ Select Design a sample, browse to where you want to save the file, and type poll.csplan as the name of the plan file.
▶ Click Next.
Figure 13-33 Sampling Wizard, Design Variables step (stage 1)
Stage 1: Design Variables. In this panel you can stratify your sample or define clusters. You can also provide a label for the stage that will be used in the output. If sampling weights exist from a prior stage of the sample design, you can use them as input to the current stage.
95. andom sampling assumption.
- Equal WOR: equal probability sampling without replacement. The next panel will ask you to specify inclusion probabilities or population sizes.
- Unequal WOR: unequal probability sampling without replacement. Joint probabilities will be required to analyze sample data. This option is available in stage 1 only.
▶ Select Equal WOR as the first-stage estimation method.
▶ Click Next.
Figure 14-10 Analysis Preparation Wizard, Size step (stage 1)
Stage 1: Size. In this panel you specify inclusion probabilities or population sizes for the current stage. You can provide a size that is fixed across strata or specify sizes on a per-stratum basis.
▶ Select Read values from variable, and select inclprob_s1 as the variable containing the first
96. ases are saved along with the variables if the destination is a new dataset or file. Joint probabilities are saved if you request PPS sampling without replacement. They are needed for WOR estimation of PPS designs.
Where do you want to save sample data? (Active dataset, New dataset, or External file)
Where do you want to save joint probabilities?
▶ Choose to save the sample to a new dataset, and type poll_cs_sample as the name of the dataset.
▶ Browse to where you want to save the joint probabilities, and type poll_jointprob.sav as the name of the joint probabilities file.
▶ Click Next.
Figure 13-42 Sampling Wizard, Finish step
Completing the Sampling Wizard. You have provided all of the information needed to create a sample design and draw a sample. You can return to the Sampling Wizard later if you need to add or modify stages. After all the stages have been sampled, you can use the plan file in any Complex Samples analysis procedure to indicate how the sample was drawn.
What do you want to do?
- Save the design to a plan file and draw the sample.
- Paste the syntax generated by the Wizard into a syntax window.
97. The default display of the table is very wide, so you will need to pivot it for a better view. Pivoting the Ratios Table: Double-click the table to activate it. From the Viewer menus choose: Pivot > Pivoting Trays. Drag Numerator and then Denominator from the row to the layer. Drag County from the row to the column. Drag Statistics from the column to the row. Close the pivoting trays window. Pivoted Ratios Table: Figure 18-5, Pivoted ratios table (Numerator: Current value; Denominator: Value at last appraisal):
Statistic                        Eastern   Central   Western  Northern  Southern
Ratio Estimate                     1.381     1.364     1.524     1.277     1.195
Standard Error                      .068      .064      .053      .032      .029
95% Confidence Interval, Lower     1.236     1.227     1.410     1.208     1.134
95% Confidence Interval, Upper     1.525     1.502     1.638     1.346     1.256
Hypothesis Test, Test Value          1.3       1.3       1.3       1.3       1.3
Hypothesis Test, t                 1.191      .997     4.201     -.702    -3.646
Hypothesis Test, df                   15        15        15        15        15
Hypothesis Test, Sig.               .252      .334      .001      .493      .002
Unweighted Count                     168       179       202       205       220
The ratios table is now pivoted so that statistics are easier to compare across counties. The ratio estimates range from a low of 1.195 in the Southern county to a high of 1.524 in the Western county. There is also quite a bit of variability in the standard errors, which range from a low of 0.029 in the Southern county to a high of 0.068 in the Eastern county.
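Each county's hypothesis test and interval follow from its estimate and standard error. Assuming the usual t-based forms, t = (estimate - test value) / SE and estimate plus or minus t_crit x SE, the Python sketch below reproduces the Eastern column. The critical value 2.131 (95%, 15 df) is supplied by hand, the function name is hypothetical, and small differences from the table come from rounding of the displayed estimate and standard error.

```python
def ratio_test(estimate, se, test_value, t_crit):
    """t statistic for H0: ratio == test_value, plus a two-sided confidence interval.

    t_crit is the critical t value for the chosen confidence level and the
    design degrees of freedom (about 2.131 for 95% with 15 df).
    """
    t = (estimate - test_value) / se
    lower = estimate - t_crit * se
    upper = estimate + t_crit * se
    return t, (lower, upper)


# Eastern county: estimate 1.381, standard error .068, test value 1.3, df 15.
t, (lo, hi) = ratio_test(1.381, 0.068, 1.3, 2.131)
print(round(t, 3), round(lo, 3), round(hi, 3))  # 1.191 1.236 1.526
```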
98. ...aunt, brother, cousin, daughter, father, granddaughter, grandfather, grandmother, grandson, mother, nephew, niece, sister, son, uncle. They asked four groups of college students (two female, two male) to sort these terms on the basis of similarities. Two groups (one female, one male) were asked to sort twice, with the second sorting based on a different criterion from the first sort. Thus, a total of six sources were obtained. Each source corresponds to a 15 x 15 proximity matrix whose cells are equal to the number of people in a source minus the number of times the objects were partitioned together in that source. kinship_ini.sav. This data file contains an initial configuration for a three-dimensional solution for kinship_dat.sav. kinship_var.sav. This data file contains independent variables gender, generation, and degree (of separation) that can be used to interpret the dimensions of a solution for kinship_dat.sav. Specifically, they can be used to restrict the space of the solution to a linear combination of these variables. marketvalues.sav. This data file concerns home sales in a new housing development in Algonquin, Ill., during the years from 1999 to 2000. These sales are a matter of public record. nhis2000_subset.sav. The National Health Interview Survey (NHIS) is a large, population-based survey of the U.S. civilian population. Interviews are carried out face-to-face in a nationally representative sample of
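The proximity matrix for one source can be built directly from its sortings. The Python sketch below, with a hypothetical function and hypothetical data, counts for every pair of terms how many people placed them in the same group and subtracts that count from the number of people in the source.

```python
def sorting_dissimilarity(sortings):
    """Proximity matrix for one source.

    sortings holds one list per person; sortings[p][i] is the group label that
    person p gave to object i. Cell (i, j) is the number of people minus the
    number of people who put objects i and j in the same group.
    """
    n_people = len(sortings)
    n_objects = len(sortings[0])
    matrix = [[0] * n_objects for _ in range(n_objects)]
    for i in range(n_objects):
        for j in range(n_objects):
            together = sum(1 for s in sortings if s[i] == s[j])
            matrix[i][j] = n_people - together
    return matrix


# Three hypothetical people sorting four terms: aunt, uncle, niece, nephew.
people = [
    ["A", "B", "A", "B"],  # sorted by gender
    ["A", "A", "B", "B"],  # sorted by generation
    ["A", "B", "A", "B"],  # sorted by gender again
]
for row in sorting_dissimilarity(people):
    print(row)
```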
99. ...b.sav. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. Use Complex Samples Cox Regression to assess the validity of the proportional hazards assumption and to fit a model with time-dependent predictors, if appropriate. Preparing the Data: The dataset contains the date of release from the first arrest and the date of the second arrest. Since Cox regression analyzes survival times, you need to compute the amount of time between these dates. However, Date of second arrest (date2) contains cases with the value 10/03/1582, a missing value for date variables. These are people who have not had a second offense, and we definitely want to include them as right-censored cases in the model. The end of the follow-up period was June 30, 2006, so we are going to recode 10/03/1582 to 06/30/2006. To recode these values, from the menus choose: Transform > Compute Variable... Copyright SPSS Inc. 1989, 2010. [Figure 22-1: Compute Variable dialog box; Numeric Expression: DATE.DMY(30,6,2006).]
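The recode-and-compute step can also be sketched outside IBM SPSS Statistics. The Python example below is an illustration only: the helper name is hypothetical, and it assumes the stored value 10/03/1582 marks a missing second arrest. It substitutes the end of follow-up for that value, returns the elapsed time in days, and flags the case as right-censored.

```python
from datetime import date

# Dates taken from the text above; the helper itself is hypothetical.
MISSING_DATE = date(1582, 10, 3)      # value stored when date2 is missing
END_OF_FOLLOW_UP = date(2006, 6, 30)  # end of the follow-up period


def survival_record(release_date, second_arrest):
    """Return (days_to_event, event_observed) for one person.

    A missing second-arrest date is replaced by the end of follow-up, and the
    case is flagged as right-censored (event_observed is False).
    """
    observed = second_arrest != MISSING_DATE
    end = second_arrest if observed else END_OF_FOLLOW_UP
    return (end - release_date).days, observed


print(survival_record(date(2005, 1, 1), date(2005, 3, 2)))  # (60, True)
print(survival_record(date(2005, 1, 1), MISSING_DATE))      # (545, False)
```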
100. ...probabilities and cumulative sampling weights for the third stage, plus the final sampling weights. These new weights take into account the weights computed during the sampling of the first two stages. Units with values for these variables were selected to the sample. Units with system-missing values for these variables were not selected. The company will now use its resources to obtain survey information for the housing units selected in the sample. Once the surveys are collected, you can process the sample with Complex Samples analysis procedures, using the sampling plan demo.csplan to provide the sampling specifications. Sampling with Probability Proportional to Size (PPS): Representatives considering a bill before the legislature are interested in whether there is public support for the bill and how support for the bill is related to voter demographics. Pollsters design and conduct interviews according to a complex sampling design. A list of registered voters is collected in poll_cs.sav. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. Use the Complex Samples Sampling Wizard to select a sample for further analysis. Using the Wizard: To run the Complex Samples Sampling Wizard, from the menus choose: Analyze > Complex Samples > Select a Sample... [Figure 13-32: Sampling Wizard, Welcome step.] Welcome to the Sampling Wizard. The Sampling Wizard helps you design
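The cumulative weights described above multiply across stages: a unit's weight after a stage is the product of the reciprocals of its inclusion probabilities up to and including that stage. A minimal Python sketch with hypothetical names and values, treating a missing (None) entry as a unit that was not selected:

```python
def cumulative_weights(stage_probabilities):
    """Cumulative sampling weight after each stage.

    stage_probabilities[k] is the unit's inclusion probability at stage k + 1,
    conditional on the earlier stages.
    """
    weights = []
    running = 1.0
    for p in stage_probabilities:
        running /= p  # multiply by the reciprocal of this stage's probability
        weights.append(running)
    return weights


print(cumulative_weights([0.2, 0.1, 0.5]))  # [5.0, 50.0, 100.0]

# Hypothetical units: None plays the role of system-missing (not selected).
units = {"unit_a": [0.2, 0.1], "unit_b": None}
final = {name: cumulative_weights(p)[-1] for name, p in units.items() if p is not None}
print(final)  # {'unit_a': 50.0}
```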
101. ...can add another stage to the design. If you choose not to add a next stage, the next step is to set options for drawing your sample. [Figure: Sampling Wizard, Plan Summary step (stage 1). Stage 1; Strata: region; Clusters: province; Size: 3 per stratum; Method: Simple Random Sampling WOR. File: c:\temp\demo.csplan.] Do you want to add stage 2? Yes, add stage 2 now (choose this option if the working data file contains data for stage 2). No, do not add another stage now (choose this option if stage 2 data are not available yet or your design has only one stage). Select Yes, add stage 2 now, and click Next. [Figure 13-18: Sampling Wizard, Design Variables step (stage 2).] Stage 2: Design Variables. In this panel you can stratify your sample or define clusters. You can also provide a label for the stage that will be used in the output. If sampling weights exist from a prior stage of the sample design, you can use them as input to the current stage.
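The stage 1 plan summarized here (strata region, clusters province, 3 per stratum, simple random sampling WOR) amounts to drawing a fixed number of clusters at random within each stratum. A minimal Python sketch with a hypothetical frame; it does not reproduce the wizard's random number stream, so the clusters drawn will differ.

```python
import random


def select_clusters(frame, per_stratum, seed=None):
    """Simple random sampling without replacement of clusters within each stratum.

    frame maps a stratum label to the list of its cluster IDs; the result maps
    each stratum to the clusters drawn from it.
    """
    rng = random.Random(seed)
    return {stratum: rng.sample(clusters, per_stratum)
            for stratum, clusters in frame.items()}


# Hypothetical frame: regions (strata) and their provinces (clusters).
frame = {
    "East": ["p1", "p2", "p3", "p4", "p5"],
    "West": ["p6", "p7", "p8", "p9"],
}
print(select_clusters(frame, per_stratum=3, seed=1))
```

Within a stratum of N_h clusters, each cluster's first-stage inclusion probability under this scheme is 3 / N_h.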
102. ...indicates that there are more cases in the cell than there would be if the row and column variables were independent. Adjusted residuals. The residual for a cell (observed minus expected value) divided by an estimate of its standard error. The resulting standardized residual is expressed in standard deviation units above or below the mean. Summaries for 2-by-2 Tables: This group produces statistics for tables in which the row and column variable each have two categories. Each is a measure of the strength of the association between the presence of a factor and the occurrence of an event. Odds ratio. The odds ratio can be used as an estimate of relative risk when the occurrence of the event is rare. Relative risk. The ratio of the risk of an event in the presence of the factor to the risk of the event in the absence of the factor. Risk difference. The difference between the risk of an event in the presence of the factor and the risk of the event in the absence of the factor. Test of independence of rows and columns. This produces chi-square and likelihood-ratio tests of the hypothesis that a row and column variable are independent. Separate tests are performed for each pair of variables. Complex Samples Missing Values: [Figure 7-3: Complex Samples Crosstabs Missing Values dialog box.] Tables: Use all available data (table-by-table deletion), or Use consistent case base (listwise deletion).
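The three 2-by-2 summaries can be written out from a table of counts. In the Python sketch below (hypothetical names and data), rows are factor present/absent and columns are event/no event. It assumes the cells already hold weighted population estimates and computes point estimates only, without the design-based standard errors the procedure reports.

```python
def two_by_two_measures(a, b, c, d):
    """Odds ratio, relative risk, and risk difference for a 2x2 table.

    Layout (counts or weighted population estimates):
        factor present: a = event, b = no event
        factor absent:  c = event, d = no event
    """
    risk_present = a / (a + b)
    risk_absent = c / (c + d)
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": risk_present / risk_absent,
        "risk_difference": risk_present - risk_absent,
    }


# Hypothetical table: 20 of 100 with the factor had the event, 10 of 100 without it.
print(two_by_two_measures(20, 80, 10, 90))
```

Here the odds ratio (2.25) is noticeably larger than the relative risk (2.0) because the event is not rare; the two converge as the event becomes rare.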
103. ...according to a complex design; however, the sampling weights are not included in the file. This information is contained in bankloan_cs_noweights.sav. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. Starting with what she knows about the sampling design, the officer wants to use the Complex Samples Analysis Preparation Wizard to create an analysis plan for this data file so that it can be processed by Complex Samples analysis procedures. The loan officer knows that the records were selected in two stages, with 15 out of 100 bank branches selected with equal probability and without replacement in the first stage. One hundred customers were then selected from each of those banks with equal probability and without replacement in the second stage, and information on the number of customers at each bank is included in the data file. The first step to creating an analysis plan is to compute the stagewise inclusion probabilities and final sampling weights. Computing Inclusion Probabilities and Sampling Weights: To compute the inclusion probabilities for the first stage, from the menus choose: Transform > Compute Variable... [Figure 14-4: Compute Variable dialog box; Target Variable: inclprob_s1; Numeric Expression: .15.]
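Under the design described above, the first-stage probability is the same for every branch (15/100), and the second-stage probability is 100 divided by the number of customers at the branch (ncust). The Python sketch below, with a hypothetical helper name, computes both and takes the final sampling weight as the reciprocal of their product. The manual's own steps for the second stage are not part of this excerpt.

```python
# Design facts from the text: 15 of 100 branches, then 100 customers per branch.
BRANCHES_SAMPLED, BRANCHES_TOTAL = 15, 100
CUSTOMERS_SAMPLED = 100


def stagewise_probabilities(ncust):
    """Return (inclprob_s1, inclprob_s2, final_weight) for one customer.

    ncust is the number of customers at the customer's branch.
    """
    inclprob_s1 = BRANCHES_SAMPLED / BRANCHES_TOTAL  # .15 for every branch
    inclprob_s2 = CUSTOMERS_SAMPLED / ncust          # equal probability within the branch
    final_weight = 1 / (inclprob_s1 * inclprob_s2)
    return inclprob_s1, inclprob_s2, final_weight


print(stagewise_probabilities(ncust=2000))  # a branch size of 2000 is hypothetical
```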
104. ...In the Case Group Identification group, select Use selected variable, and select Patient ID (patid) as the subject identifier. Type event as the first target variable. Select First event post-attack (event1), Second event post-attack (event2), and Third event post-attack (event3) as variables to be transposed. Select trans2 from the target variable list. [Figure 22-26: Restructure Data Wizard, Variables to Cases: Select Variables step.] For each variable group you have in the current data, the restructured file will have one target variable. In this step, choose how to identify case groups in the restructured data and choose which variables belong with each target variable. Optionally, you can also choose variables to copy to the new file as fixed variables.
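The Variables to Cases restructure turns each patient's event1 to event3 values into three rows. A minimal Python equivalent, with a hypothetical name for the index field and hypothetical values; unlike the wizard, this sketch does not carry over fixed variables.

```python
def variables_to_cases(records, id_field, source_fields, target="event"):
    """Restructure wide records into long ones, one output row per source field.

    Each output row keeps the subject ID, an index (1, 2, ...) identifying the
    source field, and the value itself under the target name.
    """
    long_rows = []
    for rec in records:
        for index, field in enumerate(source_fields, start=1):
            long_rows.append({id_field: rec[id_field], "index": index, target: rec[field]})
    return long_rows


# One hypothetical patient in wide format.
wide = [{"patid": 1, "event1": 2, "event2": 0, "event3": 1}]
for row in variables_to_cases(wide, "patid", ["event1", "event2", "event3"]):
    print(row)
```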
105. ...contrasts are different from the null hypothesis values. Sequential Sidak. This is a sequentially step-down rejective Sidak procedure that is much less conservative than the single-step Sidak method in terms of rejecting individual hypotheses but maintains the same overall significance level. Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni procedure that is much less conservative than the single-step Bonferroni method in terms of rejecting individual hypotheses but maintains the same overall significance level. Sidak. This method provides tighter bounds than the Bonferroni approach. Bonferroni. This method adjusts the observed significance level for the fact that multiple contrasts are being tested. Complex Samples Logistic Regression Odds Ratios: [Figure 10-6: Logistic Regression Odds Ratios dialog box, with grids for Odds Ratios for Comparing Factor Levels (Factor, Reference Category) and Odds Ratios for Change in Covariate Values (Covariate, Units of Change).] One set of odds ratios is produced for each variable in the Odds Ratios grids. For each set, all other factors in the model are evaluated at their highest levels, and all other covariates are evaluated at their means. The Odds Ratios dialog box allows you to display the model-estimated odds ratios
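The four corrections can be expressed as adjusted significance values. The Python sketch below implements the textbook versions: Bonferroni multiplies by the number of tests m, Sidak uses 1 - (1 - p)^m, and the sequential variants apply the same adjustments step-down with m, m - 1, ..., 1 remaining hypotheses (Holm's method and its Sidak analogue). It illustrates those definitions and may differ in detail from the procedure's output; the function names are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests m (capped at 1)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def sidak(pvalues):
    """Sidak adjustment: 1 - (1 - p)^m."""
    m = len(pvalues)
    return [1 - (1 - p) ** m for p in pvalues]


def _step_down(pvalues, adjust):
    """Apply adjust(p, k) in order of increasing p, with k = m, m - 1, ..., 1.

    Adjusted values are forced to be non-decreasing along that order, so a
    hypothesis is never rejected after one with a smaller p-value is retained.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, adjust(pvalues[i], m - rank)))
        adjusted[i] = running_max
    return adjusted


def sequential_bonferroni(pvalues):
    """Holm's step-down Bonferroni."""
    return _step_down(pvalues, lambda p, k: p * k)


def sequential_sidak(pvalues):
    """Step-down Sidak (Holm-Sidak)."""
    return _step_down(pvalues, lambda p, k: 1 - (1 - p) ** k)


p = [0.01, 0.04, 0.03]  # hypothetical raw significance values
print(bonferroni(p))
print(sequential_bonferroni(p))
```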
106. ...according to complex designs, possibly analyzing the sample later. The primary tool for surveyors is the Sampling Wizard.
• Analyze sample data files previously obtained according to complex designs. Before using the Complex Samples analysis procedures, you may need to use the Analysis Preparation Wizard.
Regardless of which type of user you are, you need to supply design information to Complex Samples procedures. This information is stored in a plan file for easy reuse. Plan Files: A plan file contains complex sample specifications. There are two types of plan files. Sampling plan: The specifications given in the Sampling Wizard define a sample design that is used to draw a complex sample. The sampling plan file contains those specifications. The sampling plan file also contains a default analysis plan that uses estimation methods suitable for the specified sample design. Analysis plan: This plan file contains information needed by Complex Samples analysis procedures to properly compute variance estimates for a complex sample. The plan includes the sample structure, estimation methods for each stage, and references to required variables, such as sample weights. The Analysis Preparation Wizard allows you to create and edit analysis plans. There are several advantages to saving your specifications in a plan file, including:
• A surveyor can specify the first stage of a multistage sampling plan and draw first-stage units now, collect information
107. ...effects model using the factors and covariates specified in the main dialog box. Alternatively, you can build a custom model that includes interaction effects and nested terms. Non-Nested Terms: For the selected factors and covariates: Interaction. Creates the highest-level interaction term for all selected variables. Main effects. Creates a main-effects term for each variable selected. All 2-way. Creates all possible two-way interactions of the selected variables. All 3-way. Creates all possible three-way interactions of the selected variables. All 4-way. Creates all possible four-way interactions of the selected variables. All 5-way. Creates all possible five-way interactions of the selected variables. Nested Terms: You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the Customer effect can be said to be nested within the Store location effect. Additionally, you can include interaction effects, such as polynomial terms involving the same covariate, or add multiple levels of nesting to the nested term. Limitations: Nested terms have the following restrictions: All factors within an interaction must be unique. Th...
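The All n-way buttons generate every combination of the selected variables of a given size. A minimal Python sketch of that bookkeeping using itertools.combinations, with terms joined by '*'; the function name and the variable style are hypothetical.

```python
from itertools import combinations


def build_terms(variables, mode):
    """Model terms generated from the selected variables.

    mode is "main", "interaction" (the single highest-level term), or an
    integer k for all k-way interactions.
    """
    if mode == "main":
        return list(variables)
    if mode == "interaction":
        return ["*".join(variables)]
    return ["*".join(combo) for combo in combinations(variables, mode)]


selected = ["shopfor", "usecoup", "style"]  # "style" is a hypothetical third factor
print(build_terms(selected, 2))  # ['shopfor*usecoup', 'shopfor*style', 'usecoup*style']
```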
108. ...cumulative sample weight over stages previous to and including the current one. The rootname for the saved variable is SampleWeightCumulative_. Index. Identifies units selected multiple times within a given stage. The rootname for the saved variable is Index_. Note: Saved variable rootnames include an integer suffix that reflects the stage number, for example, PopulationSize_1_ for the saved population size for stage 1. Sampling Wizard Plan Summary: [Figure 2-7: Sampling Wizard, Plan Summary step. Stage 1; Strata: county; Clusters: town; Size: 4 per stratum; Method: Simple Random Sampling WOR. File: C:\temp\property_assess.csplan.] This panel summarizes the sampling plan so far. You can add another stage to the design. If you choose not to add a next stage, the next step is to set options for drawing your sample. Do you want to add stage 2? Yes, add stage 2 now (choose this option if the working data file contains data for stage 2). No, do not add another stage now (choose this option if stage 2 data are not available yet or your design has only one stage). This is the last step within each stage, providing a summary of the
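The naming rule in the note is mechanical: rootname, then the stage number, then a trailing underscore. A tiny Python illustration with a hypothetical helper name:

```python
def saved_variable_name(rootname, stage):
    """Saved sampling variable name: rootname (with its trailing '_'), stage number, '_'."""
    return f"{rootname}{stage}_"


print(saved_variable_name("PopulationSize_", 1))          # PopulationSize_1_
print(saved_variable_name("SampleWeightCumulative_", 2))  # SampleWeightCumulative_2_
```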
109. [Figure: Sampling Wizard, Finish step, listing the Design Variables, Method, Sample Size, Output Variables, and Summary steps for stages 1 through 3, followed by Draw Sample (Selection Options, Output Files) and Completion. What do you want to do? Save the design to a plan file and draw the sample, or paste the syntax generated by the Wizard into a syntax window. To close this wizard, click Finish.] Click Finish. These selections produce the sampling plan file demo.csplan and draw a sample according to the first two stages of that plan. Sample Results: [Figure 13-26: Data Editor with sample results. Columns: region, province, district, city, InclusionProbability_1_, SampleWeightCumulative_1_, InclusionProbability_2_, SampleWeightCumulative_2_, SampleWeight_Final_; selected rows show values such as .20, 5.00, .10, 50.00, and 50.00.] You can see the sampling results in the Data Editor. Five new variables were saved to the working file, representing the inclusi
110. ...and descriptive statistics for the dependent and independent variables are also available. Data. The dependent variable is categorical. Factors are categorical. Covariates are quantitative variables that are related to the dependent variable. Subpopulation variables can be string or numeric but should be categorical. Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box. Obtaining Complex Samples Logistic Regression: From the menus choose: Analyze > Complex Samples > Logistic Regression... Select a plan file. Optionally, select a custom joint probabilities file. Click Continue. [Figure 10-1: Logistic Regression dialog box, with Dependent Variable (and Reference Category), Factors, Covariates, and Subpopulation Variable fields.] Select a dependent variable. Optionally, you can select variables for factors and covariates, as appropriate for your data, and specify a variable to define a subpopulation. The analysis is performed only for the selected category of the subpopulation variab...
111. ...explanatory ability. You may still want to add other predictors to the model to further improve the fit. Tests of Model Effects: [Figure 19-7: Tests of between-subjects effects. Corrected Model 127.231; Intercept 6321.597; shopfor 643.593; usecoup 87.453; shopfor * usecoup 10.688. a. Model: Amount spent = (Intercept) + shopfor + usecoup + shopfor * usecoup.] Each term in the model, plus the model as a whole, is tested for whether the value of its effect equals 0. Terms with significance values less than 0.05 have some discernible effect. Thus, all model terms contribute to the model. Parameter Estimates: [Figure 19-8: Parameter estimates for the intercept and the shopfor, usecoup, and shopfor * usecoup terms, with 95% confidence intervals and design effects. The intercept estimate is 518.249; the parameters for shopfor = 3, usecoup = 4, and every interaction involving either of those levels are .000 because they are redundant.]
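The intercept interpretation follows from the redundant parameters being fixed at 0: for the reference levels (shopfor = 3, usecoup = 4) every other term drops out of the prediction. The Python sketch below shows this bookkeeping. Only the intercept, 518.249, comes from the output above; every other coefficient is hypothetical.

```python
# Only INTERCEPT comes from the manual's output; all other values are hypothetical.
INTERCEPT = 518.249
SHOPFOR = {1: -100.0, 2: -50.0, 3: 0.0}           # level 3 is redundant, fixed at 0
USECOUP = {1: -80.0, 2: -40.0, 3: -20.0, 4: 0.0}  # level 4 is redundant, fixed at 0
INTERACTION = {(1, 1): 30.0, (2, 1): 15.0}        # omitted pairs are treated as 0


def predicted_amount(shopfor, usecoup):
    """Model-predicted mean spending for one combination of factor levels."""
    return (INTERCEPT
            + SHOPFOR[shopfor]
            + USECOUP[usecoup]
            + INTERACTION.get((shopfor, usecoup), 0.0))


print(predicted_amount(3, 4))  # 518.249: reference levels, so only the intercept remains
print(predicted_amount(1, 1))
```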
112. ...hazard for patients with one or no prior mi's is distinguishable from the hazard for patients with two prior mi's, which in turn is distinguishable from the hazard for patients with three prior mi's. Similar relationships hold for the levels of is and hs, where increasing the number of prior incidents increases the hazard of death. Pattern Values: [Figure 22-52: Pattern values. Columns: Survival Time Interval (Start, End), History of myocardial infarction, History of ischemic stroke, History of hemorrhagic stroke; rows: Reference Pattern and Patterns 1.1 through 1.4.] The pattern values table lists the values that define each predictor pattern. In addition to the predictors in the model, the start and end times for the survival interval are displayed. For analyses run from the dialogs, the start and end times will always be 0 and unbounded, respectively; through syntax, you can specify piecewise-constant predictor paths. The reference pattern is set at the reference category for each factor and the mean value of each covariate (there are no covariates in this model). For this dataset, the combination of factors shown for the reference model cannot occu...
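Hazards for two predictor patterns compare through the Cox model's linear predictor, because the baseline hazard cancels: the ratio is exp of the sum of beta_k times the difference in x_k. The Python sketch below uses hypothetical indicator names and coefficients, with the reference pattern coded as all zeros.

```python
from math import exp


def relative_hazard(pattern, reference, coefficients):
    """Hazard for `pattern` divided by the hazard for `reference` in a Cox model.

    The baseline hazard cancels, leaving exp(sum_k beta_k * (x_k - ref_k)).
    All arguments are dicts keyed by indicator name; an indicator missing
    from a pattern counts as 0.
    """
    linear = sum(beta * (pattern.get(name, 0) - reference.get(name, 0))
                 for name, beta in coefficients.items())
    return exp(linear)


# Hypothetical coefficients with "three prior mi's" as the reference category,
# so fewer prior mi's carry negative coefficients (lower hazard).
coefs = {"mi_none_or_one": -1.2, "mi_two": -0.6}
reference = {}                 # reference pattern: all indicators 0
two_prior = {"mi_two": 1}
print(relative_hazard(two_prior, reference, coefs))  # exp(-0.6), about 0.549
```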
113. ...specified in the Plan dialog box when you are analyzing the sample obtained according to that plan. The Complex Samples Analysis Preparation Wizard is used to specify analysis specifications for an existing complex sample. The analysis plan file created by the Analysis Preparation Wizard can be specified in the Plan dialog box when you are analyzing the sample corresponding to that plan. The Complex Samples Logistic Regression procedure allows you to model a categorical response. The Complex Samples Ordinal Regression procedure allows you to model an ordinal response. Chapter: Complex Samples Logistic Regression. The Complex Samples Logistic Regression procedure performs logistic regression analysis on a binary or multinomial dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation. Using Complex Samples Logistic Regression to Assess Credit Risk: If you are a loan officer at a bank, you want to be able to identify characteristics that are indicative of people who are likely to default on loans and then use those characteristics to identify good and bad credit risks. Suppose that a loan officer has collected past records of customers given loans at several different branches, according to a complex design. This information is contained in bankloan_cs.sav. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. The officer wants to see if the probabili...
114. ...demographic and vehicle purchase price data. tree_textdata.sav. A simple data file with only two variables, intended primarily to show the default state of variables prior to assignment of measurement level and value labels. tv-survey.sav. This is a hypothetical data file that concerns a survey conducted by a TV studio that is considering whether to extend the run of a successful program. 906 respondents were asked whether they would watch the program under various conditions. Each row represents a separate respondent; each column is a separate condition. ulcer_recurrence.sav. This file contains partial information from a study designed to compare the efficacy of two therapies for preventing the recurrence of ulcers. It provides a good example of interval-censored data and has been presented and analyzed elsewhere (Collett, 2003). ulcer_recurrence_recoded.sav. This file reorganizes the information in ulcer_recurrence.sav to allow you to model the event probability for each interval of the study rather than simply the end-of-study event probability. It has been presented and analyzed elsewhere (Collett et al., 2003). verd1985.sav. This data file concerns a survey (Verdegaal, 1985). The responses of 15 subjects to 8 variables were recorded. The variables of interest are divided into three sets. Set 1 includes age and marital, set 2 includes pet and news, and set 3 includes music and live. Pet is scaled as multiple nominal and a...
115. ...descriptive statistics for activity levels. Running the Analysis: To run a Complex Samples Descriptives analysis, from the menus choose: Analyze > Complex Samples > Descriptives... [Figure 16-1: Complex Samples Plan dialog box.] Plan File: nhis2000_subset.csaplan. If you do not have a plan file for your complex sample, you can use the Analysis Preparation Wizard to create one. Choose Prepare for Analysis from the Complex Samples menu to access the wizard. Joint Probabilities: Joint probabilities are required if the plan requests unequal probability WOR estimation. Otherwise, they are ignored. Choose Use default file, An open dataset, or Custom file. Browse to and select nhis2000_subset.csaplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. Click Continue. [Figure 16-2: Descriptives dialog box, with Measures and Subpopulations lists.]
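The point estimate of a mean in a weighted complex sample is the weighted total of the variable divided by the sum of the weights. A minimal Python sketch, assuming that estimator and hypothetical data; the design-based standard errors, which need the strata and PSU variables from the plan file, are not computed.

```python
def weighted_mean(values, weights):
    """Design-weighted estimate of a population mean: sum(w * y) / sum(w)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Hypothetical activity scores and final sampling weights for three respondents.
print(weighted_mean([2.0, 4.0, 6.0], [1.0, 1.0, 2.0]))  # 4.5
```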
116. [...Ordinal Regression Save dialog box. Save Variables: Predicted category (PredictedValue); Probability of predicted category; Probability of observed category; Cumulative probabilities (one variable per category, with a root name); Predicted probabilities (one variable per category, root name PredictedProbability); Replace existing variables that have the same name or root name. Export Model: Export model as data; Export model as XML.] Save Variables. This group allows you to save the model-predicted category, probability of predicted category, probability of observed category, cumulative probabilities, and predicted probabilities as new variables in the active dataset. Export model as SPSS Statistics data. Writes a dataset in IBM SPSS Statistics format containing the parameter correlation or covariance matrix with parameter estimates, standard errors, significance values, and degrees of freedom. The order of variables in the matrix file is as follows. rowtype_. Takes values (and value labels) COV (Covariances), CORR (Correlations), EST (Parameter estimates), SE (Standard errors), SIG (Significance levels), and DF (Sampling design degrees of freedom). There is a separate case with row type COV (or CORR) for each model parameter, plus a separate case for each of the other r...
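The saved cumulative probabilities and predicted probabilities are related: the probability of category j is P(Y <= j) minus P(Y <= j - 1), and the predicted category is the one with the largest probability. A minimal Python sketch with hypothetical names and values, using 0-based category positions:

```python
def category_probabilities(cumulative):
    """Convert cumulative probabilities P(Y <= j), in category order, into P(Y = j).

    The last cumulative value should be 1.
    """
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def predicted_category(cumulative):
    """0-based position of the most probable category and its probability."""
    probs = category_probabilities(cumulative)
    best = max(range(len(probs)), key=lambda j: probs[j])
    return best, probs[best]


# Hypothetical cumulative probabilities for a three-category response.
print(predicted_category([0.2, 0.7, 1.0]))  # middle category, probability about 0.5
```

The probability of the observed category is read the same way, as the entry of category_probabilities at the observed position.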
117. ...additional stages. Any sample stages after the current stage will be ignored when the data are analyzed. [Figure: Analysis Preparation Wizard, Estimation Method step (stage 2).] Equal WOR: equal probability sampling without replacement. The next panel will ask you to specify inclusion probabilities or population sizes. Unequal WOR: unequal probability sampling without replacement. Joint probabilities will be required to analyze sample data. This option is available in stage 1 only. Select Equal WOR as the second-stage estimation method, and click Next. [Figure 14-13: Analysis Preparation Wizard, Size step (stage 2). In this panel you specify inclusion probabilities or population sizes for the current stage.] Click Finish.
118. ...residual approximates the change in the value of the parameter estimate when the case is removed from the model. Cases with relatively large DFBeta residuals may be exerting undue influence on the analysis. A separate variable is saved for each nonredundant parameter in the model. Aggregated residuals. When multiple cases represent a single subject, the aggregated residual for a subject is simply the sum of the corresponding case residuals over all cases belonging to the same subject. For Schoenfeld's residual, the aggregated version is the same as the non-aggregated version because Schoenfeld's residual is only defined for uncensored cases. These residuals are only available when a subject identifier is specified on the Time and Event tab. Names of Saved Variables: Automatic name generation ensures that you keep all your work. Custom names allow you to discard or replace results from previous runs without first deleting the saved variables in the Data Editor. Export: [Figure 12-11: Complex Samples Cox Regression dialog box, Export tab, with options to export the model as SPSS Statistics data and to export the survival function as SPSS Statistics data, each to a new dataset or an external file; the survival function is evaluated at predictor values specified on the Plots tab.] Contents: Parameter estimat...
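Aggregation is a per-subject sum of case residuals. A minimal Python sketch with hypothetical subject IDs and residual values:

```python
from collections import defaultdict


def aggregate_residuals(subject_ids, residuals):
    """Sum case-level residuals within each subject (parallel sequences, one entry per case)."""
    totals = defaultdict(float)
    for subject, r in zip(subject_ids, residuals):
        totals[subject] += r
    return dict(totals)


# Hypothetical data: subject 101 contributes two cases, subject 102 one case.
print(aggregate_residuals([101, 101, 102], [0.25, -0.5, 1.0]))  # {101: -0.25, 102: 1.0}
```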
119. ...The null hypothesis for each test is that the value of the coefficient is 0. Covariances of parameter estimates. Displays an estimate of the covariance matrix for the model coefficients. Correlations of parameter estimates. Displays an estimate of the correlation matrix for the model coefficients. Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects. Square root of design effect. This is a measure, expressed in units comparable to those of the standard error, of the effect of specifying a complex design, where values further from 1 indicate greater effects. Parallel Lines. This group allows you to request statistics associated with a model with nonparallel lines, where a separate regression line is fitted for each response category (except the last). Wald test. Produces a test of the null hypothesis that regression parameters are equal for all cumulative responses. The model with nonparallel lines is estimated, and the Wald test of equal parameters is applied. Parameter estimates. Displays estimates of the coefficients and standard errors for the model with nonparallel lines. Covariances of parameter estimates. Displays an estimate of the covariance matrix for the coefficients
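The design effect and its square root are simple ratios. With hypothetical variances of 9 under the complex design and 4 under simple random sampling (standard errors 3 and 2), the design effect is 2.25, and its square root, 1.5, equals the ratio of the standard errors. A minimal Python sketch with a hypothetical function name:

```python
from math import sqrt


def design_effect(var_complex, var_srs):
    """Variance of the estimate under the complex design divided by its variance under SRS."""
    return var_complex / var_srs


deff = design_effect(9.0, 4.0)  # hypothetical variances
print(deff, sqrt(deff))  # 2.25 1.5
```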
120. ...and without replacement. A cluster variable must be specified to use this method. PPS Murthy. This is a first-stage method that selects two clusters from each stratum with probability proportional to cluster size and without replacement. A cluster variable must be specified to use this method. PPS Sampford. This is a first-stage method that selects more than two clusters from each stratum with probability proportional to cluster size and without replacement. It is an extension of Brewer's method. A cluster variable must be specified to use this method. Use WR estimation for analysis. By default, an estimation method is specified in the plan file that is consistent with the selected sampling method. Selecting this option allows you to use with-replacement estimation even if the sampling method implies WOR estimation. This option is available only in stage 1. Measure of Size (MOS). If a PPS method is selected, you must specify a measure of size that defines the size of each unit. These sizes can be explicitly defined in a variable, or they can be computed from the data. Optionally, you can set lower and upper bounds on the MOS, overriding any values found in the MOS variable or computed from the data. These options are available only in stage 1. Sampling Wizard Sample Size: [Figure 2-4: Sampling Wizard, Sample Size step.] In this panel you can specify the number or proportion of units t...
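Under PPS sampling of n units without replacement, the usual first-order inclusion probability is n x MOS_i / sum(MOS). The Python sketch below computes it after clamping each measure of size to optional bounds. The clamping is an assumption about how the bounds override the MOS values, and the function name is hypothetical.

```python
def pps_inclusion_probabilities(sizes, n, lower=None, upper=None):
    """First-order inclusion probabilities n * MOS_i / sum(MOS) for a PPS WOR draw.

    Optional lower/upper bounds clamp each measure of size first (an assumption
    about how the bounds apply). Raises ValueError if any probability would
    exceed 1; such units would need to be taken with certainty.
    """
    mos = []
    for s in sizes:
        if lower is not None:
            s = max(s, lower)
        if upper is not None:
            s = min(s, upper)
        mos.append(s)
    total = sum(mos)
    probs = [n * m / total for m in mos]
    if any(p > 1 for p in probs):
        raise ValueError("some units exceed probability 1; consider an upper bound on MOS")
    return probs


print(pps_inclusion_probabilities([10, 20, 30, 40], n=2))  # [0.2, 0.4, 0.6, 0.8]
```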
121. e evaluated at their highest levels; all other covariates are evaluated at their means.
The Odds Ratios dialog box allows you to display the model-estimated cumulative odds ratios for specified factors and covariates. This feature is only available for models using the Logit link function. A single cumulative odds ratio is computed for all categories of the dependent variable except the last; the proportional odds model postulates that they are all equal.
Factors. For each selected factor, displays the ratio of the cumulative odds at each category of the factor to the odds at the specified reference category.
Covariates. For each selected covariate, displays the ratio of the cumulative odds at the covariate's mean value plus the specified units of change to the odds at the mean.
When computing odds ratios for a factor or covariate, the procedure fixes all other factors at their highest levels and all other covariates at their means. If a factor or covariate interacts with other predictors in the model, then the odds ratios depend not only on the change in the specified variable but also on the values of the variables with which it interacts. If a specified covariate interacts with itself in the model (for example, age*age), then the odds ratios depend on both the change in the covariate and the value of the covariate.
Complex Samples Ordinal Regression Save
Figure 11-7 Or
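The last point can be checked numerically. This sketch assumes a proportional-odds model whose cumulative log-odds are written as theta_j + eta(x), with eta(x) = b1*x + b2*x^2 (a covariate interacting with itself); the coefficients are hypothetical. Sign conventions differ between packages (some subtract eta), which would invert the ratio but not change the conclusion: the threshold theta_j cancels, so one ratio serves every cumulative category, yet with b2 nonzero the ratio depends on the starting value x.

```python
import math


def cumulative_odds_ratio(b1, b2, x, delta):
    """Cumulative odds ratio for moving a covariate from x to x + delta.

    Assumes logit P(Y <= j) = theta_j + b1*x + b2*x**2; theta_j cancels.
    """
    def eta(v):
        return b1 * v + b2 * v * v

    return math.exp(eta(x + delta) - eta(x))


# Hypothetical coefficients for age and age*age; a 10-year change at two ages
for age in (30, 50):
    print(age, round(cumulative_odds_ratio(0.05, -0.0004, age, 10), 4))
```

With b2 = 0 the ratio reduces to exp(b1 * delta) regardless of x, which is the usual single-number odds ratio.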
122. e evidence that the risk estimates may not be constant across Income category, so you may be able to increase your response rate even more by targeting lower-income newspaper subscribers.
Related Procedures
The Complex Samples Crosstabs procedure is a useful tool for obtaining descriptive statistics of the crosstabulation of categorical variables for observations obtained via a complex sampling design.
■ The Complex Samples Sampling Wizard is used to specify complex sampling design specifications and obtain a sample. The sampling plan file created by the Sampling Wizard contains a default analysis plan and can be specified in the Plan dialog box when you are analyzing the sample obtained according to that plan.
■ The Complex Samples Analysis Preparation Wizard is used to set analysis specifications for an existing complex sample. The analysis plan file created by the Analysis Preparation Wizard can be specified in the Plan dialog box when you are analyzing the sample corresponding to that plan.
■ The Complex Samples Frequencies procedure provides univariate descriptive statistics of categorical variables.
Chapter 8 Complex Samples Ratios
The Complex Samples Ratios procedure displays univariate summary statistics for ratios of variables. Optionally, you can request statistics by subgroups defined by one or more categorical variables.
Using Complex Samples Ratios to Aid Property Value Assessment
A state agency is charged with ensuring that prope
123. e is not immediately usable for further analyses in other procedures that read a matrix file unless those procedures accept all the row types exported here.
Export Model as XML. Saves the parameter estimates and the parameter covariance matrix, if selected, in XML (PMML) format. You can use this model file to apply the model information to other data files for scoring purposes.
Complex Samples General Linear Model Options
Figure 9-7 General Linear Model Options dialog box
User Missing Values. All design variables, as well as the dependent variable and any covariates, must have valid data. Cases with invalid data for any of these variables are deleted from the analysis. These controls allow you to decide whether user-missing values are treated as valid among the strata, cluster, subpopulation, and factor variables.
Confidence Interval. This is the confidence interval level for coefficient estimates and estimated marginal means. Specify a value greater than or equal to 50 and less than 100.
CSGLM Command Additional Features
The command syntax language also allows you to:
■ Specify custom tests of effects versus a linear combination of ef
124. e mailing is 17.2/82.8 = 0.208. Likewise, the estimate of the odds that a nonsubscriber responds is 10.3/89.7 = 0.115. The estimate of the odds ratio is therefore 0.208/0.115 = 1.812 (note there is some rounding error in the intervening steps). The odds ratio is also the ratio of the relative risk of responding to the relative risk of not responding, or 1.673/0.923 = 1.812.
Odds Ratio versus Relative Risk
Since it is a ratio of ratios, the odds ratio is very difficult to interpret. The relative risk is easier to interpret, so the odds ratio alone is not very helpful. However, there are certain commonly occurring situations in which the estimate of the relative risk is not very good and the odds ratio can be used to approximate the relative risk of the event of interest. The odds ratio should be used as an approximation of the relative risk of the event of interest when both of the following conditions are met:
■ The probability of the event of interest is small (< 0.1). This condition guarantees that the odds ratio will make a good approximation to the relative risk. In this example, the event of interest is a response to the mailing.
■ The design of the study is case control. This condition signals that the usual estimate of the relative risk will likely not be good. A case-control study is retrospective, most often used when the event of interest is unlikely or when the design of a prospective experiment is impractical or unethical.
Neit
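The arithmetic above can be reproduced directly from the response percentages. This sketch uses only the rounded percentages quoted in the text, so it yields an odds ratio of about 1.809 rather than 1.812; the small gap is the rounding error the text mentions. It also confirms that the odds ratio equals the ratio of the two relative risks exactly, and shows why a rare event makes the odds ratio a good stand-in for the relative risk.

```python
def odds(p):
    """Odds of an event that occurs with probability p."""
    return p / (1 - p)


p_subscriber, p_nonsubscriber = 0.172, 0.103  # response rates quoted in the example

odds_ratio = odds(p_subscriber) / odds(p_nonsubscriber)
rr_respond = p_subscriber / p_nonsubscriber
rr_not_respond = (1 - p_subscriber) / (1 - p_nonsubscriber)
print(round(odds(p_subscriber), 3), round(odds(p_nonsubscriber), 3),
      round(odds_ratio, 3), round(rr_respond / rr_not_respond, 3))
# → 0.208 0.115 1.809 1.809

# Rare event (probabilities below 0.1): the odds ratio stays close to the relative risk
rare_or = odds(0.02) / odds(0.01)
rare_rr = 0.02 / 0.01
print(round(rare_or, 3), rare_rr)
```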
125. e step-halving method. (b) The Newton-Raphson method was used to estimate the parameters.
Looking at the iteration history, the changes in the parameter estimates over the last few iterations are slight enough that you're not terribly concerned about the warning message.
Comparing Models
Figure 21-19 Pseudo R-Squares for reduced model (Cox and Snell, Nagelkerke, McFadden)
The R² values for the reduced model are identical to those for the original model. This is evidence in favor of the reduced model.
Figure 21-20 Classification table for reduced model (weighted counts; rows are observed, columns are predicted)
Observed | Strongly agree | Agree | Disagree | Strongly disagree
Strongly agree | 7067.567 | 12823.258 | 3183.380 | 2058.750
Agree | 4271.234 | 15684.090 | 6100.963 | 6205.137
Disagree | 2024.816 | 13157.809 | 5654.047 | 8640.746
Strongly disagree | 889.869 | 9226.578 | 5889.053 | 15308.703
Overall Percent | 12.1% | 43.1% | 17.6% | 27.3%
Dependent Variable: The legislature should enact a gas tax (Ascending). Model: (Threshold), agecat, drivefreq. Link function: Logit.
The classification table somewhat complicates matters. The overall classification rate of 37.0% for the reduced model is comparable to that of the original model, which is evidence in favor of the reduced model. However, the reduced model shifts the predicted response of 3.8% of the voters
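The overall classification rate and the Overall Percent row follow from the weighted counts in the table: the rate is the diagonal (correctly classified) total divided by the grand total, and each Overall Percent is a predicted-column total divided by the grand total. This sketch recomputes both from the figures shown.

```python
# Weighted counts from the reduced-model classification table
# (rows = observed category, columns = predicted category, both in the order
#  Strongly agree, Agree, Disagree, Strongly disagree)
table = [
    [7067.567, 12823.258, 3183.380, 2058.750],
    [4271.234, 15684.090, 6100.963, 6205.137],
    [2024.816, 13157.809, 5654.047, 8640.746],
    [889.869, 9226.578, 5889.053, 15308.703],
]

total = sum(sum(row) for row in table)
correct = sum(table[i][i] for i in range(len(table)))  # diagonal = correct predictions
overall_pct = 100 * correct / total
col_pct = [100 * sum(row[j] for row in table) / total for j in range(len(table))]
print(round(overall_pct, 1), [round(p, 1) for p in col_pct])
# → 37.0 [12.1, 43.1, 17.6, 27.3]
```

Because the counts are weighted, these rates describe the estimated population rather than the raw sample of respondents.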
126. e study.
grocery_coupons.sav. This is a hypothetical data file that contains survey data collected by a grocery store chain interested in the purchasing habits of their customers. Each customer is followed for four weeks, and each case corresponds to a separate customer-week and records information about where and how the customer shops, including how much was spent on groceries during that week.
guttman.sav. Bell (Bell, 1961) presented a table to illustrate possible social groups. Guttman (Guttman, 1968) used a portion of this table, in which five variables describing such things as social interaction, feelings of belonging to a group, physical proximity of members, and formality of the relationship were crossed with seven theoretical social groups, including crowds (for example, people at a football game), audiences (for example, people at a theater or classroom lecture), public (for example, newspaper or television audiences), mobs (like a crowd but with much more intense interaction), primary groups (intimate), secondary groups (voluntary), and the modern community (loose confederation resulting from close physical proximity and a need for specialized services).
health_funding.sav. This is a hypothetical data file that contains data on health care funding (amount per 100 population), disease rates (rate per 10,000 population), and visits to health care providers (rate per 10,000 population). Each case represents a different c
127. e to last assessed value, based on the results of a statewide survey carried out according to a complex design and with an appropriate analysis plan for the data.
Statistics. The procedure produces ratio estimates, t tests, standard errors, confidence intervals, coefficients of variation, unweighted counts, population sizes, design effects, and square roots of design effects.
Data. Numerators and denominators should be positive-valued scale variables. Subpopulation variables can be string or numeric but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
Obtaining Complex Samples Ratios
► From the menus choose: Analyze > Complex Samples > Ratios...
► Select a plan file. Optionally, select a custom joint probabilities file.
► Click Continue.
© Copyright SPSS Inc. 1989, 2010
Figure 8-1 Ratios dialog box
► Select at least one numerator variable and denominator variable.
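As a rough illustration of what a design-weighted ratio estimate is, the sketch below computes the ratio of weighted totals, sum(w*y) / sum(w*x). It assumes this standard combined ratio estimator; the function name and data are hypothetical, and the design-based variance that the procedure also reports (needed for standard errors, confidence intervals, and design effects) is not shown.

```python
def weighted_ratio(numerators, denominators, weights):
    """Design-weighted ratio estimate: sum(w * y) / sum(w * x)."""
    num = sum(w * y for w, y in zip(weights, numerators))
    den = sum(w * x for w, x in zip(weights, denominators))
    if den <= 0:
        raise ValueError("weighted denominator total must be positive")
    return num / den


# Hypothetical properties: current value, value at last appraisal, final sampling weight
current = [210_000, 185_000, 320_000]
last_appraisal = [200_000, 180_000, 290_000]
weights = [40.0, 55.0, 25.0]
print(round(weighted_ratio(current, last_appraisal, weights), 4))
```

Weighting both totals before dividing is what distinguishes this from averaging the individual property ratios, which would weight small and large properties equally.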
128. eady been sampled. If editing a plan, you can also remove stages from the plan.
Previously sampled stages. If an extended sampling frame is not available, you will have to execute a multistage sampling design one stage at a time. Select which stages have already been sampled from the drop-down list. Any stages that have been executed are locked; they are not available in the Draw Sample Selection Options step, and they cannot be altered when editing a plan.
Remove stages. You can remove stages 2 and 3 from a multistage design.
Running an Existing Sample Plan
► From the menus choose: Analyze > Complex Samples > Select a Sample...
► Select Draw a sample, and choose a plan file to run.
► Click Next to continue through the Wizard.
► Review the sampling plan in the Plan Summary step, and then click Next.
The individual steps containing stage information are skipped when executing a sample plan. You can now go on to the Finish step at any time.
Optionally, you can specify stages that have already been sampled.
CSPLAN and CSSELECT Commands Additional Features
The command syntax language also allows you to:
■ Specify custom names for output variables.
■ Control the output in the Viewer. For example, you can suppress the stagewise summary of the plan that is displayed if a sample is designed or modified, suppress the summary of the distribution of sampled cases by strata that is shown if the sample design i
129. ect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
■ Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Model Assumptions. This group allows you to produce a test of the proportional hazards assumption. The test compares the fitted model to an alternative model that includes time-dependent predictors x*_TF for each predictor x, where _TF is the specified time function.
■ Time Function. Specifies the form of _TF for the alternative model. For the identity function, _TF = T_. For the log function, _TF = log(T_). For Kaplan-Meier, _TF = 1 − S_KM(T_), where S_KM(.) is the Kaplan-Meier estimate of the survival function. For rank, _TF is the rank order of T_ among the observed end times.
■ Parameter estimates for alternative model. Displays the estimate, standard error, and confidence interval for each parameter in the alternative model.
■ Covariance matrix for alternative model. Displays the matrix of estimated covariances between parameters in the alternative model.
Baseline survival and cumulative hazard functions. Displays the baseline survival function and baseline cumulative hazards function along with their standard errors.
Note: If time-dependent predictors defined
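The four time functions can be sketched as follows. The helper names are hypothetical; the Kaplan-Meier estimate is the standard product over distinct event times of (1 - d_i / n_i). How the procedure ranks tied end times is not stated above, so averaging tied ranks is an assumption made only for this illustration.

```python
import math


def kaplan_meier(times, events):
    """Return S(t): product over event times t_i <= t of (1 - d_i / n_i)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    steps, surv = [], 1.0
    for t_i in event_times:
        n_i = sum(1 for t in times if t >= t_i)                      # at risk
        d_i = sum(1 for t, e in zip(times, events) if e and t == t_i)  # events
        surv *= 1 - d_i / n_i
        steps.append((t_i, surv))

    def S(t):
        s = 1.0
        for t_i, value in steps:
            if t_i > t:
                break
            s = value
        return s

    return S


def time_function(times, events, kind):
    """Values of _TF evaluated at each case's end time T_ (hypothetical helper)."""
    if kind == "identity":
        return list(times)
    if kind == "log":
        return [math.log(t) for t in times]
    if kind == "km":
        S = kaplan_meier(times, events)
        return [1 - S(t) for t in times]
    if kind == "rank":
        order = sorted(times)
        n = len(order)
        # Average rank for ties (assumption); first and last 1-based positions of t
        return [(order.index(t) + 1 + n - order[::-1].index(t)) / 2 for t in times]
    raise ValueError(f"unknown time function: {kind}")


times = [2, 3, 3, 5, 8]   # hypothetical end times T_
events = [1, 1, 0, 1, 0]  # 1 = event, 0 = censored
for kind in ("identity", "log", "km", "rank"):
    print(kind, [round(v, 3) for v in time_function(times, events, kind)])
```

In the alternative model, each predictor x would then be paired with the product x * _TF, and a significant coefficient on that product suggests the hazard ratio for x changes over time.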
130. ed survival file on the Export tab. Note that these options are not available if time-dependent predictors defined on the Predictors tab are included in the model.
■ Plot Factors at. By default, each factor is evaluated at its highest level. Enter or select a different level if desired. Alternatively, you can choose to plot separate lines for each level of a single factor by selecting the check box for that factor.
■ Plot Covariates at. Each covariate is evaluated at its mean. Enter or select a different value if desired.
Hypothesis Tests
Figure 12-9 Cox Regression dialog box, Hypothesis Tests tab
Test Statistic. This group allows you to select the type of statistic used for testing hypotheses. You can choose between F, adjusted F, chi-square, and adjusted chi-square.
Sampling Degrees of Freedom. This group gives you control over the sampling design degrees of freedom used to compute p values for all test statistics. If based on the sampling design, the val
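The sentence on design-based degrees of freedom is cut off above. A widely used convention, assumed here and not taken from the text, is the number of first-stage sampling units (PSUs) minus the number of first-stage strata. The sketch counts each unique (stratum, cluster) combination as one PSU; the function name and data are hypothetical.

```python
def design_degrees_of_freedom(strata, clusters):
    """Distinct first-stage PSUs minus distinct first-stage strata (assumed rule).

    strata, clusters: parallel lists of each case's stratum and cluster codes;
    a PSU is identified by its (stratum, cluster) combination.
    """
    psus = set(zip(strata, clusters))
    return len(psus) - len(set(strata))


# Hypothetical cases: stratum 1 has clusters a, b, c; stratum 2 has clusters a, b
strata = [1, 1, 1, 1, 2, 2, 2]
clusters = ["a", "a", "b", "c", "a", "b", "b"]
print(design_degrees_of_freedom(strata, clusters))  # 5 PSUs - 2 strata = 3
```

Small design degrees of freedom like this make tests based on them noticeably more conservative than tests that treat every case as independent.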
131. ed the simple random sampling analysis plan, or to the sample files directory, and select srs.csaplan.
► Click Continue.
Figure 22-42 Cox Regression dialog box, Time and Event tab
132. ed when the sample is drawn. The variables contain information about the sample or population for the current stage. If the sample is stratified, the variables contain data for each stratum.
This step allows you to choose variables to save when the sample is drawn.
Population size. The estimated number of units in the population for a given stage. The rootname for the saved variable is PopulationSize_.
Sample proportion. The sampling rate at a given stage. The rootname for the saved variable is SamplingRate_.
Sample size. The number of units drawn at a given stage. The rootname for the saved variable is SampleSize_.
Sample weight. The inverse of the inclusion probabilities. The rootname for the saved variable is SampleWeight_.
Some stagewise variables are generated automatically. These include:
Inclusion probabilities. The proportion of units drawn at a given stage. The rootname for the saved variable is InclusionProbability_.
Cumulative weight. The
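The relationship between stagewise inclusion probabilities and sample weights can be sketched as below. The definition of the cumulative weight is cut off above; treating it as the running product of the stagewise weights through each stage is an assumption used for illustration, and the helper names are hypothetical.

```python
def stage_weights(inclusion_probs):
    """Stagewise sample weights: the inverse of each stage's inclusion probability."""
    if any(not 0 < p <= 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return [1 / p for p in inclusion_probs]


def cumulative_weights(inclusion_probs):
    """Running product of stagewise weights (assumed definition of a cumulative weight)."""
    out, running = [], 1.0
    for w in stage_weights(inclusion_probs):
        running *= w
        out.append(running)
    return out


# Hypothetical unit: 10% of clusters drawn in stage 1, 25% of units within the cluster in stage 2
print(cumulative_weights([0.10, 0.25]))  # [10.0, 40.0]
```

Under this reading, the final-stage cumulative weight says each sampled unit stands for about 40 units in the population.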
133. ed within the grid or moved to the Exclude list. Enter sizes in the rightmost column. Click Labels or Values to toggle the display of value labels and data values for stratification and cluster variables in the grid cells. Cells that contain unlabeled values always show values. Click Refresh Strata to repopulate the grid with each combination of labeled data values for variables in the grid.
Exclude. To specify sizes for a subset of stratum/cluster combinations, move one or more variables to the Exclude list. These variables are not used to define sample sizes.
Analysis Preparation Wizard: Plan Summary
Figure 3-6 Analysis Preparation Wizard, Plan Summary step
This is the last step within each stage, providing a summary of the analysis d
134. eens 227
Creating a Simple Random Sampling Analysis Plan 242
Running the Analysis 246
Sample Design Information 254
Tests of Model Effects 255
Parameter Estimates 255
Pattern Values 256
Log Minus Log Plot 257
Summary 257
Appendices
A Sample Files 258
B Notices 267
Bibliography 269
Index 271
Part I: User's Guide
Chapter 1 Introduction to Complex Samples Procedures
An inherent assumption of analytical procedures in traditional software packages is that the observations in a data file represent a simple random sample from the population of interest. This assumption is untenable for an increasing number of companies and researchers who find it both cost-effective and convenient to obtain samples in a more structured way.
The Complex Samples option allows you to select a sample according to a complex design and incorporate the design specifications into the data analysis, thus ensuring that your results are valid.
Properties of Complex Samples
A complex sample can differ from a simple random sample in many ways. In a simple random sample, individual sampling units are selected at random wit
135.
► Select at least one measure variable.
Optionally, you can specify variables to define subpopulations. Statistics are computed separately for each subpopulation.
Complex Samples Descriptives Statistics
Figure 6-2 Descriptives Statistics dialog box
Summaries. This group allows you to request estimates of the means and sums of the measure variables. Additionally, you can request t tests of the estimates against a specified value.
Statistics. This group produces statistics associated with the mean or sum.
■ Standard error. The standard error of the estimate.
■ Confidence interval. A confidence interval for the estimate, using the specified level.
■ Coefficient of variation. The ratio of the standard error of the estimate to the estimate.
■ Unweighted count. The number of units used to compute the estimate.
■ Population size. The estimated number of units in the population.
■ Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a c
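Given an estimate and its design-based standard error, the coefficient of variation and a confidence interval follow directly. This sketch uses a normal critical value from Python's statistics.NormalDist for simplicity; that is an assumption, since a complex-samples procedure may instead base the interval on a t distribution with the sampling design degrees of freedom, which gives a somewhat wider interval when those degrees of freedom are small. The function name and inputs are hypothetical.

```python
from statistics import NormalDist


def summarize(estimate, std_error, level=95.0):
    """Coefficient of variation and a normal-approximation confidence interval."""
    if estimate == 0:
        raise ValueError("coefficient of variation is undefined for a zero estimate")
    cv = std_error / estimate
    z = NormalDist().inv_cdf(0.5 + level / 200)  # e.g. 0.975 quantile for 95%
    return cv, (estimate - z * std_error, estimate + z * std_error)


# Hypothetical mean estimate and its design-based standard error
cv, (lo, hi) = summarize(estimate=52.4, std_error=2.1)
print(round(cv, 4), round(lo, 2), round(hi, 2))
```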
136. eger.
Maximum Step Halving. At each iteration, the step size is reduced by a factor of 0.5 until the log-likelihood increases or maximum step halving is reached. Specify a positive integer.
Limit iterations based on change in parameter estimates. When selected, the algorithm stops after an iteration in which the absolute or relative change in the parameter estimates is less than the value specified, which must be non-negative.
Limit iterations based on change in log-likelihood. When selected, the algorithm stops after an iteration in which the absolute or relative change in the log-likelihood function is less than the value specified, which must be non-negative.
Check for complete separation of data points. When selected, the algorithm performs tests to ensure that the parameter estimates have unique values. Separation occurs when the procedure can produce a model that correctly classifies every case.
Display iteration history. Displays parameter estimates and statistics at every n iterations, beginning with the 0th iteration (the initial estimates). If you choose to print the iteration history, the last iteration is always printed, regardless of the value of n.
User Missing Values. All design variables, as well as the dependent variable and any covariates, must have valid data. Cases with invalid data for any of these variables are deleted from the analysis. These controls allow you to decide whether user-missing values are treated as valid among
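The interplay of Newton-Raphson steps, step halving, and the two stopping rules can be sketched for a single parameter. This is an illustration of the general technique under simplifying assumptions (one parameter, absolute parameter change, relative log-likelihood change), not the procedure's actual implementation; all names are hypothetical. The example fits an intercept-only logistic model whose maximum-likelihood estimate is known to be log(12/38).

```python
import math


def newton_step_halving(loglik, score, info, beta0, max_iter=100,
                        max_halving=5, param_tol=1e-6, ll_tol=0.0):
    """Maximize a one-parameter log-likelihood by Newton-Raphson with step halving.

    Stops after an iteration whose absolute parameter change is below param_tol
    or whose relative log-likelihood change is below ll_tol.
    """
    beta, ll = beta0, loglik(beta0)
    for iteration in range(1, max_iter + 1):
        step = score(beta) / info(beta)      # full Newton-Raphson step
        for _ in range(max_halving + 1):     # try the full step, then halve it
            candidate = beta + step
            new_ll = loglik(candidate)
            if new_ll >= ll:
                break
            step *= 0.5
        else:
            return beta, iteration           # no halved step increased the log-likelihood
        param_change = abs(candidate - beta)
        rel_ll_change = abs(new_ll - ll) / max(abs(ll), 1e-12)
        beta, ll = candidate, new_ll
        if param_change < param_tol or rel_ll_change < ll_tol:
            return beta, iteration
    return beta, max_iter


n, successes = 50, 12  # hypothetical data for an intercept-only logistic model


def loglik(b):
    return successes * b - n * math.log1p(math.exp(b))


def score(b):
    return successes - n / (1 + math.exp(-b))


def info(b):
    p = 1 / (1 + math.exp(-b))
    return n * p * (1 - p)


# A poor starting value (3.0) forces the early iterations to halve their steps
beta_hat, iterations = newton_step_halving(loglik, score, info, beta0=3.0)
print(round(beta_hat, 5), iterations)
```

Without step halving, the first full Newton step from 3.0 overshoots to a much lower log-likelihood; halving lets the algorithm recover instead of diverging.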
137. egression dialog box
► Deselect Gender and Voted in last election as factors.
► Click Options.
Figure 21-16 Ordinal Regression Options dialog box
► Select Display iteration history. The iteration history is useful for diagnosing problems encountered by the estimation algorithm.
► Click Continue.
► Click OK in the Complex Samples Ordinal
138. elated cases in the new data set.
○ Restructure selected cases into variables. Use this when you have groups of related cases that you want to rearrange so that data from each group are represented as a single case in the new data set.
○ Transpose all data. All cases will become variables, and selected variables will become cases in the new data set. Choosing this option will end the wizard, and the Transpose dialog will appear.
► Make sure Restructure selected variables into cases is selected.
► Click Next.
Figure 22-24 Restructure Data Wizard, Variables to Cases: Number of Variable Groups step
You have chosen to restructure selected variables into groups of related cases in the new file. A group of related variables, called a variable group, represents measurements on one variable. For example, the variable may be width. If it is recorded in three separate measurements, each one representing a different point in time (w1, w2, and w3), then the data are arranged in a group of variables. If there is more than one variable in the file, often it is also recorded in a variable group; for example, height recorded in h1, h2, and h3.
How many variable groups do you want to restructure?
○ One (for example, w1, w2, and w3)
○ More than one (for example, w1, w2, w3 and h1, h2, h3, etc.)
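Conceptually, restructuring variables into cases turns each variable group into one new variable and writes one output case per position in the group. The sketch below shows that wide-to-long reshaping with two variable groups; the helper, its `index` column, and the data are hypothetical and only mirror the idea, not the wizard's exact output layout.

```python
def variables_to_cases(records, id_key, groups):
    """Restructure selected variables into cases (wide -> long).

    groups maps each new variable name to its ordered source columns,
    e.g. {"width": ["w1", "w2", "w3"]}; all groups must have the same length.
    """
    lengths = {len(cols) for cols in groups.values()}
    if len(lengths) != 1:
        raise ValueError("all variable groups must have the same number of columns")
    (n,) = lengths
    out = []
    for rec in records:
        for i in range(n):
            case = {id_key: rec[id_key], "index": i + 1}  # index = time point
            for new_name, cols in groups.items():
                case[new_name] = rec[cols[i]]
            out.append(case)
    return out


wide = [{"id": 1, "w1": 5, "w2": 6, "w3": 7, "h1": 10, "h2": 11, "h3": 12}]
long_rows = variables_to_cases(
    wide, "id", {"width": ["w1", "w2", "w3"], "height": ["h1", "h2", "h3"]})
for row in long_rows:
    print(row)
```

One input case with two groups of three variables becomes three cases with two variables each, which is the "more than one variable group" option.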
139.
► Type start_time3 as the target variable.
► Type time2 as the numeric expression.
► Click OK.
► To restructure the data from variables to cases, from the menus choose: Data > Restructure...
Figure 22-23 Restructure Data Wizard, Welcome step
Welcome to the Restructure Data Wizard. This wizard helps you to restructure your data from multiple variables (columns) in a single case to groups of related cases (rows), or vice versa, or you can choose to transpose your data. The wizard replaces the current data set with the restructured data. Note that data restructuring cannot be undone.
What do you want to do?
○ Restructure selected variables into cases. Use this when each case in your current data has some variables that you would like to rearrange into groups of r
140.
Figure 13-8 Sampling Wizard, Draw Sample Selection Options step
► Select Custom value for the type of random seed to use, and type 241972 as the value. Using a custom value allows you to replicate the results of this example exactly.
► Click Next, and then click Next in the Draw Sample Output Files step.
Figure 13-9 Sampling Wizard, Finish step
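Why a custom seed makes a sample reproducible can be shown with any seeded pseudorandom generator. This sketch uses Python's random.Random with the seed value from the example; it only illustrates the principle. Python's generator is not the one the Sampling Wizard uses, so the same seed will not reproduce the Wizard's sample. The frame and function name are hypothetical.

```python
import random


def draw_sample(frame, n, seed):
    """Simple random sample without replacement from a seeded generator.

    The same seed and frame always reproduce the same sample.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(frame, n))


frame = list(range(1, 1001))  # hypothetical sampling frame of unit IDs
first = draw_sample(frame, 10, seed=241972)
second = draw_sample(frame, 10, seed=241972)
print(first == second)  # True
```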
141.
Export model as SPSS Statistics data. Writes a dataset in IBM SPSS Statistics format containing the parameter correlation or covariance matrix with parameter estimates, standard errors, significance values, and degrees of freedom. The order of variables in the matrix file is as follows:
■ rowtype_. Takes values (and value labels) COV (Covariances), CORR (Correlations), EST (Parameter estimates), SE (Standard errors), SIG (Significance levels), and DF (Sampling design degrees of freedom). There is a separate case with row type COV (or CORR) for each model parameter, plus a separate case for each of the other row types.
■ varname_. Takes values P1, P2, ..., corresponding to an ordered list of all model parameters, for row types COV or CORR, with value labels corresponding to the parameter strings shown in the parameter estimates table. The cells are blank for other row types.
■ P1, P2, ... These variables correspond to an ordered list of all model parameters, with variable labels corresponding to the parameter strings shown in the parameter estimates table, and take values according to the row type.
For redundant parameters, all covariances are set to zero; correlations are set to the system-missing value; all parameter estimates are set at zero; and all standard e
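The layout described above can be mimicked with a list of row dictionaries. This sketch builds one COV (or CORR) row per parameter followed by EST, SE, SIG, and DF rows, deriving standard errors from the covariance diagonal and correlations as cov_ij / (se_i * se_j). The helper is hypothetical; repeating a single design degrees of freedom value in every P column, using an empty string for a blank varname_, and omitting the special handling of redundant parameters are all simplifying assumptions.

```python
import math


def export_rows(estimates, cov, sig, df, use_corr=False):
    """Lay out a parameter matrix dataset: rowtype_, varname_, P1..Pk."""
    k = len(estimates)
    se = [math.sqrt(cov[i][i]) for i in range(k)]
    cols = [f"P{j + 1}" for j in range(k)]
    rows = []
    for i in range(k):
        if use_corr:
            values = [cov[i][j] / (se[i] * se[j]) for j in range(k)]
        else:
            values = list(cov[i])
        rows.append({"rowtype_": "CORR" if use_corr else "COV",
                     "varname_": cols[i], **dict(zip(cols, values))})
    for rowtype, values in (("EST", estimates), ("SE", se),
                            ("SIG", sig), ("DF", [df] * k)):
        rows.append({"rowtype_": rowtype, "varname_": "", **dict(zip(cols, values))})
    return rows


# Hypothetical two-parameter model
rows = export_rows([0.5, -0.2], [[0.04, 0.006], [0.006, 0.09]],
                   sig=[0.02, 0.5], df=30, use_corr=True)
for row in rows:
    print(row)
```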
142. es or population sizes for the current stage. Sizes can be fixed or can vary across strata. For the purpose of specifying sizes, clusters specified in previous stages can be used to define strata. Note that this step is necessary only when Equal WOR is chosen as the Estimation Method.
Units. You can specify exact population sizes or the probabilities with which units were sampled.
■ Value. A single value is applied to all strata. If Population Sizes is selected as the unit metric, you should enter a non-negative integer. If Inclusion Probabilities is selected, you should enter a value between 0 and 1, inclusive.
■ Unequal values for strata. Allows you to enter size values on a per-stratum basis via the Define Unequal Sizes dialog box.
■ Read values from variable. Allows you to select a numeric variable that contains size values for strata.
Define Unequal Sizes
Figure 3-5 Define Unequal Sizes dialog box
The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.
Size Specifications grid. The grid displays the cross-classifications of up to five strata or cluster variables, one stratum/cluster combination per row. Eligible grid variables include all stratification variables from the current and previous stages and all cluster variables from previous stages. Variables can be reorder
143. esign specifications through the current stage. From here, you can either proceed to the next stage (creating it, if necessary) or save the analysis specifications.
If you cannot add another stage, it is likely because:
■ No cluster variable was specified in the Design Variables step.
■ You selected WR estimation in the Estimation Method step.
■ This is the third stage of the analysis, and the Wizard supports a maximum of three stages.
Analysis Preparation Wizard: Finish
Figure 3-7 Analysis Preparation Wizard, Finish step
This is the final step. You can save the plan file now or paste your selections to a syntax window.
When making changes to stages in the existing plan file, you can save the edited plan to a new file or overwrite the existing file. When adding stages without making changes to existing stages, the Wizard automatically overwrites the exi
144.
  estimated means, 50
  marginal means, 183
  model, 47
  model summary, 181
  options, 52
  parameter estimates, 182
  related procedures, 185
  save variables, 51
  statistics, 48
  tests of model effects, 181
Complex Samples Logistic Regression, 54, 186
  classification tables, 191
  command additional features, 63
  model, 56
  odds ratios, 60, 193
  options, 62
  parameter estimates, 192
  pseudo R² statistics, 190
  reference category, 55
  related procedures, 194
  save variables, 61
  statistics, 57
  tests of model effects, 191
Complex Samples Ordinal Regression, 64, 195
  classification tables, 202
  generalized cumulative model, 204
  model, 66
  odds ratios, 70, 203
  options, 72
  parameter estimates, 201
  pseudo R² statistics, 200, 208
  related procedures, 209
  response probabilities, 66
  save variables, 71
  statistics, 68
  tests of model effects, 200
  warnings, 207
Complex Samples Ratios, 42, 171
  missing values, 44
  ratios, 174
  related procedures, 175
  statistics, 43
Complex Samples Sampling Wizard, 93
  PPS sampling, 123
  related procedures, 139
  sampling frame, full, 93
  sampling frame, partial, 105
  summary, 103, 135
complex sampling
  analysis plan, 19
  sample plan, 4
confidence intervals
  in Complex Samples Crosstabs, 39
  in Complex Samples Descriptives, 34, 163
  in Complex Samples Frequencies, 30, 158
  in Complex Samples General Linear Model, 48, 52
  in Complex Samples Logistic Regression, 57
  in Complex Samples Ordinal Regression, 68
  in
145. estimation includes the finite population correction and assumes that units are sampled with equal probability. Equal WOR can be specified in any stage of a design.
Unequal WOR. Unequal probability sampling without replacement. In addition to using the finite population correction, Unequal WOR accounts for sampling units (usually clusters) selected with unequal probability. This estimation method is available only in the first stage.
Analysis Preparation Wizard: Size
Figure 3-4 Analysis Preparation Wizard, Size step
This step is used to specify inclusion probabiliti
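To make the role of the finite population correction concrete, here is a minimal Python sketch of the textbook variance of a sample mean under equal-probability sampling, with and without the (1 - n/N) correction. It assumes a single-stage, unweighted sample; the function name and data are hypothetical, and this is not the estimator the Complex Samples procedures use internally for multistage designs.

```python
from statistics import variance


def mean_variance(sample, population_size=None):
    """Estimated variance of the sample mean.

    With population_size=None the sample is treated as drawn with
    replacement (no correction); otherwise the finite population
    correction (1 - n/N) for equal-probability WOR sampling is applied.
    """
    n = len(sample)
    s2 = variance(sample)  # sample variance, n - 1 denominator
    var = s2 / n
    if population_size is not None:
        var *= 1.0 - n / population_size  # finite population correction
    return var


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_variance(data))                      # no correction
print(mean_variance(data, population_size=16))  # half the population sampled
```

Here half of the population is sampled, so the correction halves the variance; as n approaches N the correction drives the variance toward 0, reflecting that a census has no sampling error.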
146. ests to provide an overall p value, and these results should be comparable to the omnibus Wald test result. The fact that they are so different in this example is somewhat surprising but could be due to the existence of many contrasts in the test and a relatively small design degrees of freedom.
Figure 21-14 Parameter estimates for generalized cumulative model (shown in part)
Moreover, the estimated values of the generalized model coefficients don't appear to differ much from the estimates under the parallel lines assumption.
Dropping Non-Significant Predictors
The tests of model effects showed that the model coefficients for Gender and Voted in last election are not statistically significantly different from 0.
> To produce a reduced model, recall the Complex Samples Ordinal Regression dialog box.
> Click Continue in the Plan dialog box.
Figure 21-15 Ordinal R
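When it is based on the sampling design, the design degrees of freedom for these tests is the number of first-stage primary sampling units minus the number of first-stage strata, so a design with many strata but few PSUs per stratum leaves little room for a test with many contrasts. A minimal Python sketch, assuming each sampled case is tagged with hypothetical (stratum, PSU) identifiers:

```python
def design_degrees_of_freedom(first_stage_units):
    """Number of first-stage PSUs minus number of first-stage strata.

    first_stage_units: iterable of (stratum, psu) pairs, one per case.
    Pairs are de-duplicated, so several cases from one PSU count once,
    and PSU labels may repeat across strata.
    """
    pairs = set(first_stage_units)
    n_psus = len(pairs)
    n_strata = len({stratum for stratum, _ in pairs})
    return n_psus - n_strata


cases = [("A", 1), ("A", 1), ("A", 2), ("B", 1), ("B", 2), ("B", 3)]
print(design_degrees_of_freedom(cases))  # 5 PSUs in 2 strata
```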
147. ex Samples Cox Regression to build a model for survival times.
Preparing the Data for Analysis
Before restructuring the data, you will need to create two ancillary variables to help with the restructuring.
> To compute a new variable, from the menus choose:
Transform > Compute Variable...
Figure 22-21 Compute Variable dialog box
> Type start_time2 as the target variable.
> Type time1 as the numeric expression.
> Click OK.
> Recall the Compute Variable dialog box.
Figure 22-22 Compute Variable dialog box
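The ancillary variables exist because each later interval for a patient starts where the previous one ended. As a hypothetical sketch of the same idea outside SPSS, the Python function below splits one subject's successive end times into (id, start, end) records. It assumes the first interval starts at time 0 and that variables such as time1, time2, ... hold the end of each successive interval; these assumptions and the function name are illustrative, not taken from the manual.

```python
def split_intervals(subject_id, end_times):
    """Turn a subject's successive interval end times into (id, start, end) rows.

    end_times: e.g. [time1, time2, time3], with None for intervals the
    subject never reached. The first interval is assumed to start at 0;
    each later interval starts where the previous one ended
    (start_time2 = time1, start_time3 = time2).
    """
    rows = []
    start = 0
    for end in end_times:
        if end is None:
            break  # no further intervals for this subject
        rows.append((subject_id, start, end))
        start = end
    return rows


print(split_intervals(7, [3, 8, None]))
```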
148. execute part of the sampling design. Stages must be drawn in order; that is, stage 2 cannot be drawn unless stage 1 is also drawn. When editing or executing a plan, you cannot resample locked stages.
Seed. This allows you to choose a seed value for random number generation.
Include user-missing values. This determines whether user-missing values are valid. If so, user-missing values are treated as a separate category.
Data already sorted. If your sample frame is presorted by the values of the stratification variables, this option allows you to speed the selection process.
Sampling Wizard: Draw Sample Output Files
Figure 2-9 Sampling Wizard, Draw Sample Output Files step
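The Seed option matters because the same seed reproduces the same sample. The following Python sketch illustrates that idea with a stratified simple random sample drawn without replacement; it uses Python's own random number generator (not the one SPSS uses), and the function name, frame layout, and seed value are hypothetical.

```python
import random


def stratified_srs(frame, stratum_of, n_per_stratum, seed):
    """Simple random sample without replacement within each stratum."""
    rng = random.Random(seed)  # fixed seed -> reproducible sample
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = {}
    for stratum in sorted(strata):  # fixed order keeps the draw deterministic
        units = strata[stratum]
        k = min(n_per_stratum, len(units))
        sample[stratum] = rng.sample(units, k)
    return sample


frame = [(s, i) for s in "ABC" for i in range(10)]  # (stratum, unit id)
first = stratified_srs(frame, lambda u: u[0], 3, seed=20)
second = stratified_srs(frame, lambda u: u[0], 3, seed=20)
print(first == second)  # same seed, same sample
```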
149. fects or a value using the CUSTOM subcommand.
- Fix covariates at values other than their means when computing estimated marginal means using the EMMEANS subcommand.
- Specify a metric for polynomial contrasts using the EMMEANS subcommand.
- Specify a tolerance value for checking singularity using the CRITERIA subcommand.
- Create user-specified names for saved variables using the SAVE subcommand.
- Produce a general estimable function table using the PRINT subcommand.
See the Command Syntax Reference for complete syntax information.
Chapter 10
Complex Samples Logistic Regression
The Complex Samples Logistic Regression procedure performs logistic regression analysis on a binary or multinomial dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.
Example. A loan officer has collected past records of customers given loans at several different branches, according to a complex design. While incorporating the sample design, the officer wants to see if the probability with which a customer defaults is related to age, employment history, and amount of credit debt.
Statistics. The procedure produces estimates, exponentiated estimates, standard errors, confidence intervals, tests, design effects, and square roots of design effects for model parameters, as well as the correlations and covariances between parameter estimates. Pseudo R² statistics, classification tables, an
150. Test Statistic. This group allows you to select the type of statistic used for testing hypotheses. You can choose between F, adjusted F, chi-square, and adjusted chi-square.
Sampling Degrees of Freedom. This group gives you control over the sampling design degrees of freedom used to compute p values for all test statistics. If based on the sampling design, the value is the difference between the number of primary sampling units and the number of strata in the first stage of sampling. Alternatively, you can set a custom degrees of freedom by specifying a positive integer.
Adjustment for Multiple Comparisons. When performing hypothesis tests with multiple contrasts, the overall significance level can be adjusted from the significance levels for the included contrasts. This group allows you to choose the adjustment method.
- Least significant difference. This method does not control the overall probability of rejecting the hypotheses that some linear contrasts are different from the null hypothesis values.
- Sequential Sidak. This is a sequentially step-down rejective Sidak procedure that is much less conservative in terms of rejecting individual hypotheses but maintains the same overall significance level.
- Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni procedure that is much less conservative
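The two step-down procedures can be sketched as follows. This Python version computes the usual Holm-type step-down adjusted p values (Sidak form 1 - (1 - p)^k and Bonferroni form k * p, where k counts the hypotheses still in play); it is a textbook sketch and is assumed, not confirmed, to match the exact computation in the Complex Samples procedures.

```python
def step_down_adjust(p_values, method="sidak"):
    """Sequential (step-down) Sidak or Bonferroni adjusted p values.

    Returns adjusted values in the original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank  # hypotheses not yet rejected at this step
        p = p_values[idx]
        if method == "sidak":
            adj = 1.0 - (1.0 - p) ** remaining
        elif method == "bonferroni":
            adj = remaining * p
        else:
            raise ValueError("method must be 'sidak' or 'bonferroni'")
        # Enforce monotonicity: a later step can never look more significant.
        running_max = max(running_max, min(adj, 1.0))
        adjusted[idx] = running_max
    return adjusted


print(step_down_adjust([0.01, 0.04, 0.03]))
print(step_down_adjust([0.01, 0.04, 0.03], method="bonferroni"))
```

Under this sketch, the family of contrasts is rejected at level alpha exactly when the smallest adjusted value is below alpha, which is one way a single overall significance value can be read from a sequential procedure.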
151. fficient estimate. The null hypothesis for each test is that the value of the coefficient is 0.
Covariances of parameter estimates. Displays an estimate of the covariance matrix for the model coefficients.
Correlations of parameter estimates. Displays an estimate of the correlation matrix for the model coefficients.
Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Model fit. Displays R² and root mean squared error statistics.
Population means of dependent variable and covariates. Displays summary information about the dependent variable, covariates, and factors.
Sample design information. Displays summary information about the sample, including the unweighted count and the population size.
Complex Samples Hypothesis Tests
Figure 9-4 Hypothesis Tests dialog box
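The design effect and the correlation matrix described above reduce to short formulas. A minimal Python sketch with hypothetical inputs: the design effect divides the design-based variance by the variance a simple random sample would give, and a correlation matrix is obtained from the covariance matrix by dividing each entry by the product of the two standard errors.

```python
import math


def design_effect(var_complex, var_srs):
    """Return (design effect, square root of design effect)."""
    deff = var_complex / var_srs
    return deff, math.sqrt(deff)


def correlation_from_covariance(cov):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


print(design_effect(0.5, 0.2))  # variance 2.5 times the SRS variance
print(correlation_from_covariance([[4.0, 2.0], [2.0, 9.0]]))
```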
152. file.
> Click Continue.
Figure 12-1 Cox Regression dialog box, Time and Event tab
> Specify the survival time by selecting the entry and exit times from the study.
> Select an event status variable.
> Click Define Event and define at least one event value.
153. for patient 71 at time 2, patient 76 at time 2, and patient 47 at time 3, leaving 217 valid observations.
bankloan.sav. This is a hypothetical data file that concerns a bank's efforts to reduce the rate of loan defaults. The file contains financial and demographic information on 850 past and prospective customers. The first 700 cases are customers who were previously given loans. The last 150 cases are prospective customers that the bank needs to classify as good or bad credit risks.
Copyright SPSS Inc. 1989, 2010.
bankloan_binning.sav. This is a hypothetical data file containing financial and demographic information on 5,000 past customers.
behavior.sav. In a classic example (Price and Bouffard, 1974), 52 students were asked to rate the combinations of 15 situations and 15 behaviors on a 10-point scale ranging from 0 = "extremely appropriate" to 9 = "extremely inappropriate." Averaged over individuals, the values are taken as dissimilarities.
behavior_ini.sav. This data file contains an initial configuration for a two-dimensional solution for behavior.sav.
brakes.sav. This is a hypothetical data file that concerns quality control at a factory that produces disc brakes for high-performance automobiles. The data file contains diameter measurements of 16 discs from each of 8 production machines. The target diameter for the brakes is 322 millimeters.
breakfast.sav. In a classic study (Green and Rao, 1972), 2
154. for specified factors and covariates. A separate set of odds ratios is computed for each category of the dependent variable except the reference category.
Factors. For each selected factor, displays the ratio of the odds at each category of the factor to the odds at the specified reference category.
Covariates. For each selected covariate, displays the ratio of the odds at the covariate's mean value plus the specified units of change to the odds at the mean.
When computing odds ratios for a factor or covariate, the procedure fixes all other factors at their highest levels and all other covariates at their means. If a factor or covariate interacts with other predictors in the model, then the odds ratios depend not only on the change in the specified variable but also on the values of the variables with which it interacts. If a specified covariate interacts with itself in the model (for example, age*age), then the odds ratios depend on both the change in the covariate and the value of the covariate.
Complex Samples Logistic Regression Save
Figure 10-7 Logistic Regression Save dialog box
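The last point, that odds ratios for a covariate that interacts with itself depend on where the change starts, follows directly from the logit. The Python sketch below assumes a hypothetical model whose linear predictor contains b1*x + b2*x*x, with every other term held fixed; the coefficients are made up for illustration.

```python
import math


def covariate_odds_ratio(b_linear, b_quadratic, at_value, change):
    """Odds ratio for moving a covariate from at_value to at_value + change
    in a logit model containing b_linear*x + b_quadratic*x*x
    (all other terms held fixed, so they cancel in the ratio)."""
    x0, x1 = at_value, at_value + change
    log_odds_ratio = b_linear * (x1 - x0) + b_quadratic * (x1 ** 2 - x0 ** 2)
    return math.exp(log_odds_ratio)


# Without the squared term the ratio does not depend on the starting value...
print(covariate_odds_ratio(0.1, 0.0, 30, 1), covariate_odds_ratio(0.1, 0.0, 50, 1))
# ...with it, the same one-unit change gives different odds ratios.
print(covariate_odds_ratio(0.1, -0.001, 30, 1), covariate_odds_ratio(0.1, -0.001, 50, 1))
```

This is why the procedure has to choose a value (the covariate mean) at which to evaluate the odds ratio.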
155. g 10-14,999 miles/year as the reference category. Since Driving frequency is not involved in any interaction terms, the odds ratios are merely the ratios of the exponentiated parameter estimates. For example, the cumulative odds ratio for 20-29,999 miles/year vs. 10-14,999 miles/year is 0.101/0.444 = 0.227.
Generalized Cumulative Model
Figure 21-13 Test of parallel lines
The test of parallel lines can help you assess whether the assumption that the parameters are the same for all response categories is reasonable. This test compares the estimated model with one set of coefficients for all categories to a generalized model with a separate set of coefficients for each category.
The Wald F test is an omnibus test of the contrast matrix for the parallel lines assumption that provides asymptotically correct p values; for small to mid-sized samples, the adjusted Wald F statistic performs well. The significance value is near 0.05, suggesting that the generalized model may give an improvement in the model fit; however, the Sequential Sidak adjusted test reports a significance value high enough (0.392) that, overall, there is no clear evidence for rejecting the parallel lines assumption. The Sequential Sidak test starts with individual contrast Wald t
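Because Driving frequency is not involved in any interaction terms, the arithmetic in the example is just a ratio of exponentiated estimates, which is the same as exponentiating the difference of the raw estimates. A small Python sketch (the function names are hypothetical) that reproduces the figures quoted in the text:

```python
import math


def odds_ratio_from_exp_b(exp_b_level, exp_b_reference):
    """Cumulative odds ratio of one factor level versus another, from each level's Exp(B)."""
    return exp_b_level / exp_b_reference


def odds_ratio_from_b(b_level, b_reference):
    """The same quantity from the raw estimates B: exp(B_level - B_reference)."""
    return math.exp(b_level - b_reference)


print(round(odds_ratio_from_exp_b(0.101, 0.444), 3))  # driving frequency example
print(round(odds_ratio_from_exp_b(1.00, 0.723), 3))   # age category example
```

`odds_ratio_from_exp_b(0.101, 0.444)` rounds to 0.227, matching the driving-frequency example, and `odds_ratio_from_exp_b(1.00, 0.723)` rounds to 1.383, the age-category figure given elsewhere in the chapter.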
156. ge is scaled as ordinal; all of the other variables are scaled as single nominal.
virus.sav. This is a hypothetical data file that concerns the efforts of an Internet service provider (ISP) to determine the effects of a virus on its networks. It has tracked the approximate percentage of infected e-mail traffic on its networks over time, from the moment of discovery until the threat was contained.
wheeze_steubenville.sav. This is a subset from a longitudinal study of the health effects of air pollution on children (Ware, Dockery, Spiro III, Speizer, and Ferris Jr., 1984). The data contain repeated binary measures of the wheezing status for children from Steubenville, Ohio, at ages 7, 8, 9, and 10 years, along with a fixed recording of whether or not the mother was a smoker during the first year of the study.
workprog.sav. This is a hypothetical data file that concerns a government works program that tries to place disadvantaged people into better jobs. A sample of potential program participants was followed, some of whom were randomly selected for enrollment in the program, while others were not. Each case represents a separate program participant.
Appendix B
Notices
Licensed Materials - Property of SPSS Inc., an IBM Company. © Copyright SPSS Inc. 1989, 2010.
Patent No. 7,023,453
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: SPSS INC., AN IBM
157. gh 46-60 compared to the cumulative odds for >60. Thus, the odds ratio of 1.383 in the first row of the table means that the cumulative odds for a person aged 18-30 are 1.383 times the cumulative odds for a person older than 60. Note that because Age category is not involved in any interaction terms, the odds ratios are merely the ratios of the exponentiated parameter estimates. For example, the cumulative odds ratio for 18-30 vs. >60 is 1.00/0.723 = 1.383.
Figure 21-12 Odds ratios for driving frequency
Driving frequency                              Cumulative  95% CI    95% CI    Design    Sqrt of
                                               odds ratio  lower     upper     effect    design effect
Do not own car vs. 10-14,999 miles/year        4.288       2.878     6.390     2.345     1.531
<10,000 miles/year vs. 10-14,999 miles/year    2.030       1.656     2.488     1.838     1.356
15-19,999 miles/year vs. 10-14,999 miles/year  .484        .430      .546      1.450     1.204
20-29,999 miles/year vs. 10-14,999 miles/year  .227        .193      .267      2.095     1.448
>30,000 miles/year vs. 10-14,999 miles/year    .101        .079      .129      1.585     1.259
Dependent Variable: The legislature should enact a gas tax (Ascending)
Model: (Threshold), agecat, gender, votelast, drivefreq
Link function: Logit
a. Factors and covariates used in the computation are fixed at the following values: Age category = >60; Gender = Female; Voted in last election = Yes; Driving frequency = >30,000 miles/year.
This table displays the cumulative odds ratios for the factor levels of Driving frequency, usin
158. gnificance values less than 0.05 have some discernible effect. Thus, age, employ, debtinc, and creddebt contribute to the model, while the other main effects do not. In a further analysis of the data, you would probably remove ed, address, income, and othdebt from model consideration.
Parameter Estimates
Figure 20-9 Parameter estimates
The parameter estimates table summarizes the effect of each predictor. Note that parameter values affect the likelihood of the "did default" category relative to the "did not default" category. Thus, parameters with positive coefficients increase the likelihood of default, while parameters with negative coefficients decrease the likelihood of default.
The meaning of a logistic regression coefficient is not as straightforward as that of a linear regression coefficient. While B is convenient for testing the model effects, Exp(B) is easier to interpret. Exp(B) represents the ratio change in the odds of the event of interest attributable to a one-unit increase in the predic
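The statement that Exp(B) is the ratio change in the odds for a one-unit increase can be checked numerically. This Python sketch assumes a hypothetical single-predictor logistic model with made-up coefficients; it computes the odds at x and at x + 1 and shows that their ratio equals exp(B) regardless of the starting value of x.

```python
import math


def event_probability(intercept, b, x):
    """P(event) in a hypothetical one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + b * x)))


def odds(p):
    return p / (1.0 - p)


def odds_ratio_one_unit(intercept, b, x):
    """Ratio of the odds of the event at x + 1 to the odds at x."""
    return odds(event_probability(intercept, b, x + 1)) / odds(event_probability(intercept, b, x))


# Hypothetical coefficients: the ratio equals exp(b) at every starting value.
for x in (0.0, 3.0, 10.0):
    print(x, round(odds_ratio_one_unit(-1.5, 0.2, x), 6), round(math.exp(0.2), 6))
```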
159. gory and Driving frequency.
> Select 10-14,999 miles/year (a more typical yearly mileage than the maximum) as the reference category for Driving frequency.
> Click Continue.
> Click OK in the Complex Samples Ordinal Regression dialog box.
Pseudo R-Squares
Figure 21-6 Pseudo R-Squares
Cox and Snell    .179
Nagelkerke       .191
McFadden         .071
Dependent Variable: The legislature should enact a gas tax (Ascending)
Model: (Threshold), agecat, gender, votelast, drivefreq
Link function: Logit
In the linear regression model, the coefficient of determination, R², summarizes the proportion of variance in the dependent variable associated with the predictor (independent) variables, with larger R² values indicating that more of the variation is explained by the model, to a maximum of 1. For regression models with a categorical dependent variable, it is not possible to compute a single R² statistic that has all of the characteristics of R² in the linear regression model, so these approximations are computed instead. The following methods are used to estimate the coefficient of determination.
- Cox and Snell's R² (Cox and Snell, 1989) is based on the log likelihood for the model compared to the log likelihood for a baseline model. However, with categorical outcomes, it has a theoretical maximum value of less than 1, even for a perfect model.
- Nagelkerke's R² (Nagelkerke, 1991) is an adjusted version of the Cox & S
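For reference, the three pseudo R² measures in Figure 21-6 have standard textbook definitions in terms of the log likelihood of the fitted model (LL1), the log likelihood of the baseline model (LL0), and the sample size n. The Python sketch below implements those definitions with hypothetical inputs; how the Complex Samples procedures substitute pseudo log likelihoods and weighted sizes into them is an assumption not covered by this section.

```python
import math


def pseudo_r_squares(ll_model, ll_baseline, n):
    """Cox and Snell, Nagelkerke, and McFadden pseudo R-squares."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_baseline - ll_model) / n)
    # Largest value Cox and Snell can reach for this baseline (always below 1).
    max_cox_snell = 1.0 - math.exp(2.0 * ll_baseline / n)
    nagelkerke = cox_snell / max_cox_snell  # rescaled so a perfect model gives 1
    mcfadden = 1.0 - ll_model / ll_baseline
    return cox_snell, nagelkerke, mcfadden


print(pseudo_r_squares(ll_model=-80.0, ll_baseline=-100.0, n=100))
```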
160. > Prepare for Analysis...
Figure 14-7 Analysis Preparation Wizard, Welcome step
> Browse to where you want to save the plan file, and type bankloan.csaplan as the name for the analysis plan file.
> Click Next.
Figure 14-8 Analysis Preparation Wizard, Design Variables step (stage 1)
161. h each customer spent in the previous month, the store wants to see if the frequency with which customers shop is related to the amount they spend in a month, controlling for the gender of the customer and incorporating the sampling design. This information is collected in grocery_1month_sample.sav. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. Use the Complex Samples General Linear Model procedure to perform a two-factor (or two-way) ANOVA on the amounts spent.
Running the Analysis
> To run a Complex Samples General Linear Model analysis, from the menus choose:
Analyze > Complex Samples > General Linear Model...
Figure 19-1 Complex Samples Plan dialog box
> Browse to and select grocery.csplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples
162. h equal probability and without replacement (WOR) directly from the entire population. By contrast, a given complex sample can have some or all of the following features:
Stratification. Stratified sampling involves selecting samples independently within non-overlapping subgroups of the population, or strata. For example, strata may be socioeconomic groups, job categories, age groups, or ethnic groups. With stratification, you can ensure adequate sample sizes for subgroups of interest, improve the precision of overall estimates, and use different sampling methods from stratum to stratum.
Clustering. Cluster sampling involves the selection of groups of sampling units, or clusters. For example, clusters may be schools, hospitals, or geographical areas, and sampling units may be students, patients, or citizens. Clustering is common in multistage designs and area (geographic) samples.
Multiple stages. In multistage sampling, you select a first-stage sample based on clusters. Then you create a second-stage sample by drawing subsamples from the selected clusters. If the second-stage sample is based on subclusters, you can then add a third stage to the sample. For example, in the first stage of a survey, a sample of cities could be drawn. Then, from the selected cities, households could be sampled. Finally, from the selected households, individuals could be polled. The Sampling and Analysis Preparation wizards allow you to specify three stages in a design.
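The city, household, and individual example maps directly onto a three-stage design. As an illustration only, the Python sketch below draws simple random samples without replacement at each stage from a hypothetical nested frame; it ignores stratification and unequal selection probabilities, and the frame layout and function name are assumptions.

```python
import random


def three_stage_sample(frame, n_cities, n_households, n_people, seed=None):
    """frame: {city: {household: [person, ...]}} (hypothetical layout).

    Stage 1 samples cities (clusters), stage 2 samples households within
    each selected city, stage 3 samples people within each selected household.
    """
    rng = random.Random(seed)
    sample = []
    for city in rng.sample(sorted(frame), n_cities):  # stage 1
        households = frame[city]
        k_hh = min(n_households, len(households))
        for hh in rng.sample(sorted(households), k_hh):  # stage 2
            people = households[hh]
            for person in rng.sample(people, min(n_people, len(people))):  # stage 3
                sample.append((city, hh, person))
    return sample


frame = {
    f"city{c}": {f"hh{c}_{h}": [f"p{c}_{h}_{i}" for i in range(4)] for h in range(5)}
    for c in range(6)
}
print(three_stage_sample(frame, n_cities=2, n_households=3, n_people=1, seed=1))
```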
163. h must be positive.
Display iteration history. Displays the iteration history for the parameter estimates and pseudo log-likelihood and prints the last evaluation of the change in parameter estimates and pseudo log-likelihood. The iteration history table prints every n iterations beginning with the 0th iteration (the initial estimates), where n is the value of the increment. If the iteration history is requested, then the last iteration is always displayed regardless of n.
Tie-breaking method for parameter estimation. When there are tied observed failure times, one of these methods is used to break the ties. The Efron method is more computationally expensive.
Survival Functions. These controls specify criteria for computations involving the survival function.
Method for estimating baseline survival functions. The Breslow (or Nelson-Aalen, or empirical) method estimates the baseline cumulative hazard by a nondecreasing step function with steps at the observed failure times, then computes the baseline survival by the relation survival = exp(-cumulative hazard). The Efron method is more computationally expensive and reduces to the Breslow method when there are no ties. The product-limit method estimates the baseline survival by a non-increasing right-continuous function; when there are no predictors in the model, this method reduces to Kaplan-Meier estimation.
Confidence intervals of survival functions. The co
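With no predictors in the model, the Breslow estimate of the baseline cumulative hazard coincides with the Nelson-Aalen estimator, and the product-limit method gives the Kaplan-Meier curve. The Python sketch below computes both from hypothetical (time, event) data so that survival = exp(-cumulative hazard) can be compared with the product-limit survival; it is unweighted and ignores the sampling design, so it is not the procedure's estimator.

```python
import math
from collections import Counter


def nelson_aalen_and_km(times, events):
    """Rows of (time, cumulative hazard, exp(-hazard), Kaplan-Meier survival).

    times: observed times; events: 1 if the event occurred, 0 if censored.
    One row per distinct failure time.
    """
    failures = Counter(t for t, e in zip(times, events) if e)
    cum_hazard = 0.0
    km = 1.0
    rows = []
    for t in sorted(failures):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        d = failures[t]
        cum_hazard += d / at_risk   # Nelson-Aalen increment (nondecreasing step)
        km *= 1.0 - d / at_risk     # product-limit factor (non-increasing step)
        rows.append((t, cum_hazard, math.exp(-cum_hazard), km))
    return rows


for row in nelson_aalen_and_km([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(row)
```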
164. h patterns with the number.
Haz_0, LCL_Haz_0, UCL_Haz_0. Baseline cumulative hazard function and the upper and lower bounds of its confidence interval.
Haz_R, LCL_Haz_R, UCL_Haz_R. Cumulative hazard function evaluated at the reference pattern (see the pattern values table in the output) and the upper and lower bounds of its confidence interval.
Haz_, LCL_Haz_, UCL_Haz_. Cumulative hazard function evaluated at each of the predictor patterns specified on the Plots tab and the upper and lower bounds of their confidence intervals. See the pattern values table in the output to match patterns with the number.
Export model as XML. Saves all information needed to predict the survival function, including parameter estimates and the baseline survival function, in XML (PMML) format. You can use this model file to apply the model information to other data files for scoring purposes.
Options
Figure 12-12 Cox Regression dialog box, Options tab
165. he level of support for the bill based upon voter demographics.
Running the Analysis
> To run a Complex Samples Ordinal Regression analysis, from the menus choose:
Analyze > Complex Samples > Ordinal Regression...
Figure 21-1 Complex Samples Plan dialog box
> Browse to and select poll.csplan as the plan file. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
> Select poll_jointprob.sav as the joint probabilities file.
> Click Continue.
Figure 21-2 Ordinal Regression dialog box
166. her condition is met in this example, since the overall proportion of respondents was 12.8% and the design of the study was not case-control, so it's safer to report 1.673 as the relative risk rather than the value of the odds ratio.
Risk Estimate by Subpopulation
Figure 17-6 Risk estimate for newspaper subscription by response, controlling for income category
Income category   Statistic                                            Estimate
Under 25          Odds ratio for Newspaper subscription                2.712
                  Relative risk, for cohort Response = Yes             2.241
                  Relative risk, for cohort Response = No              .826
Statistics are computed only for 2-by-2 tables with all cells observed.
Relative risk estimates are computed separately for each income category. Note that the relative risk of a positive response for newspaper subscribers appears to gradually decrease with increasing income, which indicates that you may be able to further target the mailings.
Summary
Using Complex Samples Crosstabs risk estimates, you found that you can increase your response rate to direct mailings by targeting newspaper subscribers. Further, you found som
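The odds ratio and the two cohort relative risks in Figure 17-6 are linked: the odds ratio equals the relative risk for the Yes cohort divided by the relative risk for the No cohort (for Under 25, 2.241/.826 is about 2.71, in line with the reported 2.712). The Python sketch below shows the computation from a 2-by-2 table of hypothetical, unweighted counts; in Complex Samples Crosstabs the cells would be weighted population estimates instead.

```python
def risk_estimates(a, b, c, d):
    """2-by-2 table: exposed row (a = Yes, b = No), unexposed row (c = Yes, d = No).

    Returns (odds ratio, relative risk for cohort Yes, relative risk for cohort No).
    """
    rr_yes = (a / (a + b)) / (c / (c + d))  # risk of Yes, exposed vs. unexposed
    rr_no = (b / (a + b)) / (d / (c + d))   # risk of No, exposed vs. unexposed
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, rr_yes, rr_no


# Hypothetical counts: subscribers 30 Yes / 170 No, non-subscribers 20 Yes / 280 No.
or_, rr_yes, rr_no = risk_estimates(30, 170, 20, 280)
print(or_, rr_yes, rr_no, rr_yes / rr_no)
```

When the response is rare, the No-cohort relative risk is close to 1 and the odds ratio approximates the Yes-cohort relative risk; with a more common response the two drift apart, which is why the text prefers to report the relative risk.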
167. his by specifying the time at which they exited rehabilitation as the time of entry into the study.
Date & Time Variables. Date & Time variables cannot be used to directly define the start and end of the interval; if you have Date & Time variables, you should use them to create variables containing survival times. If there is no left truncation, simply create a variable containing end times based upon the difference between the date of entry into the study and the observation date. If there is left truncation, create a variable containing start times, based upon the difference between the date of the start of the study and the date of entry, and a variable containing end times, based upon the difference between the date of the start of the study and the date of observation.
Event Status. You need a variable that records whether the subject experienced the event of interest within the interval. Subjects for whom the event has not occurred are right censored.
Subject Identifier. You can easily incorporate piecewise-constant time-dependent predictors by splitting the observations for a single subject across multiple cases. For example, if you are analyzing survival times for patients post-stroke, variables representing their medical history should be useful as predictors. Over time, they may experience major medical events that alter their
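The Date & Time guidance amounts to subtracting dates. A minimal Python sketch using the standard datetime module, assuming survival times are measured in days (the unit, function names, and dates are hypothetical):

```python
from datetime import date


def survival_time_no_truncation(entry_date, observation_date):
    """End time only: days from entry into the study to observation."""
    return (observation_date - entry_date).days


def survival_interval_left_truncated(study_start, entry_date, observation_date):
    """(start_time, end_time) in days, both measured from the start of the study."""
    start_time = (entry_date - study_start).days  # subject enters the risk set here
    end_time = (observation_date - study_start).days
    return start_time, end_time


print(survival_time_no_truncation(date(2010, 1, 11), date(2010, 3, 1)))
print(survival_interval_left_truncated(date(2010, 1, 1), date(2010, 1, 11), date(2010, 3, 1)))
```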
168. his is a modified version of car_sales.sav that does not include any transformed versions of the fields.
carpet.sav. In a popular example (Green and Wind, 1973), a company interested in marketing a new carpet cleaner wants to examine the influence of five factors on consumer preference: package design, brand name, price, a Good Housekeeping seal, and a money-back guarantee. There are three factor levels for package design, each one differing in the location of the applicator brush; three brand names (K2R, Glory, and Bissell); three price levels; and two levels (either no or yes) for each of the last two factors. Ten consumers rank 22 profiles defined by these factors. The variable Preference contains the rank of the average rankings for each profile. Low rankings correspond to high preference. This variable reflects an overall measure of preference for each profile.
carpet_prefs.sav. This data file is based on the same example as described for carpet.sav, but it contains the actual rankings collected from each of the 10 consumers. The consumers were asked to rank the 22 product profiles from the most to the least preferred. The variables PREF1 through PREF22 contain the identifiers of the associated profiles, as defined in carpet_plan.sav.
catalog.sav. This data file contains hypothetical monthly sales figures for three products sold by a catalog company. Data for five possible predictor variables are also included.
catalog_
169. households. Demographic information and observations about health behaviors and status are obtained for members of each household. This data file contains a subset of information from the 2000 survey. National Center for Health Statistics. National Health Interview Survey, 2000. Public-use data file and documentation. ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2000/. Accessed 2003. ozone.sav. The data include 330 observations on six meteorological variables for predicting ozone concentration from the remaining variables. Previous researchers (Breiman and Friedman, 1985; Hastie and Tibshirani, 1990), among others, found nonlinearities among these variables, which hinder standard regression approaches. pain_medication.sav. This hypothetical data file contains the results of a clinical trial for anti-inflammatory medication for treating chronic arthritic pain. Of particular interest is the time it takes for the drug to take effect and how it compares to an existing medication. patient_los.sav. This hypothetical data file contains the treatment records of patients who were admitted to the hospital for suspected myocardial infarction (MI, or "heart attack"). Each case corresponds to a separate patient and records many variables related to their hospital stay. patlos_sample.sav. This hypothetical data file contains the treatment records of a sample of patients who received thrombolytics during treatment for myocardial infarction MI or
170. Select Proportions from the Units drop-down list. Type 0.2 as the value for the proportion of units to select in this stage. Click Next, and then click Next in the Output Variables step. Figure 13-23 Sampling Wizard, Plan Summary step (stage 3). This panel summarizes the sampling plan so far; the next step is to set options for drawing your sample. The summary lists the district/city stage and the subdivision stage (0.2 per stratum), both using simple random sampling WOR; the plan file is c:\temp\demo.csplan. Look over the sampling design, and then click Next. Figure 13-24 Sampling Wizard, Draw Sample Selection Options step. In this panel, you can choose whether to draw a sample. You can pick which stages to extract and set other sampling options, such as the seed used for random number generation. Do you want to draw a sample?
171. Variable: First event post attack (4). Subject ID Variable: Patient ID. Model: mi, is, hs. The significance value for each effect is near 0, suggesting that they all contribute to the model. Parameter Estimates. Figure 22-51 Parameter estimates (table values not recoverable). Survival Time Variable: Length of stay for rehabilitation; Time to first event post attack. Event Status Variable: First event post attack (4). Subject ID Variable: Patient ID. Model: mi, is, hs. a. Set to zero because this parameter is redundant. b. Tie-breaking method: Breslow. The procedure uses the last category of each factor as the reference category; the effect of other categories is relative to the reference category. Note that while the estimate is useful for statistical testing, the exponentiated estimate, Exp(B), is more easily interpreted as the predicted change in the hazard relative to the reference category. The value of Exp(B) for mi=0 means that the hazard of death for a patient with no prior myocardial infarctions (mi) is 0.002 times that of a patient with three prior mi's. The confidence intervals for mi=0 and mi=1 do not overlap with the interval for mi=2, and none of them include 0. Therefore, it appears that the hazar
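The relationship between B and Exp(B) described above can be sketched as follows. This is a minimal sketch, assuming a Wald interval with a normal critical value; the procedure itself may use a different critical value (for example, one based on design degrees of freedom), and the coefficient and standard error below are made up so that Exp(B) equals the 0.002 quoted in the text. The helper `exp_b_with_ci` is hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal critical value (an assumption; see above)


def exp_b_with_ci(b: float, se: float, z: float = Z_95) -> tuple[float, float, float]:
    """Return Exp(B) and the exponentiated Wald interval (b - z*se, b + z*se)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Hypothetical coefficient chosen so that Exp(B) = 0.002, with a made-up standard error.
hr, lower, upper = exp_b_with_ci(math.log(0.002), se=0.5)
print(round(hr, 6), lower < hr < upper)  # 0.002 True
```

Because the exponential function is increasing and exp(0) = 1, an interval for B that excludes 0 corresponds exactly to an interval for Exp(B) that excludes 1.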
172. [Dialog residue: Cox Regression dialog box showing the Baseline Strata and Subpopulation Variable fields and the list of source variables.] Baseline Strata. A separate baseline hazard and survival function is computed for each value of this variable, while a single set of model coefficients is estimated across strata. Subpopulation Variable. Specify a variable to define a subpopulation. The analysis is performed only for the selected category of the subpopulation variable. Model. Figure 12-6 Cox Regression dialog box.
173. [Draw Sample Selection Options panel: Do you want to draw a sample? (Yes/No); Stages; What type of seed value do you want to use? (A randomly chosen number / Custom value 241972; enter a custom seed value if you want to reproduce the sample later); Include in the sample frame cases with user-missing values of stratification or clustering variables; Working data are sorted by stratification variables (presorted data may speed processing).] Select 1, 2 as the stages to sample now. Select Custom value for the type of random seed to use, and type 241972 as the value. Using a custom value allows you to replicate the results of this example exactly. Click Next, and then click Next in the Draw Sample Output Files step. Figure 13-25 Sampling Wizard, Finish step. Completing the Sampling Wizard: You have provided all of the information needed to create a sample design and draw a sample. You can return to the Sampling Wizard later if you need to add or modify stages. After all the stages have been sampled, you can use the plan file in any Complex Samples analysis procedure to indicate how the sample was drawn.
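Why a fixed seed makes a sample reproducible can be shown with a short Python sketch. It uses Python's own Mersenne Twister generator, not the one in SPSS, so seed 241972 here will not reproduce the SPSS sample; the helper `draw_sample` and the 100-unit frame are hypothetical.

```python
import random


def draw_sample(units: list, k: int, seed: int) -> list:
    """Simple random sample of k units without replacement, reproducible via the seed."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    return rng.sample(units, k)


frame = list(range(1, 101))  # hypothetical frame of 100 unit IDs
first = draw_sample(frame, 10, seed=241972)
second = draw_sample(frame, 10, seed=241972)
print(first == second)  # True: same seed, same sample
```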
174. Variables to Cases: Create One Index Variable. You have chosen to create one index variable. The variable's values can be sequential numbers or the names of variables in a group. In the table, you can specify the name and label for the index variable. [Options: index values as Sequential numbers (1, 2, 3) or as Variable names (event1, event2, event3).] Type event_index as the name of the index variable, and type Event index as the variable label. Click Next. Figure 22-32 Restructure Data Wizard, Variables to Cases: Options step. In this step, you can set options that will be applied to the restructured data file. [Options: Handling of Variables not Selected (Drop variable(s) from the new data file / Keep and treat as fixed variable(s)); System Missing or Blank Values in all Transposed Variables (Create a case in the new file / Discard the data); Case Count Variable (Count the number of new cases created by the case in the current data).] Make sure Keep and treat as fixed variable(s) is selected. Click Finish. Figure 22-33 Restructured data (columns include event_index, event, start_time, mi, is, and hs).
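The variables-to-cases restructuring configured above can be sketched in plain Python. This is a minimal sketch of the idea, not the wizard's implementation: the function `variables_to_cases`, the target name `event`, and the sample row are hypothetical, and `None` stands in for a system-missing value.

```python
def variables_to_cases(rows, group, target, index_name="event_index", drop_missing=False):
    """Restructure wide rows into one row per variable in `group`.

    Variables not in `group` are kept and treated as fixed (copied to every new row).
    Index values are sequential numbers 1, 2, 3, ...; None stands in for a
    system-missing value and is dropped only when drop_missing is True.
    """
    long_rows = []
    for row in rows:
        fixed = {name: value for name, value in row.items() if name not in group}
        for index, name in enumerate(group, start=1):
            value = row[name]
            if value is None and drop_missing:
                continue  # corresponds to "Discard the data"
            new_row = dict(fixed)
            new_row[index_name] = index
            new_row[target] = value
            long_rows.append(new_row)
    return long_rows


wide = [{"id": 1, "mi": 0, "event1": 0, "event2": 1, "event3": None}]  # hypothetical data
for r in variables_to_cases(wide, ["event1", "event2", "event3"], target="event"):
    print(r)
```

Keeping the non-transposed variables as fixed is what carries predictors such as mi onto every new case.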
175. iation, look at the test results. Figure 19-10 Individual test results for estimated marginal means of gender [table: simple contrasts for Who shopping for, Level Self vs. Level Self and family and Level Self and spouse vs. Level Self and family, with contrast estimates of magnitude 150.907 and 89.103, a hypothesized value of 0.00, and significance values of .000; a. Reference category: Self and family]. The individual tests table displays two simple contrasts in spending. The contrast estimate is the difference in spending for the listed levels of Who shopping for. The hypothesized value of 0.00 represents the belief that there is no difference in spending. The Wald F statistic, with the displayed degrees of freedom, is used to test whether the difference between a contrast estimate and the hypothesized value is due to chance variation. Since the significance values are less than 0.05, you can conclude that there are differences in spending. The values of the contrast estimates are different from the parameter estimates. This is because there is an interaction term containing the Who shopping for effect. As a result, the parameter estimate for shopfor=1 is a simple contrast between the levels Self and Self and family at the level From both of the variable Use coupons. The contrast estimate in this table is averaged over the levels of Use coupons.
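Why a contrast averaged over Use coupons differs from a parameter estimate tied to one Use coupons level can be shown with made-up numbers. This is a minimal sketch, assuming the marginal mean weights every Use coupons level equally. The `wald_1df` helper shows only the textbook single-degree-of-freedom form; the procedure's Wald F is computed from the design-based variance and may differ in detail. All names and values are hypothetical.

```python
def marginal_means(cell_means):
    """Average the predicted cell means over the levels of the second factor (equal weights)."""
    return {level: sum(row.values()) / len(row) for level, row in cell_means.items()}


def simple_contrasts(means, reference):
    """Difference between each level's mean and the reference level's mean."""
    return {level: m - means[reference] for level, m in means.items() if level != reference}


def wald_1df(estimate, std_error, hypothesized=0.0):
    """Basic single-degree-of-freedom Wald statistic ((estimate - hypothesized) / SE) ** 2."""
    return ((estimate - hypothesized) / std_error) ** 2


# Hypothetical predicted spending by Who shopping for (rows) and Use coupons (columns).
cells = {
    "Self": {"None": 300.0, "Newspaper": 320.0, "Mailings": 340.0, "Both": 360.0},
    "Self and family": {"None": 450.0, "Newspaper": 470.0, "Mailings": 500.0, "Both": 520.0},
}
averaged = simple_contrasts(marginal_means(cells), "Self and family")["Self"]
at_both = cells["Self"]["Both"] - cells["Self and family"]["Both"]
print(averaged, at_both)  # -155.0 -160.0
```

The two numbers differ only because the interaction lets the Self vs. Self and family gap vary across coupon levels; without an interaction term they would coincide.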
176. office listed on the Web site at http://www.spss.com/worldwide. Additional Publications. The SPSS Statistics Guide to Data Analysis, SPSS Statistics Statistical Procedures Companion, and SPSS Statistics Advanced Statistical Procedures Companion, written by Marija Norusis and published by Prentice Hall, are available as suggested supplemental material. These publications cover statistical procedures in the SPSS Statistics Base module, Advanced Statistics module, and Regression module. Whether you are just getting started in data analysis or are ready for advanced applications, these books will help you make best use of the capabilities found within the IBM® SPSS® Statistics offering. For additional information, including publication contents and sample chapters, please see the author's website: http://www.norusis.com. Part I: User's Guide. Contents: 1 Introduction to Complex Samples Procedures: Properties of Complex Samples; Usage of Complex Samples Procedures; Plan Files; Further Readings. 2 Sampling from a Complex Design: Creating a New Sample Plan; Sampling Wizard: Design Variables; Tree Controls for Navigating the Sampling Wizard; Sampling Wizard
177. coefficients for covariates and relative values of the coefficients for factor levels can give important insights into the effects of the predictors in the model. For covariates, positive (negative) coefficients indicate positive (inverse) relationships between predictors and outcome. An increasing value of a covariate with a positive coefficient corresponds to an increasing probability of being in one of the higher cumulative outcome categories. For factors, a factor level with a greater coefficient indicates a greater probability of being in one of the higher cumulative outcome categories. The sign of a coefficient for a factor level is dependent upon that factor level's effect relative to the reference category. Figure 21-8 Parameter estimates [table values not recoverable: thresholds for opinion_gastax=1, 2, and 3 and coefficients for the levels of agecat, gender, votelast, and drivefreq, with standard errors and 95% confidence intervals]. Dependent Variable: The legislature should enact a gas tax (Ascending). Model: (Threshold), agecat, gender, votelast, drivefreq. Link function: Logit. a. Set to zero because this parameter is redundant.
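The interpretation rules above can be checked with a small cumulative logit calculation. This is a minimal sketch, assuming the form logit P(Y <= j) = theta_j - eta, where eta is the sum of coefficient times predictor. Under that form, a positive coefficient moves probability toward the higher categories, as the text describes. The thresholds are made up and are not the values in Figure 21-8.

```python
import math


def category_probabilities(thresholds, eta):
    """Cumulative logit model: P(Y <= j) = 1 / (1 + exp(-(theta_j - eta))).

    thresholds: increasing theta_1 < ... < theta_{J-1}; eta: sum of coefficient * predictor.
    Returns the J category probabilities, lowest category first.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]


theta = [-1.0, 0.5, 2.0]  # hypothetical thresholds for a 4-category response
baseline = category_probabilities(theta, eta=0.0)
shifted = category_probabilities(theta, eta=1.0)  # e.g. a covariate with a positive coefficient
print(round(sum(baseline), 12), shifted[-1] > baseline[-1])  # 1.0 True
```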
178. ics; Complex Samples Missing Values; Complex Samples Options. Complex Samples Descriptives: Complex Samples Descriptives Statistics; Complex Samples Descriptives Missing Values; Complex Samples Options. Complex Samples Crosstabs: Complex Samples Crosstabs Statistics; Complex Samples Missing Values; Complex Samples Options. Complex Samples Ratios: Complex Samples Ratios Statistics; Complex Samples Ratios Missing Values; Complex Samples Options. 9 Complex Samples General Linear Model: Complex Samples General Linear Model Statistics; Complex Samples Hypothesis Tests; Complex Samples General Linear Model Estimated Means; Complex Samples General Linear Model Save; Complex Samples General Linear Model Options; CSGLM Command Additional Features. 10 Complex Samples Logistic Regression: Complex Samples Logistic Regression Reference Category; Complex Samples Logistic Regression Model; Complex Samples Logistic Regression Statistics; Complex Samples Hypothesis Tests; Complex Samples Logistic Regression Odds Ratios; Complex Samples Logistic Regression Save; Complex Samples Logistic Regression Options
179. significance level can be adjusted from the significance levels for the included contrasts. This group allows you to choose the adjustment method. Least significant difference. This method does not control the overall probability of rejecting the hypotheses that some linear contrasts are different from the null hypothesis values. Sequential Sidak. This is a sequentially step-down rejective Sidak procedure that is much less conservative in terms of rejecting individual hypotheses but maintains the same overall significance level. Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni procedure that is much less conservative in terms of rejecting individual hypotheses but maintains the same overall significance level. Sidak. This method provides tighter bounds than the Bonferroni approach. Bonferroni. This method adjusts the observed significance level for the fact that multiple contrasts are being tested. Complex Samples Ordinal Regression Odds Ratios. Figure 11-6 Ordinal Regression Odds Ratios dialog box [Factors: Cumulative Odds Ratios for Comparing Factor Levels (Factor: Age category [agecat]; Reference Category: Highest value). Covariates: Cumulative Odds Ratios for Change in Covariate Values]. One set of cumulative odds ratios is produced for each variable in the Odds Ratios grids. For each set, all other factors in the model ar
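The adjustment methods listed above can be sketched as functions of the unadjusted significance values. This is a minimal sketch, assuming that Sequential Bonferroni and Sequential Sidak correspond to the standard Holm and Holm-Sidak step-down procedures; the procedure's exact computations are not reproduced here, and the function name and p-values are hypothetical.

```python
def adjust_p_values(p_values, method):
    """Adjusted significance values for m contrasts.

    method: "lsd" (no adjustment), "bonferroni", "sidak",
    "sequential_bonferroni" (Holm step-down) or "sequential_sidak" (Holm-Sidak step-down).
    """
    m = len(p_values)
    if method == "lsd":
        return list(p_values)
    if method == "bonferroni":
        return [min(1.0, p * m) for p in p_values]
    if method == "sidak":
        return [1.0 - (1.0 - p) ** m for p in p_values]
    if method not in ("sequential_bonferroni", "sequential_sidak"):
        raise ValueError(f"unknown method: {method}")
    adjusted = [0.0] * m
    running_max = 0.0
    # Step down from the smallest p-value; the k-th smallest is compared against m - k + 1 tests.
    for rank, i in enumerate(sorted(range(m), key=lambda j: p_values[j])):
        remaining = m - rank
        p = p_values[i]
        if method == "sequential_bonferroni":
            candidate = min(1.0, p * remaining)
        else:
            candidate = 1.0 - (1.0 - p) ** remaining
        running_max = max(running_max, candidate)  # adjusted values never decrease
        adjusted[i] = running_max
    return adjusted


p = [0.01, 0.04, 0.03]  # hypothetical unadjusted significance values
print([round(x, 4) for x in adjust_p_values(p, "bonferroni")])             # [0.03, 0.12, 0.09]
print([round(x, 4) for x in adjust_p_values(p, "sequential_bonferroni")])  # [0.03, 0.06, 0.06]
```

The step-down versions multiply by a shrinking number of remaining tests, which is why they are less conservative for individual hypotheses while keeping the overall level.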
180. time units between two dates (e.g., calculate an age in years from a birthdate and another date). Subtract two durations (e.g., time worked - time commuting). Select Calculate the number of time units between two dates. Click Next. Figure 22-5 Date and Time Wizard, Calculate the number of time units between two dates step. [Panel: Calculate the number of time units between two date or datetime variables; Date1: Date of second arrest [date2], minus Date2: Date of release from first arrest [date1]; Unit; Result Treatment: Truncate to integer / Round to integer / Retain fractional part. For month and year units, the result is based on average unit length unless truncation is used. $TIME is the current date and time.] Select Date of second arrest [date2] as the first date. Select Date of release from first arrest [date1] as the date to subtract from the first date. Select Days as the unit. Click Next. Figure 22-6 Date and Time Wizard, Calculation step [Calculation: date2 - date1; Variable Label: Time to second arrest; Create the variable now].
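The calculation configured above (date2 minus date1 in days) and the three Result Treatment options can be sketched in Python. This is a minimal sketch: `days_between` is a hypothetical helper, the dates are made up, and rounding half away from zero for "Round to integer" is an assumption.

```python
import math
from datetime import datetime


def days_between(later: datetime, earlier: datetime, treatment: str = "truncate") -> float:
    """Elapsed days from `earlier` to `later`, with a result treatment like the wizard's options."""
    days = (later - earlier).total_seconds() / 86400.0
    if treatment == "truncate":
        return math.trunc(days)  # drop any fractional part
    if treatment == "round":
        # Round half away from zero (an assumption about the wizard's rule).
        return math.floor(days + 0.5) if days >= 0 else -math.floor(-days + 0.5)
    if treatment == "retain":
        return days
    raise ValueError(f"unknown treatment: {treatment}")


date1 = datetime(2009, 1, 1, 0, 0)   # hypothetical release from first arrest
date2 = datetime(2009, 1, 11, 18, 0)  # hypothetical second arrest, 10.75 days later
print([days_between(date2, date1, t) for t in ("truncate", "round", "retain")])  # [10, 11, 10.75]
```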
181. in terms of rejecting individual hypotheses but maintains the same overall significance level. Sidak. This method provides tighter bounds than the Bonferroni approach. Bonferroni. This method adjusts the observed significance level for the fact that multiple contrasts are being tested. Complex Samples General Linear Model Estimated Means. Figure 9-5 General Linear Model Estimated Means dialog box [Factors and Interactions: shopfor, usecoup, shopfor*usecoup; Display Means For: shopfor with a Simple contrast and reference category 3, shopfor*usecoup with no contrast; Display mean for overall population]. The Estimated Means dialog box allows you to display the model-estimated marginal means for levels of factors and factor interactions specified in the Model subdialog box. You can also request that the overall population mean be displayed. Term. Estimated means are computed for the selected factors and factor interactions. Contrast. The contrast determines how hypothesis tests are set up to compare the estimated means. Simple. Compares the mean of each level to the mean of a specified level. This type of contrast is useful when there is a control group. Deviation. Compares the mean of each level (except a reference category) t
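The Simple contrast and the (truncated) Deviation contrast can be sketched on a vector of estimated marginal means. This is a minimal sketch. Because the Deviation description above is cut off, the sketch assumes the usual definition: each non-reference level is compared with the unweighted mean of all level means. The means are made up.

```python
def simple_contrast(means, reference):
    """Each level's mean minus the mean of the specified reference level."""
    return [m - means[reference] for i, m in enumerate(means) if i != reference]


def deviation_contrast(means, reference):
    """Each level's mean (except the reference level) minus the mean of all level means."""
    grand_mean = sum(means) / len(means)
    return [m - grand_mean for i, m in enumerate(means) if i != reference]


# Hypothetical estimated marginal means for three shopfor levels; the third (index 2) is the reference.
emm = [330.0, 410.0, 485.0]
print(simple_contrast(emm, 2))                               # [-155.0, -75.0]
print([round(d, 3) for d in deviation_contrast(emm, 2)])     # [-78.333, 1.667]
```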
182. Complex Samples Frequencies. The Complex Samples Frequencies procedure produces frequency tables for selected variables and displays univariate statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables. Example. Using the Complex Samples Frequencies procedure, you can obtain univariate tabular statistics for vitamin usage among U.S. citizens, based on the results of the National Health Interview Survey (NHIS) and with an appropriate analysis plan for this public-use data. Statistics. The procedure produces estimates of cell population sizes and table percentages, plus standard errors, confidence intervals, coefficients of variation, design effects, square roots of design effects, cumulative values, and unweighted counts for each estimate. Additionally, chi-square and likelihood-ratio statistics are computed for the test of equal cell proportions. Data. Variables for which frequency tables are produced should be categorical. Subpopulation variables can be string or numeric but should be categorical. Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box. Obtaining Complex Samples Frequencies. From the menus choose: Analyze > Complex Samples > Frequencies... Select a plan file. Optionally, select a custom joint probabilities file.
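The point estimates in a complex-samples frequency table are weighted counts. The sketch below illustrates estimated population sizes, table percentages, and cumulative percentages. It is a minimal sketch: it assumes each case carries a final sampling weight. The standard errors, confidence intervals, and design effects need the design information in the plan file and are not computed. The function name and data are hypothetical.

```python
def weighted_frequencies(values, weights):
    """Estimated population size, percent, and cumulative percent for each category."""
    totals = {}
    for value, weight in zip(values, weights):
        totals[value] = totals.get(value, 0.0) + weight
    population = sum(totals.values())
    table, cumulative = [], 0.0
    for value in sorted(totals):
        percent = 100.0 * totals[value] / population
        cumulative += percent
        table.append((value, totals[value], percent, cumulative))
    return table


# Hypothetical responses with their sampling weights.
answers = ["no", "yes", "yes", "no", "yes"]
weights = [10.0, 20.0, 20.0, 30.0, 20.0]
for row in weighted_frequencies(answers, weights):
    print(row)
```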
183. ion Probability: obtained from variable InclusionProbability_1_ (stage 1) and from variable InclusionProbability_2_ (stage 2). Plan File: property_assess.csplan. Weight Variable: SampleWeight_Final_. The summary table reviews your sampling plan and is useful for making sure that the plan represents your intentions. Sampling Summary. Figure 13-11 Stage summary [number of units sampled, requested: Eastern 4, Central 4, Western 4, Northern 4, Southern 4]. This summary table reviews the first stage of sampling and is useful for checking that the sampling went according to plan. Four townships were sampled from each county, as requested. Figure 13-12 Stage summary [partial table: proportion of units sampled by county and township, requested 20.0%, actual values of about 20.0% to 20.6%]. This summary table, the top part of which is shown here, reviews the second stage of sampling. It is also useful for checking that the sampling went according to plan. Approximately 20% of the properties were sampled from each neighborhood from each township sampled in the first stage, as requested. Sample Results. Figure 13-13 Data Editor with sample results (columns include propid, nbrhood, town, county, time, and lastval).
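The final weight can be built from the per-stage inclusion probabilities. This is a minimal sketch, assuming the usual multistage rule: each stage weight is the inverse of that stage's inclusion probability, and the final weight is their product. We take this to be how SampleWeight_Final_ relates to InclusionProbability_1_ and InclusionProbability_2_, but the sketch does not reproduce the wizard's computation. The count of 20 townships is made up.

```python
def final_sampling_weight(stage_inclusion_probabilities):
    """Final weight = product of per-stage weights, each the inverse of that stage's inclusion probability."""
    weight = 1.0
    for probability in stage_inclusion_probabilities:
        if not 0.0 < probability <= 1.0:
            raise ValueError("inclusion probabilities must be in (0, 1]")
        weight *= 1.0 / probability
    return weight


# Hypothetical property: its township had a 4-in-20 chance of selection (stage 1),
# and 20% of the properties in its neighborhood were sampled (stage 2).
print(final_sampling_weight([4 / 20, 0.20]))
```

In this example the property represents roughly 25 properties in the population.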
184. ion after the current stage will be ignored when the data are analyzed. [Panel options: Use finite population correction (FPC) when estimating variance under simple random sampling assumption; Equal WOR (equal probability sampling without replacement): the next panel will ask you to specify inclusion probabilities or population sizes; Unequal WOR (unequal probability sampling without replacement): joint probabilities will be required to analyze sample data; this option is available in stage 1 only.] Deselect Use finite population correction. Click Finish. You are now ready to run the analysis. Running the Analysis. To run a Complex Samples Cox Regression analysis, from the menus choose: Analyze > Complex Samples > Cox Regression... Figure 22-41 Plan for Cox Regression dialog box. Plan File: srs.csaplan. If you do not have a plan file for your complex sample, you can use the Analysis Preparation Wizard to create one; choose Prepare for Analysis from the Complex Samples menu to access the wizard. Joint Probabilities: joint probabilities are required if the plan requests unequal probability WOR estimation; otherwise, they are ignored. [Options: Use default file (srs.sav); An open dataset (stroke_survival.sav [DataSet4]); Custom file.] Browse to where you sav
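The effect of the finite population correction mentioned above can be seen in the variance of a sample mean under simple random sampling. This is a minimal textbook sketch, not the procedure's variance estimator. The helper and data are hypothetical.

```python
import statistics


def srs_variance_of_mean(sample, population_size=None):
    """Estimated variance of the sample mean under simple random sampling.

    With population_size given, the finite population correction (1 - n/N) is applied
    (sampling without replacement); without it, no correction is made.
    """
    n = len(sample)
    variance = statistics.variance(sample) / n
    if population_size is not None:
        variance *= 1.0 - n / population_size
    return variance


data = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample of n = 4 from a population of N = 8
print(srs_variance_of_mean(data), srs_variance_of_mean(data, population_size=8))
```

When n is a small fraction of N, the factor 1 - n/N is close to 1, so deselecting the FPC changes the variance only slightly and errs on the conservative (larger) side.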
185. Select District as a stratification variable. Select City as a cluster variable. Click Next, and then click Next in the Sampling Method step. This design structure means that independent samples are drawn for each district. In this stage, cities are drawn as the primary sampling unit using the default method, simple random sampling. Figure 13-19 Sampling Wizard, Sample Size step (stage 2). In this panel, you can specify the number or proportion of units to be sampled in the current stage. The sample size can be fixed across strata, or it can vary for different strata. If you specify sample sizes as proportions, you can also set the minimum or maximum number of units to draw. [Options: Value (the size value applies to each stratum); Unequal values for strata; Read values from variable; Minimum; Maximum.] Select Proportions from the Units drop-down list. Type 0.1 as the value of the proportion of units to sample from each stratum. Click Next, and then
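The stratified cluster selection described above can be sketched as drawing the same proportion of clusters (cities) independently within each stratum (district), using simple random sampling without replacement. This is a minimal sketch under two assumptions: the count is rounded to the nearest whole cluster, and at least one cluster is drawn per stratum. The frame and names are hypothetical.

```python
import math
import random


def sample_clusters_by_stratum(frame, proportion, seed):
    """Draw a proportion of clusters independently within each stratum (SRS without replacement).

    frame maps stratum -> list of cluster IDs. Rounding to the nearest whole cluster and
    drawing at least one cluster per stratum are assumptions of this sketch.
    """
    rng = random.Random(seed)
    chosen = {}
    for stratum, clusters in frame.items():
        k = max(1, math.floor(proportion * len(clusters) + 0.5))
        chosen[stratum] = rng.sample(clusters, k)
    return chosen


# Hypothetical frame: two districts with 20 and 10 cities.
frame = {
    "district A": [f"A-city{i}" for i in range(20)],
    "district B": [f"B-city{i}" for i in range(10)],
}
sample = sample_clusters_by_stratum(frame, 0.1, seed=241972)
print({d: len(c) for d, c in sample.items()})  # {'district A': 2, 'district B': 1}
```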
186. The Analysis Preparation Wizard is used to specify analysis specifications for an existing complex sample. The analysis plan file created by the Sampling Wizard can be specified in the Plan dialog box when you are analyzing the sample corresponding to that plan. The Complex Samples General Linear Model procedure allows you to model a scale response. The Complex Samples Ordinal Regression procedure allows you to model an ordinal response. Complex Samples Ordinal Regression. The Complex Samples Ordinal Regression procedure creates a predictive model for an ordinal dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation. Using Complex Samples Ordinal Regression to Analyze Survey Results. Representatives considering a bill before the legislature are interested in whether there is public support for the bill and how support for the bill is related to voter demographics. Pollsters design and conduct interviews according to a complex sampling design. The survey results are collected in poll_cs_sample.sav. The sampling plan used by the pollsters is contained in poll.csplan; because it makes use of a probability proportional to size (PPS) method, there is also a file containing the joint selection probabilities (poll_jointprob.sav). For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19. Use Complex Samples Ordinal Regression to fit a model for t
187. unit change in the covariate Years with current employer. The reported value is the ratio of the odds of default for a person with 7.99 years at their current job compared to the odds of default for a person with 6.99 years (the mean). Figure 20-12 Odds ratios for debt to income ratio [Debt to income ratio (x100), units of change 1.000; Previously defaulted = Yes: odds ratio 1.100, 95% confidence interval 1.058 to 1.143]. Dependent Variable: Previously defaulted (reference category = No). Model: (Intercept), ed, age, employ, address, income, debtinc, creddebt, othdebt. a. Factors and covariates used in the computation are fixed at the following values: Level of education = Post-undergraduate degree; Age in years = 34.19; Years with current employer = 6.99; Years at current address = 6.32; Household income in thousands = 60.1581; Debt to income ratio (x100) = 9.9341; Credit card debt in thousands = 1.9764; Other debt in thousands = 3.9164. This table displays the odds ratio of Previously defaulted for a unit change in the covariate Debt to income ratio. The reported value is the ratio of the odds of default for a person with a debt/income ratio of 10.9341 compared to the odds of default for a person with 9.9341 (the mean). Note that because none of these predictors are part of interaction terms, the values of the odds ratios reported in these tables are equal to the values of the exponentiated parameter estimates. When a predictor is part of a
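The note above says that, without interactions, the odds ratio for a unit change equals the exponentiated parameter estimate. We can check this numerically, and also confirm that the ratio does not depend on where the other covariates are fixed. This is a minimal sketch of a hypothetical main-effects logistic model. Only the debtinc coefficient, log(1.100), is taken from the reported odds ratio; the intercept and the employ coefficient are made up.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


# Hypothetical main-effects model; only the debtinc coefficient is tied to the reported odds ratio.
INTERCEPT, B_DEBTINC, B_EMPLOY = -1.5, math.log(1.100), -0.25


def p_default(debtinc, employ):
    return logistic(INTERCEPT + B_DEBTINC * debtinc + B_EMPLOY * employ)


for employ in (6.99, 20.0):  # the odds ratio does not depend on the other covariate
    ratio = odds(p_default(10.9341, employ)) / odds(p_default(9.9341, employ))
    print(round(ratio, 6))  # 1.1
```

With an interaction involving debtinc, the log-odds difference for a unit change would also depend on the interacting variable, so the ratio would no longer be a single exponentiated coefficient.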
188. ity. hivassay.sav. This is a hypothetical data file that concerns the efforts of a pharmaceutical lab to develop a rapid assay for detecting HIV infection. The results of the assay are eight deepening shades of red, with deeper shades indicating greater likelihood of infection. A laboratory trial was conducted on 2,000 blood samples, half of which were infected with HIV and half of which were clean. hourlywagedata.sav. This is a hypothetical data file that concerns the hourly wages of nurses from office and hospital positions and with varying levels of experience. insurance_claims.sav. This is a hypothetical data file that concerns an insurance company that wants to build a model for flagging suspicious, potentially fraudulent claims. Each case represents a separate claim. insure.sav. This is a hypothetical data file that concerns an insurance company that is studying the risk factors that indicate whether a client will have to make a claim on a 10-year term life insurance contract. Each case in the data file represents a pair of contracts, one of which recorded a claim and the other didn't, matched on age and gender. judges.sav. This is a hypothetical data file that concerns the scores given by trained judges (plus one enthusiast) to 300 gymnastics performances. Each row represents a separate performance; the judges viewed the same performances. kinship_dat.sav. Rosenberg and Kim (Rosenberg and Kim, 1975) set out to analyze 15 kinship terms
189. l, 48; in Complex Samples Logistic Regression, 57; in Complex Samples Ordinal Regression, 68. table percentages: in Complex Samples Crosstabs, 39; in Complex Samples Frequencies, 30, 158. test of parallel lines: in Complex Samples Ordinal Regression, 68, 204. test of proportional hazards: in Complex Samples Cox Regression, 82. tests of model effects: in Complex Samples Cox Regression, 255; in Complex Samples General Linear Model, 181; in Complex Samples Logistic Regression, 191; in Complex Samples Ordinal Regression, 200. time-dependent predictor: in Complex Samples Cox Regression, 79, 210. trademarks, 268. unweighted count: in Complex Samples Crosstabs, 39; in Complex Samples Descriptives, 34; in Complex Samples Frequencies, 30; in Complex Samples Ratios, 43. warnings: in Complex Samples Ordinal Regression, 207.
190. natural logarithm raised to the power of the estimates of the coefficients. While the estimate has nice properties for statistical testing, the exponentiated estimate, or exp(B), is easier to interpret. Standard error. Displays the standard error for each coefficient estimate. Confidence interval. Displays a confidence interval for each coefficient estimate. The confidence level for the interval is set in the Options dialog box. t test. Displays a test of each coefficient estimate. The null hypothesis for each test is that the value of the coefficient is 0. Covariances of parameter estimates. Displays an estimate of the covariance matrix for the model coefficients. Correlations of parameter estimates. Displays an estimate of the correlation matrix for the model coefficients. Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects. Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects. Summary statistics for model variables. Displays summary information about the dependent variable, covariates, and factors. Sample design information. Displays summary information about the sample, including the unweighted count and the population size.
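The design effect defined above is a variance ratio, which a small clustered example makes concrete. This is a minimal sketch, assuming equal-sized clusters drawn with equal probability. The complex-design variance uses the between-cluster (ultimate cluster) estimator without a finite population correction, which is a simplification of what the procedure computes. The data are made up and deliberately strongly clustered.

```python
import statistics


def cluster_variance_of_mean(clusters):
    """Variance of the overall mean for k equal-sized clusters drawn with equal probability,
    using the between-cluster (ultimate cluster) estimator without FPC."""
    k = len(clusters)
    cluster_means = [sum(c) / len(c) for c in clusters]
    overall = sum(cluster_means) / k
    return sum((m - overall) ** 2 for m in cluster_means) / (k * (k - 1))


def design_effect(clusters):
    """Ratio of the clustered-design variance to the SRS variance of the same mean."""
    values = [x for c in clusters for x in c]
    srs_variance = statistics.variance(values) / len(values)
    return cluster_variance_of_mean(clusters) / srs_variance


clusters = [[2, 3, 4], [6, 7, 8], [10, 11, 12]]  # hypothetical, strongly clustered values
deff = design_effect(clusters)
print(round(deff, 4), round(deff ** 0.5, 4))  # design effect and its square root
```

Values within a cluster are similar here, so the clustered design carries much less information than an SRS of the same size, and the design effect is well above 1.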
191. Plan dialog box. Obtaining Complex Samples Crosstabs. From the menus choose: Analyze > Complex Samples > Crosstabs... Select a plan file. Optionally, select a custom joint probabilities file. Click Continue. Figure 7-1 Crosstabs dialog box [variable list not reproduced; dialog note: each combination of categories defines a subpopulation]. Select at least one row variable and one column variable. Optionally, you can specify variables to define subpopulations. Statistics are computed separately for each subpopulation. Complex Samples Crosstabs Statistics. Figure 7-2 Crosstabs Statistics dialog box [Cells: Population size, Row percent, Column percent, Table percent. Statistics: Standard error, Confidence interval, Unweighted count, Design effect, Square root of design effect].
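The Cells options above (population size and row, column, and table percentages) are weighted quantities. This is a minimal sketch that computes them from case weights. It assumes each case carries its final sampling weight and omits the design-based standard errors. The function name and data are hypothetical.

```python
def weighted_crosstab(row_values, col_values, weights):
    """Estimated population size and row, column, and table percentages for each cell."""
    cells = {}
    for r, c, w in zip(row_values, col_values, weights):
        cells[(r, c)] = cells.get((r, c), 0.0) + w
    total = sum(cells.values())
    row_totals, col_totals = {}, {}
    for (r, c), size in cells.items():
        row_totals[r] = row_totals.get(r, 0.0) + size
        col_totals[c] = col_totals.get(c, 0.0) + size
    return {
        (r, c): {
            "population_size": size,
            "row_percent": 100.0 * size / row_totals[r],
            "column_percent": 100.0 * size / col_totals[c],
            "table_percent": 100.0 * size / total,
        }
        for (r, c), size in cells.items()
    }


# Hypothetical cases: gender (row), response (column), sampling weight.
table = weighted_crosstab(["m", "m", "f", "f"], ["yes", "no", "yes", "yes"], [10.0, 30.0, 20.0, 40.0])
print(table[("m", "yes")])
```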
192. Complex Samples Logistic Regression Reference Category. Figure 10-2 Logistic Regression Reference Category dialog box [options: Highest value, Lowest value, Custom value]. By default, the Complex Samples Logistic Regression procedure makes the highest-valued category the reference category. This dialog box allows you to specify the highest value, the lowest value, or a custom category as the reference category. Complex Samples Logistic Regression Model. Figure 10-3 Logistic Regression Model dialog box [factors and covariates: ed, age, employ, address, income, debtinc, creddebt, othdebt; Intercept: Include in model, Display statistics]. Specify Model Effects. By default, the procedure builds a main-effects model using the factors and covariates specified in the main dialog box. Alternatively, you can build a custom model that includes interaction effects and nested terms. Non-Nested Terms. For the selected factors and covariates: Interaction. Creates the highest-level interaction term for all selected variables. Main effects. Creates a main-effects ter
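Choosing a different reference category for the dependent variable changes the parameterization, not the fitted probabilities. This is a minimal sketch, assuming a baseline-category logit in which the reference category's linear predictor is fixed at 0. For a binary outcome, switching the reference simply flips the sign of the linear predictor. The helpers `baseline_logit_probabilities` and `pick_reference` are hypothetical, as are the values.

```python
import math


def baseline_logit_probabilities(linear_predictors, categories, reference):
    """Baseline-category logit: the reference category's linear predictor is fixed at 0."""
    scores = {c: 0.0 if c == reference else linear_predictors[c] for c in categories}
    denominator = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / denominator for c, s in scores.items()}


def pick_reference(categories, option="highest", custom=None):
    """Mimic the dialog's choices: highest value (default), lowest value, or a custom value."""
    ordered = sorted(categories)
    if option == "highest":
        return ordered[-1]
    if option == "lowest":
        return ordered[0]
    if custom not in ordered:
        raise ValueError("custom reference category must be one of the categories")
    return custom


levels = [0, 1]  # hypothetical binary dependent variable, e.g. 0 = No, 1 = Yes
p_high_ref = baseline_logit_probabilities({0: 0.8}, levels, pick_reference(levels))            # reference 1
p_low_ref = baseline_logit_probabilities({1: -0.8}, levels, pick_reference(levels, "lowest"))  # reference 0
print(round(p_high_ref[1], 6) == round(p_low_ref[1], 6))  # True: same fitted probabilities
```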
193. Select Proportions from the Units drop-down list. Type 0.2 as the value of the proportion of units to sample from each stratum. Click Next, and then click Next in the Output Variables step. Figure 13-7 Sampling Wizard, Plan Summary step (stage 2). This panel summarizes the sampling plan so far. You can add another stage to the design. If you choose not to add a next stage, the next step is to set options for drawing your sample. The summary lists stage 1 (strata: county; clusters: town; 4 per stratum; simple random sampling WOR) and stage 2 (strata: nbrhood; 0.2 per stratum; simple random sampling WOR), with the plan saved to C:\temp\property_assess.csplan. Do you want to add stage 3? Yes, add stage 3 now: choose this option if the working data file contains data for stage 3. No, do not add another stage now: choose this option if stage 3 data are not available yet or your design has only two stages. Look over the sampling design, and then click Next.
194. …Figure 13-35: Sampling Wizard, Sample Size step (stage 1)
In this panel, you can specify the number or proportion of units to be sampled in the current stage. The sample size can be fixed across strata, or it can vary for different strata. If you specify sample sizes as proportions, you can also set the minimum or maximum number of units to draw.
▶ Select Proportions from the Units drop-down list.
▶ Type 0.3 as the value for the proportion of townships to select per county in this stage.
Legislators from the Western county point out that there are fewer townships in their county than in others. To ensure adequate representation, they would like to establish a minimum of 3 townships sampled from each county.
▶ Type 3 as the minimum number of townships to select and 5 as the maximum.
▶ Click Next, and then click Next in the Output Variables step.
Figure 13-36: Sampling Wizard…
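The interplay between a per-stratum proportion and the minimum/maximum bounds in the step above can be sketched in Python. This is a minimal sketch under stated assumptions: the helper name `stratum_sample_size` is hypothetical, and rounding the raw count half up (and capping it at the number of units in the stratum) are assumptions, since this section does not say how SPSS turns a proportion into a whole number of units.

```python
import math


def stratum_sample_size(n_units, proportion, minimum=None, maximum=None):
    """Hypothetical helper: convert a per-stratum proportion into a unit count.

    Assumptions (not taken from SPSS): the raw count is proportion * n_units
    rounded half up, and the result is capped at the stratum's unit count,
    as it must be when sampling without replacement.
    """
    count = math.floor(proportion * n_units + 0.5)  # assumed rounding rule
    if minimum is not None:
        count = max(count, minimum)  # e.g. at least 3 townships per county
    if maximum is not None:
        count = min(count, maximum)  # e.g. at most 5 townships per county
    return min(count, n_units)  # cannot draw more units than the stratum holds


# A county with 8 townships: 0.3 * 8 = 2.4 rounds to 2, raised to the minimum of 3.
print(stratum_sample_size(8, 0.3, minimum=3, maximum=5))   # 3
# A county with 30 townships: 0.3 * 30 = 9, lowered to the maximum of 5.
print(stratum_sample_size(30, 0.3, minimum=3, maximum=5))  # 5
```

With these settings the minimum only matters for small counties (fewer than about 9 townships), while the maximum limits the larger ones, which is the adequate-representation effect the legislators are after.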
195. …linear regression analysis, as well as analysis of variance and covariance, for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.
Example. A grocery store chain surveyed a set of customers concerning their purchasing habits, according to a complex design. Given the survey results and how much each customer spent in the previous month, the store wants to see if the frequency with which customers shop is related to the amount they spend in a month, controlling for the gender of the customer and incorporating the sampling design.
Statistics. The procedure produces estimates, standard errors, confidence intervals, tests, design effects, and square roots of design effects for model parameters, as well as the correlations and covariances between parameter estimates. Measures of model fit and descriptive statistics for the dependent and independent variables are also available. Additionally, you can request estimated marginal means for levels of model factors and factor interactions.
Data. The dependent variable is quantitative. Factors are categorical. Covariates are quantitative variables that are related to the dependent variable. Subpopulation variables can be string or numeric but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
196. …Joint Probabilities. Joint probabilities are required if the plan requests unequal probability WOR estimation; otherwise, they are ignored. (Options: Use default file, An open dataset, Custom file.)
▶ Browse to and select nhis2000_subset.csaplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
▶ Click Continue.
Figure 15-2: Frequencies dialog box (each combination of subpopulation categories defines a subpopulation)
▶ Select Vitamin mineral supplmnts past 12 m as a frequency variable.
▶ Select Age category as a subpopulation variable.
▶ Click Statistics.
Figure 15-3: Frequencies Statistics dialog box
197. …If you can assume the data pass through the origin, you can exclude the intercept. Even if you include the intercept in the model, you can choose to suppress statistics related to it.
Complex Samples Logistic Regression Statistics
Figure 10-4: Logistic Regression Statistics dialog box
Model Fit. Controls the display of statistics that measure the overall model performance.
■ Pseudo R-square. The R² statistic from linear regression does not have an exact counterpart among logistic regression models. There are, instead, multiple measures that attempt to mimic the properties of the R² statistic.
■ Classification table. Displays the tabulated cross-classifications of the observed category by the model-predicted category on the dependent variable.
Parameters. This group allows you to control the display of statistics related to the model parameters.
■ Estimate. Displays estimates of the coefficients.
■ Exponentiated estimate. Displays the base of the natural…
198. …
Stage   Strata        Clusters
1       Region        Province
2       District      City
3       Subdivision
In the third stage, households are the primary sampling unit, and selected households will be surveyed. However, since information is easily available only to the city level, the company plans to execute the first two stages of the design now and then collect information on the numbers of subdivisions and households from the sampled cities. The available information to the city level is collected in demo_cs_1.sav. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
Note that this file contains a variable, Subdivision, that contains all 1's. This placeholder for the true variable, whose values will be collected after the first two stages of the design are executed, allows you to specify the full three-stage sampling design now.
Use the Complex Samples Sampling Wizard to specify the full complex sampling design, and then draw the first two stages.
Using the Wizard to Sample from the First Partial Frame
▶ To run the Complex Samples Sampling Wizard, from the menus choose:
Analyze > Complex Samples > Select a Sample...
Figure 13-14: Sampling Wizard, Welcome step
Welcome to the Sampling Wizard. The Sampling Wizard helps you design and select a complex sample. Your selections will be saved to a plan file that you can use at analysis time to indicate how the data were sampled. You can also use the wizard to…
199. …analysis plan, from the menus choose:
Analyze > Complex Samples > Prepare for Analysis...
Figure 22-38: Analysis Preparation Wizard, Welcome step
Welcome to the Analysis Preparation Wizard. The Analysis Preparation Wizard helps you describe your complex sample and choose an estimation method. You will be asked to provide sample weights and other information needed for accurate estimation of standard errors. Your selections will be saved to a plan file that you can use in any of the analysis procedures in the Complex Samples option.
What would you like to do?
■ Create a plan file. Choose this option if you have sample data but have not created a plan file.
■ Edit a plan file. Choose this option if you want to add, remove, or modify stages of an existing plan.
If you already have a plan file, you can skip the Analysis Preparation Wizard and go directly to any of the analysis procedures in the Complex Samples option to analyze your sample.
▶ Select Create a plan file, and type srs.csaplan as the name of the file. Alternatively, browse to the location you want to save it.
▶ Click Next.
Figure 22-39: Analysis Preparation Wizard, Design Variables step
Stage 1 Design Variables. In this panel, you can select variables that define strata or clusters. A sample weight variable must be selected in the first…
200. ■ All 2-way. Creates all possible two-way interactions of the selected variables.
■ All 3-way. Creates all possible three-way interactions of the selected variables.
■ All 4-way. Creates all possible four-way interactions of the selected variables.
■ All 5-way. Creates all possible five-way interactions of the selected variables.
Nested Terms
You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the Customer effect can be said to be nested within the Store location effect.
Additionally, you can include interaction effects, such as polynomial terms involving the same covariate, or add multiple levels of nesting to the nested term.
Limitations. Nested terms have the following restrictions:
■ All factors within an interaction must be unique. Thus, if A is a factor, then specifying A*A is invalid.
■ All factors within a nested effect must be unique. Thus, if A is a factor, then specifying A(A) is invalid.
■ No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, then specifying A(X) is invalid.
Intercept. The intercept is usually included in the model.
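The term-building options and nesting restrictions described above can be illustrated with a short Python sketch. The function names are hypothetical and this is not how SPSS builds its design internally; it only shows which terms the All n-way options enumerate and which specifications the listed limitations reject. The uniqueness check applies to factors only, because polynomial terms such as X*X are allowed for a covariate.

```python
from itertools import combinations


def all_k_way(variables, k):
    """All k-way interaction terms among the selected variables, in A*B notation."""
    return ["*".join(combo) for combo in combinations(variables, k)]


def highest_interaction(variables):
    """The single highest-level interaction term (the Interaction option)."""
    return "*".join(variables)


def interaction_is_valid(factors_in_term):
    """Factors within an interaction must be unique, so A*A is invalid."""
    return len(factors_in_term) == len(set(factors_in_term))


def nesting_is_valid(effect, within, covariates):
    """A nested effect such as A(B) must not repeat a factor (A(A) is invalid)
    and cannot be nested within a covariate (A(X) is invalid)."""
    return effect != within and within not in covariates


print(all_k_way(["A", "B", "C"], 2))                  # ['A*B', 'A*C', 'B*C']
print(interaction_is_valid(["A", "A"]))               # False
print(nesting_is_valid("A", "X", covariates={"X"}))   # False
```

For five selected variables, All 2-way produces C(5, 2) = 10 terms, which is why the higher-order options can quickly make a model large.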
201. …estimates of the cell population sizes and table percentages.
Statistics. This group produces statistics associated with the population size or table percentage.
■ Standard error. The standard error of the estimate.
■ Confidence interval. A confidence interval for the estimate, using the specified level.
■ Coefficient of variation. The ratio of the standard error of the estimate to the estimate.
■ Unweighted count. The number of units used to compute the estimate.
■ Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
■ Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
■ Cumulative values. The cumulative estimate through each value of the variable.
■ Test of equal cell proportions. This produces chi-square and likelihood-ratio tests of the hypothesis that the categories of a variable have equal frequencies. Separate tests are performed for each variable.
Complex Samples Missing Values
Figure 5-3: Missing Values dialog box
Tables: Use all available data (table-by-table deletion), or Use consistent case base (listwise deletion). Categorical…
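The three ratio statistics defined above (coefficient of variation, design effect, and square root of design effect) are straightforward once an estimate and its variances are known. The Python sketch below uses hypothetical function names and made-up numbers, and it assumes both variances are already available; how SPSS computes the simple-random-sample variance (for example, with or without a finite population correction) is not described in this section.

```python
import math


def coefficient_of_variation(estimate, standard_error):
    """Ratio of the standard error of the estimate to the estimate."""
    return standard_error / estimate


def design_effect(variance_complex, variance_srs):
    """Variance under the complex design divided by the variance that would be
    obtained if the sample were treated as a simple random sample."""
    return variance_complex / variance_srs


def sqrt_design_effect(variance_complex, variance_srs):
    """Square root of the design effect, i.e. the ratio of standard errors."""
    return math.sqrt(design_effect(variance_complex, variance_srs))


# Hypothetical numbers: an estimated population size of 2,000 with SE 100.
print(coefficient_of_variation(2000.0, 100.0))  # 0.05
# Hypothetical variances: complex-design variance twice the SRS variance.
print(design_effect(50.0, 25.0))  # 2.0
```

A design effect of 2 means the complex design needs roughly twice as many cases as a simple random sample to reach the same precision. The square root (about 1.41 here) is the factor by which standard errors are inflated.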
202. …replacement (WOR). See the type descriptions for more information.
Note that some probability proportional to size (PPS) types are available only when clusters have been defined, and that all PPS types are available only in the first stage of a design. Moreover, WR methods are available only in the last stage of a design.
■ Simple Random Sampling. Units are selected with equal probability. They can be selected with or without replacement.
■ Simple Systematic. Units are selected at a fixed interval throughout the sampling frame (or strata, if they have been specified) and extracted without replacement. A randomly selected unit within the first interval is chosen as the starting point.
■ Simple Sequential. Units are selected sequentially with equal probability and without replacement.
■ PPS. This is a first-stage method that selects units at random with probability proportional to size. Any units can be selected with replacement; only clusters can be sampled without replacement.
■ PPS Systematic. This is a first-stage method that systematically selects units with probability proportional to size. They are selected without replacement.
■ PPS Sequential. This is a first-stage method that sequentially selects units with probability proportional to cluster size and without replacement.
■ PPS Brewer. This is a first-stage method that selects two clusters from each stratum with probability proportional to cluster size…
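The Simple Systematic and PPS Systematic descriptions above follow the textbook random-start, fixed-interval idea, sketched below in Python. This is an illustration under stated assumptions, not SPSS's implementation: the function names are hypothetical, the frame is treated as a single stratum (apply the function per stratum otherwise), and the PPS version assumes no unit's size exceeds the sampling step. A larger unit could be hit more than once, which a without-replacement method has to handle separately.

```python
import random
from itertools import accumulate


def simple_systematic(n_frame, n_sample, rng=random):
    """Equal-probability systematic sample of frame positions 0..n_frame-1.

    A random start is chosen within the first interval and every
    interval-th position after it is taken (fractional intervals allowed).
    """
    interval = n_frame / n_sample
    start = rng.random() * interval  # uniform in [0, interval)
    return [min(int(start + j * interval), n_frame - 1) for j in range(n_sample)]


def pps_systematic(sizes, n_sample, rng=random):
    """PPS systematic sample: lay units end to end on a line of length
    sum(sizes), then take equally spaced points from a random start.

    Assumes no unit is larger than the sampling step.
    """
    bounds = list(accumulate(sizes))  # cumulative size at the end of each unit
    step = bounds[-1] / n_sample
    start = rng.random() * step
    selected, unit = [], 0
    for j in range(n_sample):
        point = start + j * step
        # Advance to the unit whose cumulative range contains this point.
        while unit < len(bounds) - 1 and bounds[unit] <= point:
            unit += 1
        selected.append(unit)
    return selected


rng = random.Random(2011)
print(simple_systematic(100, 10, rng))
print(pps_systematic([3, 2, 4, 1, 5, 5], 4, rng))
```

In the PPS example the step is 20 / 4 = 5, so each of the two units of size 5 spans exactly one step and is always selected, while smaller units are selected with probability size / 5.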
203. …Complex Samples Sampling Wizard
Obtaining a Sample from a Full Sampling Frame
   Using the Wizard
   Plan Summary
   Sampling Summary
   Sample Results
Obtaining a Sample from a Partial Sampling Frame
   Using the Wizard to Sample from the First Partial Frame
   Sample Results
   Using the Wizard to Sample from the Second Partial Frame
   Sample Results
Sampling with Probability Proportional to Size (PPS)
   Using the Wizard
   Plan Summary
   Sampling Summary
   Sample Results
Related Procedures
14 Complex Samples Analysis Preparation Wizard
Using the Complex Samples Analysis Preparation Wizard to Ready NHIS Public Data
   Using the Wizard
   Summary
Preparing for Analysis When Sampling Weights Are Not in the Data File
   Computing Inclusion Probabilities and Sampling Weights
   Using the Wizard…
204. …complex sample, you can use the Analysis Preparation Wizard to create one. Choose Prepare for Analysis from the Complex Samples menu to access the wizard.
Joint Probabilities. Joint probabilities are required if the plan requests unequal probability WOR estimation; otherwise, they are ignored. (Options: Use default file based on name of plan file; An open dataset, such as demo_cs.sav [DataSet1]; Custom file.)
▶ Browse to and select demo.csplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
▶ Click Continue.
Figure 17-2: Crosstabs dialog box (each combination of subpopulation categories defines a subpopulation)
▶ Select Newspaper subscription as a row variable.
▶ Select Response as a column variable.
There is also some interest in seeing the results broken down by income categories, so:
▶ Select Income category in thousands as a subpopulation variable.
205. …an interaction term, its odds ratio as reported in these tables will also depend on the values of the other predictors that make up the interaction.
Summary
Using the Complex Samples Logistic Regression procedure, you have constructed a model for predicting the probability that a given customer will default on a loan.
A critical issue for loan officers is the cost of Type I and Type II errors. That is, what is the cost of classifying a defaulter as a nondefaulter (Type I)? What is the cost of classifying a nondefaulter as a defaulter (Type II)? If bad debt is the primary concern, then you want to lower your Type I error and maximize your sensitivity. If growing your customer base is the priority, then you want to lower your Type II error and maximize your specificity. Usually both are major concerns, so you have to choose a decision rule for classifying customers that gives the best mix of sensitivity and specificity.
Related Procedures
The Complex Samples Logistic Regression procedure is a useful tool for modeling a categorical variable when the cases have been drawn according to a complex sampling scheme.
■ The Complex Samples Sampling Wizard is used to specify a complex sampling design and obtain a sample. The sampling plan file created by the Sampling Wizard contains a default analysis plan and can be specified in the Plan dialog box when you are analyzing the sample obtained according to that plan.
■ The Complex Samples Analysis…
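The sensitivity/specificity trade-off in the Summary above can be made concrete with a small Python sketch. It is a hypothetical illustration, not output from the procedure. The function name and data are made up, `weights` stands in for sampling weights such as a final sampling weight, and the function assumes at least one defaulter and one nondefaulter (otherwise a denominator is zero). Error types follow this chapter's definitions.

```python
def sensitivity_specificity(probabilities, defaulted, cutoff=0.5, weights=None):
    """Weighted sensitivity and specificity for a default-prediction rule.

    A case is classified as a defaulter when its predicted probability is at
    least `cutoff`. Classifying a defaulter as a nondefaulter is the Type I
    error in this chapter; the reverse is the Type II error.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    tp = fn = tn = fp = 0.0
    for p, y, w in zip(probabilities, defaulted, weights):
        predicted_default = p >= cutoff
        if y and predicted_default:
            tp += w
        elif y:
            fn += w  # Type I error: defaulter classified as nondefaulter
        elif predicted_default:
            fp += w  # Type II error: nondefaulter classified as defaulter
        else:
            tn += w
    return tp / (tp + fn), tn / (tn + fp)


probs = [0.9, 0.4, 0.7, 0.2]   # hypothetical predicted probabilities
actual = [1, 1, 0, 0]          # 1 = defaulted
print(sensitivity_specificity(probs, actual, cutoff=0.5))  # (0.5, 0.5)
print(sensitivity_specificity(probs, actual, cutoff=0.3))  # (1.0, 0.5)
```

Lowering the cutoff classifies more customers as defaulters. This reduces Type I errors and raises sensitivity, at the risk of more Type II errors. Choosing the cutoff is the "decision rule" the Summary refers to.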
206. …to set lower and upper bounds on the number of units sampled.
Define Unequal Sizes
Figure 2-5: Define Unequal Sizes dialog box
The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.
Size Specifications grid. The grid displays the cross-classifications of up to five strata or cluster variables, one stratum/cluster combination per row. Eligible grid variables include all stratification variables from the current and previous stages and all cluster variables from previous stages. Variables can be reordered within the grid or moved to the Exclude list. Enter sizes in the rightmost column. Click Labels or Values to toggle the display of value labels and data values for stratification and cluster variables in the grid cells. Cells that contain unlabeled values always show values. Click Refresh Strata to repopulate the grid with each combination of labeled data values for variables in the grid.
Exclude. To specify sizes for a subset of stratum/cluster combinations, move one or more variables to the Exclude list. These variables are not used to define sample sizes.
Sampling Wizard: Output Variables
Figure 2-6: Sampling Wizard, Output Variables step
Stage 1 Output Variables. In this panel, you can choose variables to be saved…
207. …
                              Population Size   Percent
…10,000 miles/year                  10816.349      9.2%
10,000-14,999 miles/year            32539.364     27.5%
15,000-19,999 miles/year            39179.814     33.2%
20,000-29,999 miles/year            25617.804     21.7%
> 30,000 miles/year                  6595.532      5.6%
Population Size                    118186.000    100.0%
a. Dependent variable values are sorted in ascending order.
Given the observed data, the "null" model (that is, one without predictors) would classify all customers into the modal group, Agree. Thus, the null model would be correct 27.3% of the time.
Figure 21-10: Classification table
Observed \ Predicted   Strongly agree   Agree       Disagree   Strongly disagree   Percent Correct
Strongly agree         7067.567         12130.814   3875.825   2058.750            28.1%
Agree                  4271.234         14464.286   7320.767   6205.137            44.8%
Disagree               2024.816         11703.368   7108.487   8640.746            24.1%
Strongly disagree       889.869          8169.109   6946.522   15308.703           48.9%
Overall Percent        12.1%            39.3%       21.4%      27.3%               37.2%
Dependent Variable: The legislature should enact a gas tax (Ascending)
Model: (Threshold), agecat, gender, votelast, drivefreq
Link function: Logit
The classification table shows the practical results of using the model. For each case, the predicted response is the response category with the highest model-predicted probability. Cases are weighted by Final Sampling Weight, so that the classification table reports the expected model performance in the population.
■ Cells on the diagonal are correct predictions.
■ Cells off the diagonal are incorrect predictions.
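The percentages in Figure 21-10 and the 27.3% null-model rate can be recomputed from the weighted cell counts. The Python sketch below copies those counts from the table and shows where each summary number comes from. The only assumption is that the percentages are plain ratios of weighted counts.

```python
# Weighted classification table from Figure 21-10
# (rows = observed, columns = predicted; order: Strongly agree, Agree,
# Disagree, Strongly disagree).
table = [
    [7067.567, 12130.814, 3875.825, 2058.750],
    [4271.234, 14464.286, 7320.767, 6205.137],
    [2024.816, 11703.368, 7108.487, 8640.746],
    [889.869, 8169.109, 6946.522, 15308.703],
]

row_totals = [sum(row) for row in table]
population = sum(row_totals)

# Percent correct within each observed category: diagonal cell / row total.
row_correct = [100 * table[i][i] / row_totals[i] for i in range(len(table))]
# Overall percent correct: sum of the diagonal / estimated population size.
overall = 100 * sum(table[i][i] for i in range(len(table))) / population
# Null model: predict the modal observed category (Agree) for every case.
null_model = 100 * max(row_totals) / population

print(round(population, 3))                      # 118186.0
print([round(p, 1) for p in row_correct])        # [28.1, 44.8, 24.1, 48.9]
print(round(overall, 1), round(null_model, 1))   # 37.2 27.3
```

The model's overall 37.2% correct compares with 27.3% for the null model, which is the improvement the classification table is meant to show.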
208. …The Define Time-Dependent Predictor dialog box allows you to create a predictor that is dependent upon the built-in time variable, T_. You can use this variable to define time-dependent covariates in two general ways:
■ If you want to estimate an extended Cox regression model that allows nonproportional hazards, you can do so by defining your time-dependent predictor as a function of the time variable T_ and the covariate in question. A common example would be the simple product of the time variable and the predictor, but more complex functions can be specified as well.
■ Some variables may have different values at different time periods but aren't systematically related to time. In such cases, you need to define a segmented time-dependent predictor, which can be done using logical expressions. Logical expressions take the value 1 if true and 0 if false. Using a series of logical expressions, you can create your time-dependent predictor from a set of measurements. For example, if you have blood pressure measured once a week for the four weeks of your study (identified as BP1 to BP4), you can define your time-dependent predictor as (T_ < 1) * BP1 …
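The two ways of defining a time-dependent predictor can be mimicked in Python to show how the logical expressions evaluate. The expression in the text is cut off after its first term, so the remaining segments are an assumption: one-week intervals [1, 2), [2, 3), [3, 4) for BP2 to BP4, matching "measured once a week for the four weeks". The function names and readings are hypothetical, and in this sketch times of 4 or more contribute 0.

```python
def segmented_bp(t, weekly_bp):
    """Hypothetical segmented time-dependent predictor.

    Assumed form: (T_ < 1)*BP1 + (T_ >= 1 & T_ < 2)*BP2 + ... with one-week
    segments; each comparison contributes 1 when true and 0 when false.
    """
    value = 0.0
    for week, bp in enumerate(weekly_bp):
        in_segment = (week == 0 or t >= week) and t < week + 1
        value += in_segment * bp  # True/False behave as 1/0
    return value


def nonproportional_term(t, x):
    """The simple product T_ * x used for an extended (nonproportional) model."""
    return t * x


bp = [120, 125, 130, 135]  # hypothetical weekly blood pressure readings
print(segmented_bp(0.5, bp))           # 120.0
print(segmented_bp(2.0, bp))           # 130.0
print(nonproportional_term(2.0, 1.5))  # 3.0
```

At any time exactly one segment's condition is true, so the predictor takes the most recent weekly reading. The product form instead changes continuously with time.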
209. …and log-transformed customer spending variables have been removed and replaced by standardized log-transformed customer spending variables.
telco_missing.sav. This data file is a subset of the telco.sav data file, but some of the demographic data values have been replaced with missing values.
testmarket.sav. This hypothetical data file concerns a fast-food chain's plans to add a new item to its menu. There are three possible campaigns for promoting the new product, so the new item is introduced at locations in several randomly selected markets. A different promotion is used at each location, and the weekly sales of the new item are recorded for the first four weeks. Each case corresponds to a separate location-week.
testmarket_1month.sav. This hypothetical data file is the testmarket.sav data file with the weekly sales "rolled up" so that each case corresponds to a separate location. Some of the variables that changed weekly disappear as a result, and the sales recorded are now the sum of the sales during the four weeks of the study.
tree_car.sav. This is a hypothetical data file containing demographic and vehicle purchase price data.
tree_credit.sav. This is a hypothetical data file containing demographic and bank loan history data.
tree_missing_data.sav. This is a hypothetical data file containing demographic and bank loan history data with a large number of missing values.
tree_score_car.sav. This is a hypothetical data file containing…
210. …
▶ Select age as a covariate.
▶ Click the Statistics tab.
Figure 22-18: Cox Regression dialog box, Predictors tab
▶ Select Estimate, Standard error, Confidence interval, and Design effect in the Parameters group.
▶ Deselect Test of proportional hazards and Parameter estimates for alternative model in the Model Assumptions group.
▶ Click OK.
Tests of Model Effects
Figure 22-19: Tests of model effects
Source   df1     df2      Wald F   Sig.
age      1.000   16.000   29.924   5.136E-5
Survival Time Variable: Time to second arrest
Event Status Variable: Second arrest = 1
Model: age, t_…
211. ▶ Click Statistics.
Figure 17-3: Crosstabs Statistics dialog box
▶ Deselect Population size and select Row percent in the Cells group.
▶ Select Odds ratio and Relative risk in the Summaries for 2-by-2 Tables group.
▶ Click Continue.
▶ Click OK in the Complex Samples Crosstabs dialog box.
These selections produce a crosstabulation table and risk estimate for Newspaper subscription by Response. Separate tables with results split by Income category in thousands are also created.
Crosstabulation
Figure 17-4: Crosstabulation for newspaper subscription by response
                                                                  Response
Newspaper subscription                                            Yes      No       Total
Yes     % within Newspaper subscription   Estimate                17.2%    82.8%    100.0%
                                          Standard Error          1.0%     1.0%     0.0%
No      % within Newspaper subscription   Estimate                10.3%    89.7%    100.0%
Total   % within Newspaper subscription   Estimate                12.8%    87.2%    100.0%
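As a rough check on the risk estimates requested above, the relative risk and odds ratio can be approximated from the row percentages in Figure 17-4. This Python sketch is only an approximation. It uses the rounded percentages, while the procedure works from unrounded weighted estimates. It also assumes subscribers are compared with nonsubscribers and a Yes response is the outcome of interest; SPSS's risk estimate table may use a different orientation or reference category.

```python
def risk_measures(pct_yes_exposed, pct_yes_unexposed):
    """Relative risk and odds ratio from row percentages of a 2-by-2 table."""
    p1 = pct_yes_exposed / 100
    p0 = pct_yes_unexposed / 100
    relative_risk = p1 / p0
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return relative_risk, odds_ratio


# Row percentages of "Yes" responses from Figure 17-4:
# 17.2% among newspaper subscribers, 10.3% among nonsubscribers.
rr, odds = risk_measures(17.2, 10.3)
print(round(rr, 2), round(odds, 2))  # 1.67 1.81
```

Under these assumptions, subscribers are about 1.67 times as likely to respond Yes. The odds ratio is larger than the relative risk because the Yes response is not rare in either group.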
212. …Cox and Snell R-square that adjusts the scale of the statistic to cover the full range from 0 to 1.
■ McFadden's R². McFadden's R² (McFadden, 1974) is another version, based on the log-likelihood kernels for the intercept-only model and the full estimated model.
What constitutes a "good" R² value varies between different areas of application. While these statistics can be suggestive on their own, they are most useful when comparing competing models for the same data. The model with the largest R² statistic is "best" according to this measure.
Tests of Model Effects
Figure 21-7: Tests of model effects (columns: Source, df1, df2, Wald F, Sig., Adjusted Sequential Sidak Sig.; rows: agecat, gender, votelast, drivefreq)
Dependent Variable: The legislature should enact a gas tax (Ascending). Model: (Threshold), agecat, gender, votelast, drivefreq. Link function: Logit.
Each term in the model is tested for whether its effect equals 0. Terms with significance values less than 0.05 have some discernible effect. Thus, agecat and drivefreq contribute to the model, while the other main effects do not. In a further analysis of the data, you would consider removing gender and votelast from the model.
Parameter Estimates
The parameter estimates table summarizes the effect of each predictor. While interpretation of the coefficients in this model is difficult due to the nature of the link function, the signs of the coefficients…
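The pseudo R-square measures mentioned above are simple functions of two log-likelihoods. The Python sketch uses the standard textbook definitions. For complex samples, SPSS works with a weighted pseudo-likelihood, and this section does not state which sample size it uses, so treat the inputs as assumptions. The function name and numbers are hypothetical.

```python
import math


def pseudo_r_squares(ll_null, ll_model, n):
    """Cox and Snell, Nagelkerke, and McFadden pseudo R-square measures.

    ll_null is the log-likelihood of the intercept-only (thresholds-only)
    model, ll_model that of the fitted model, and n the sample size.
    """
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)   # upper bound, below 1
    nagelkerke = cox_snell / max_cox_snell          # rescaled so 1 is attainable
    mcfadden = 1 - ll_model / ll_null
    return cox_snell, nagelkerke, mcfadden


cs, nk, mf = pseudo_r_squares(ll_null=-100.0, ll_model=-80.0, n=200)
print(round(cs, 4), round(nk, 4), round(mf, 4))  # 0.1813 0.2868 0.2
```

The Cox and Snell value cannot reach 1, so Nagelkerke's version divides by its maximum. This is why Nagelkerke's R² is always at least as large as Cox and Snell's for the same model.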
213. …the confidence interval can be calculated in three ways: in original units, via a log transformation, or via a log-minus-log transformation. Only the log-minus-log transformation guarantees that the bounds of the confidence interval will lie between 0 and 1, but the log transformation generally seems to perform best.
User-Missing Values. All variables must have valid values for a case to be included in the analysis. These controls allow you to decide whether user-missing values are treated as valid among categorical variables (including factors, event, strata, and subpopulation variables) and sampling design variables.
Confidence interval. This is the confidence interval level used for coefficient estimates, exponentiated coefficient estimates, survival function estimates, and cumulative hazard function estimates. Specify a value greater than or equal to 0, and less than 100.
CSCOXREG Command Additional Features
The command language also allows you to:
■ Perform custom hypothesis tests (using the CUSTOM subcommand and PRINT LMATRIX).
■ Specify tolerance (using CRITERIA SINGULAR).
■ Display a general estimable function table (using PRINT GEF).
■ Use multiple predictor patterns (using multiple PATTERN subcommands).
■ Set the maximum number of saved variables when a rootname is specified (using the SAVE subcommand). The dialog honors the CSCOXREG default of 25 variables.
See the Command Syntax Reference for complete syntax information.
Part II: Examples
Chapter…
214. …displayed in the same table or in separate tables.
Chapter 7
Complex Samples Crosstabs
The Complex Samples Crosstabs procedure produces crosstabulation tables for pairs of selected variables and displays two-way statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.
Example. Using the Complex Samples Crosstabs procedure, you can obtain cross-classification statistics for smoking frequency by vitamin usage of U.S. citizens, based on the results of the National Health Interview Survey (NHIS) and with an appropriate analysis plan for this public-use data.
Statistics. The procedure produces estimates of cell population sizes and row, column, and table percentages, plus standard errors, confidence intervals, coefficients of variation, expected values, design effects, square roots of design effects, residuals, adjusted residuals, and unweighted counts for each estimate. The odds ratio, relative risk, and risk difference are computed for 2-by-2 tables. Additionally, Pearson and likelihood-ratio statistics are computed for the test of independence of the row and column variables.
Data. Row and column variables should be categorical. Subpopulation variables can be string or numeric but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
215. …This step allows you to specify the number or proportion of units to sample within the current stage. The sample size can be fixed or it can vary across strata. For the purpose of specifying sample size, clusters chosen in previous stages can be used to define strata.
Units. You can specify an exact sample size or a proportion of units to sample.
■ Value. A single value is applied to all strata. If Counts is selected as the unit metric, you should enter a positive integer. If Proportions is selected, you should enter a non-negative value. Unless sampling with replacement, proportion values should also be no greater than 1.
■ Unequal values for strata. Allows you to enter size values on a per-stratum basis via the Define Unequal Sizes dialog box.
■ Read values from variable. Allows you to select a numeric variable that contains size values for strata.
If Proportions is selected, you have the option…
216. …to the mean of all of the levels (grand mean). The levels of the factor can be in any order.
■ Difference. Compares the mean of each level (except the first) to the mean of previous levels. They are sometimes called reverse Helmert contrasts.
■ Helmert. Compares the mean of each level of the factor (except the last) to the mean of subsequent levels.
■ Repeated. Compares the mean of each level (except the last) to the mean of the subsequent level.
■ Polynomial. Compares the linear effect, quadratic effect, cubic effect, and so on. The first degree of freedom contains the linear effect across all categories; the second degree of freedom, the quadratic effect; and so on. These contrasts are often used to estimate polynomial trends.
Reference Category. The simple and deviation contrasts require a reference category or factor level against which the others are compared.
Complex Samples General Linear Model Save
Figure 9-6: General Linear Model Save dialog box (Save Variables: Predicted Values, Residuals; Export Model as PASW Statistics Data: Parameter estimates and covariance matrix, or Parameter estimates and correlation matrix; Export Model as XML: Parameter estimates and covariance matrix, or Parameter estimates only)
Save Variables. This group allows you to save the model-predicted values…
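The Difference, Helmert, and Repeated contrasts described above can be written as coefficient rows applied to the factor-level means. The Python sketch below builds those rows with exact fractions for a factor with k levels. It follows the verbal definitions only. The function names are hypothetical, and the internal layout or scaling SPSS uses for its contrast matrices may differ. Deviation and simple contrasts, which need a reference category, are left out.

```python
from fractions import Fraction


def difference_contrasts(k):
    """Each level except the first versus the mean of the previous levels."""
    return [[Fraction(-1, i)] * i + [Fraction(1)] + [Fraction(0)] * (k - i - 1)
            for i in range(1, k)]


def helmert_contrasts(k):
    """Each level except the last versus the mean of the subsequent levels."""
    return [[Fraction(0)] * i + [Fraction(1)] + [Fraction(-1, k - i - 1)] * (k - i - 1)
            for i in range(k - 1)]


def repeated_contrasts(k):
    """Each level except the last versus the next level."""
    return [[Fraction(0)] * i + [Fraction(1), Fraction(-1)] + [Fraction(0)] * (k - i - 2)
            for i in range(k - 1)]


for row in helmert_contrasts(3):
    print([str(c) for c in row])
# ['1', '-1/2', '-1/2']
# ['0', '1', '-1']
```

Each row sums to 0, and a factor with k levels yields k - 1 rows, one per degree of freedom. The difference rows are the Helmert rows mirrored in level order, which is why they are called reverse Helmert contrasts.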
217. …
▶ Select More than one variable group to restructure.
▶ Type 6 as the number of groups.
▶ Click Next.
Figure 22-25: Restructure Data Wizard, Variables to Cases: Select Variables step
For each variable group you have in the current data, the restructured file will have one target variable. In this step, choose how to identify case groups in the restructured data and choose which variables belong with each target variable. Optionally, you can also choose variables to copy to the new file as Fixed Variables.
Variables to be Transposed. Target Variable: First event post-attack (event1), Second event post-attack…
218. ► Select Patient ID [patid] as the subject identifier.
► Click the Predictors tab.
Figure 22-45: Cox Regression dialog box, Predictors tab
219. observed value of the predictor associated with the model parameter and the expected value of the predictor for cases in the risk set at the observed event time. Schoenfeld residuals can be used to help assess the proportional hazards assumption. For example, for a predictor x, plots of the Schoenfeld residuals for the time-dependent predictor x*ln(T_) versus time should show a horizontal line at 0 if proportional hazards holds. A separate variable is saved for each nonredundant parameter in the model. Schoenfeld residuals are only computed for uncensored cases.
- Martingale residual. For each case, the martingale residual is the difference between the observed censoring (0 if censored, 1 if not) and the expectation of an event during the observation time.
- Deviance residual. Deviance residuals are martingale residuals adjusted to appear more symmetrical about 0. Plots of deviance residuals against predictors should reveal no patterns.
- Cox-Snell residual. For each case, the Cox-Snell residual is the expectation of an event during the observation time, or the observed censoring minus the martingale residual.
- Score residual. For each case and each nonredundant parameter in the model, the score residual is the contribution of the case to the first derivative of the pseudo-likelihood. A separate variable is saved for each nonredundant parameter in the model.
- DFBeta residual. For each case and each nonredundant parameter in the model, the DFBeta resi
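The sketch below shows the relationships among the martingale, Cox-Snell and deviance residuals described above, using the usual textbook definitions. It assumes that each case's expected number of events over its observation time (its estimated cumulative hazard) has already been computed. It does not reproduce the procedure's pseudo-likelihood estimation under the complex design, and the function names are hypothetical.

```python
import math


def martingale(delta, expected):
    """Observed censoring indicator (1 = event, 0 = censored) minus the
    expected number of events during the case's observation time."""
    return delta - expected


def cox_snell(delta, mart):
    """Observed censoring minus the martingale residual (the expectation)."""
    return delta - mart


def deviance(delta, mart):
    """Martingale residual rescaled to be more symmetric about 0."""
    inner = mart + (delta * math.log(delta - mart) if delta else 0.0)
    # max() guards against tiny positive values caused by rounding
    return math.copysign(math.sqrt(max(0.0, -2.0 * inner)), mart)


for delta, expected in [(1, 0.5), (0, 0.5)]:
    m = martingale(delta, expected)
    print(delta, m, cox_snell(delta, m), round(deviance(delta, m), 4))
```

The deviance residual keeps the sign of the martingale residual. A censored case therefore always gets a negative value.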
220. ► Type time_to_event as the target variable.
► Select Time to first event post-attack [time1], Time to second event post-attack [time2], and Time to third event post-attack [time3] as variables to be transposed.
► Select trans4 from the target variable list.
Figure 22-28: Restructure Data Wizard, Variables to Cases: Select Variables step
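The following minimal Python sketch shows what the Variables to Cases restructure does to the data for a single variable group: each wide row becomes one long row per transposed variable. The index column name and the toy values are hypothetical. The wizard can also process several variable groups at once and copy fixed variables; this sketch does neither.

```python
def variables_to_cases(rows, id_var, transposed, target, index_var="index"):
    """Restructure wide rows into long rows, one per transposed variable."""
    long_rows = []
    for row in rows:
        for position, name in enumerate(transposed, start=1):
            long_rows.append({id_var: row[id_var], index_var: position, target: row[name]})
    return long_rows


# Toy wide record (hypothetical values, not taken from the sample file)
wide = [{"patid": 1, "time1": 5, "time2": 12, "time3": 30}]
long_form = variables_to_cases(wide, "patid", ["time1", "time2", "time3"], "time_to_event")
for r in long_form:
    print(r)
```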
221. ► Select Neighborhood as a stratification variable.
► Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each neighborhood of the townships drawn in stage 1. In this stage, voters are drawn as the primary sampling units using simple random sampling without replacement.
Figure 13-38: Sampling Wizard, Sample Size step (stage 2)
► Select Proportions from the Units drop-down list.
► Type 0.2 as the value of the proportion of units to sample from each s
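The stage-2 design above can be sketched in Python: neighborhoods serve as strata, and voters are drawn by simple random sampling without replacement at a proportion of 0.2. This sketch illustrates the design under stated assumptions and is not the wizard's algorithm. Rounding proportion × stratum size with round() is an assumption, and the frame layout is hypothetical.

```python
import random
from collections import defaultdict


def stratified_srswor(units, stratum_of, proportion, seed=None):
    """Draw an independent SRS without replacement within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        n = round(proportion * len(members))  # rounding rule is an assumption
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical frame: (neighborhood, voter id) pairs, 10 voters per neighborhood
frame = [(nbr, f"{nbr}-{v}") for nbr in ("A", "B") for v in range(10)]
drawn = stratified_srswor(frame, stratum_of=lambda u: u[0], proportion=0.2, seed=1)
print(drawn)
```

Each stratum is sampled independently, so the draw in one neighborhood has no effect on the draw in another.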
222. Specify Model Effects. By default, the procedure builds a main-effects model using the factors and covariates specified in the main dialog box. Alternatively, you can build a custom model that includes interaction effects and nested terms.
Non-Nested Terms
For the selected factors and covariates:
- Interaction. Creates the highest-level interaction term for all selected variables.
- Main effects. Creates a main-effects term for each variable selected.
- All 2-way. Creates all possible two-way interactions of the selected variables.
- All 3-way. Creates all possible three-way interactions of the selected variables.
- All 4-way. Creates all possible four-way interactions of the selected variables.
- All 5-way. Creates all possible five-way interactions of the selected variables.
Nested Terms
You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the C
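The non-nested term builders can be illustrated with itertools. An All n-way selection corresponds to every combination of n of the selected variables. This is a minimal Python sketch; the asterisk notation for interactions and the variable names are hypothetical.

```python
from itertools import combinations


def main_effects(variables):
    """One term per selected variable."""
    return list(variables)


def interaction(variables):
    """The single highest-level interaction of all selected variables."""
    return "*".join(variables)


def all_n_way(variables, n):
    """Every n-way interaction among the selected variables."""
    return ["*".join(combo) for combo in combinations(variables, n)]


selected = ["age", "sex", "region"]  # hypothetical variable names
print(all_n_way(selected, 2))
```

If fewer than n variables are selected, the function returns an empty list because no n-way term can be formed.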
223. ► Select Estimate, Standard error, Confidence interval, and Design effect in the Model Parameters group.
► Click Continue.
► Click Estimated Means in the General Linear Model dialog box.
Figure 19-5: General Linear Model Estimated Means dialog box
► Choose to display means for shopfor, usecoup, and the shopfor*usecoup interaction.
► Select a Simple contrast and 3 Self and family as the reference category for shopfor. Note that, once selected, the category appears as 3 in the dialog box.
► Select a Simple contrast and 1 No as the reference category for usecoup.
► Click Continue.
► Click OK in the General Linear Model dialog box.
Model Summary
Figure 19-6: R-square statistic
R Square: .601
a. Model: Amount spent = (Intercept) + shopfor + usecoup + shopfor * usecoup
R-square, the coefficient of determination, is a measure of the strength of the model fit. It shows that about 60% of the variation in Amount spent is explained by the model, which gives you goo
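R-square is one minus the ratio of the residual sum of squares to the total sum of squares. The Python sketch below illustrates the calculation only. It assumes that a weighted version using the sampling weights is appropriate for complex samples, which this excerpt does not state, and the toy values are hypothetical.

```python
def weighted_r_square(y, fitted, weights):
    """1 - weighted residual SS / weighted total SS about the weighted mean."""
    total_w = sum(weights)
    mean = sum(w * v for w, v in zip(weights, y)) / total_w
    sse = sum(w * (v - f) ** 2 for w, v, f in zip(weights, y, fitted))
    sst = sum(w * (v - mean) ** 2 for w, v in zip(weights, y))
    return 1.0 - sse / sst


# Toy values (hypothetical): a perfect fit, then a constant fit at the mean
print(weighted_r_square([1, 2, 3], [1, 2, 3], [1, 1, 1]))
print(weighted_r_square([1, 2, 3], [2, 2, 2], [1, 1, 1]))
```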
224. of screws, bolts, nuts, and tacks (Hartigan, 1975).
shampoo_ph.sav. This is a hypothetical data file that concerns the quality control at a factory for hair products. At regular time intervals, six separate output batches are measured and their pH recorded. The target range is 4.5–5.5.
ships.sav. A dataset presented and analyzed elsewhere (McCullagh et al., 1989) that concerns damage to cargo ships caused by waves. The incident counts can be modeled as occurring at a Poisson rate given the ship type, construction period, and service period. The aggregate months of service for each cell of the table formed by the cross-classification of factors provide values for the exposure to risk.
site.sav. This is a hypothetical data file that concerns a company's efforts to choose new sites for their expanding business. They have hired two consultants to separately evaluate the sites, who, in addition to an extended report, summarized each site as a "good," "fair," or "poor" prospect.
smokers.sav. This data file is abstracted from the 1998 National Household Survey of Drug Abuse and is a probability sample of American households (http://dx.doi.org/10.3886/ICPSR02934). Thus, the first step in an analysis of this data file should be to weight the data to reflect population trends.
stroke_clean.sav. This hypothetical data file contains the state of a medical database after it has been cleaned using procedures in the Data
225. of the model with nonparallel lines.
- Summary statistics for model variables. Displays summary information about the dependent variable, covariates, and factors.
- Sample design information. Displays summary information about the sample, including the unweighted count and the population size.
Complex Samples Hypothesis Tests
Figure 11-5: Hypothesis Tests dialog box
Test Statistic. This group allows you to select the type of statistic used for testing hypotheses. You can choose between F, adjusted F, chi-square, and adjusted chi-square.
Sampling Degrees of Freedom. This group gives you control over the sampling design degrees of freedom used to compute p values for all test statistics. If based on the sampling design, the value is the difference between the number of primary sampling units and the number of strata in the first stage of sampling. Alternatively, you can set custom degrees of freedom by specifying a positive integer.
Adjustment for Multiple Comparisons. When performing hypothesis tests with multiple contrasts, the overall sign
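The sketch below computes the design-based degrees of freedom and two single-step multiple-comparison adjustments. The Bonferroni and Sidak formulas are the standard ones. This excerpt does not describe whether the procedure applies them in exactly this form, or how it handles the sequential variants. The design counts are hypothetical.

```python
def design_df(n_psus, n_strata):
    """Sampling design df: first-stage PSUs minus first-stage strata."""
    return n_psus - n_strata


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def sidak(pvalues):
    """Adjusted p-value 1 - (1 - p)^m for m tests."""
    m = len(pvalues)
    return [1.0 - (1.0 - p) ** m for p in pvalues]


print(design_df(n_psus=60, n_strata=20))  # hypothetical design
print(bonferroni([0.01, 0.04, 0.30]))
print(sidak([0.01, 0.04, 0.30]))
```

The Sidak adjustment is never larger than the Bonferroni adjustment, so it is slightly less conservative.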
226. olute or relative change in the parameter estimates is less than the value specified, which must be non-negative.
- Limit iterations based on change in log-likelihood. When selected, the algorithm stops after an iteration in which the absolute or relative change in the log-likelihood function is less than the value specified, which must be non-negative.
- Check for complete separation of data points. When selected, the algorithm performs tests to ensure that the parameter estimates have unique values. Separation occurs when the procedure can produce a model that correctly classifies every case.
- Display iteration history. Displays parameter estimates and statistics at every n iterations, beginning with the 0th iteration (the initial estimates). If you choose to print the iteration history, the last iteration is always printed regardless of the value of n.
User-Missing Values. Scale design variables, as well as the dependent variable and any covariates, should have valid data. Cases with invalid data for any of these variables are deleted from the analysis. These controls allow you to decide whether user-missing values are treated as valid among the strata, cluster, subpopulation, and factor variables.
Confidence Interval. This is the confidence interval level for coefficient estimates, exponentiated coefficient estimates, and odds ratios. Specify a value greater than or equal to 50 and less than 100.
CSORDINAL Command Additional Features
The c
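The absolute-or-relative stopping rules above can be written as a small check. This is a hedged illustration. The relative change is taken here as |new - old| / |old| with a small floor on the denominator, which is an assumption rather than the procedure's documented formula.

```python
def converged(old, new, tolerance, relative=False):
    """True when every value changed by less than `tolerance`.

    old, new: sequences of parameter estimates, or one-element lists
    holding the log-likelihood. The 1e-12 floor on |old| is an assumption.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    for a, b in zip(old, new):
        change = abs(b - a)
        if relative:
            change /= max(abs(a), 1e-12)
        if change >= tolerance:
            return False
    return True


print(converged([-200.0], [-200.0001], 1e-6))                 # absolute rule
print(converged([-200.0], [-200.0001], 1e-6, relative=True))  # relative rule
```

The same change can pass the relative rule and fail the absolute rule when the values are large, as the two calls show.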
227. ommand syntax language also allows you to:
- Specify custom tests of effects versus a linear combination of effects or a value (using the CUSTOM subcommand).
- Fix values of other model variables at values other than their means when computing cumulative odds ratios for factors and covariates (using the ODDSRATIOS subcommand).
- Use unlabeled values as custom reference categories for factors when odds ratios are requested (using the ODDSRATIOS subcommand).
- Specify a tolerance value for checking singularity (using the CRITERIA subcommand).
- Produce a general estimable function table (using the PRINT subcommand).
- Save more than 25 probability variables (using the SAVE subcommand).
See the Command Syntax Reference for complete syntax information.
Chapter 12
Complex Samples Cox Regression
The Complex Samples Cox Regression procedure performs survival analysis for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.
Examples. A government law enforcement agency is concerned about recidivism rates in its area of jurisdiction. One of the measures of recidivism is the time until second arrest for offenders. The agency would like to model time to rearrest using Cox Regression but is worried that the proportional hazards assumption is invalid across age categories.
Medical researchers are investigating survival times for patients exiting a rehabilitation program post ischemic stroke. There is
228. omplex design, where values further from 1 indicate greater effects.
- Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Complex Samples Descriptives Missing Values
Figure 6-3: Descriptives Missing Values dialog box
Statistics for Measure Variables. This group determines which cases are used in the analysis.
- Use all available data. Missing values are determined on a variable-by-variable basis; thus, the cases used to compute statistics may vary across measure variables.
- Ensure consistent case base. Missing values are determined across all variables; thus, the cases used to compute statistics are consistent.
Categorical Design Variables. This group determines whether user-missing values are valid or invalid.
Complex Samples Options
Figure 6-4: Options dialog box
Subpopulation Display. You can choose to have subpopulatio
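The difference between the two options for measure variables can be shown in Python, using None as a stand-in for a missing value. This is an illustration only. It does not model the distinction between user-missing and system-missing values, and the records are hypothetical.

```python
def available_data(records, variables):
    """Use all available data: a separate case base for each variable."""
    return {v: [r for r in records if r.get(v) is not None] for v in variables}


def consistent_case_base(records, variables):
    """Listwise deletion: one case base complete on every variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


records = [
    {"income": 50, "age": 34},
    {"income": None, "age": 41},
    {"income": 72, "age": None},
]
per_variable = available_data(records, ["income", "age"])
print({v: len(cases) for v, cases in per_variable.items()})
print(len(consistent_case_base(records, ["income", "age"])))
```

Here each variable keeps two cases under the first option, but only one case is complete on both variables.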
229. Estimation. These controls specify criteria for estimation of regression coefficients.
- Maximum Iterations. The maximum number of iterations the algorithm will execute. Specify a non-negative integer.
- Maximum Step Halving. At each iteration, the step size is reduced by a factor of 0.5 until the log-likelihood increases or maximum step halving is reached. Specify a positive integer.
- Limit iterations based on change in parameter estimates. When selected, the algorithm stops after an iteration in which the absolute or relative change in the parameter estimates is less than the value specified, which must be positive.
- Limit iterations based on change in log-likelihood. When selected, the algorithm stops after an iteration in which the absolute or relative change in the log-likelihood function is less than the value specified, whic
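The step-halving rule can be sketched as follows. This is a hedged illustration, not the procedure's code. `loglik` stands for any log-likelihood function, and the proposed `step` would come from a Newton-type update. If no halving improves the log-likelihood, the sketch returns the smallest step tried; that fallback is an assumption.

```python
def halve_step(loglik, beta, step, max_halving):
    """Try beta + step, then half the step, and so on, up to max_halving
    times, stopping at the first candidate that increases the log-likelihood."""
    if max_halving < 1:
        raise ValueError("maximum step halving must be a positive integer")
    base = loglik(beta)
    candidate = beta
    for k in range(max_halving + 1):
        scale = 0.5 ** k
        candidate = [b + scale * s for b, s in zip(beta, step)]
        if loglik(candidate) > base:
            return candidate
    return candidate  # assumption: fall back to the smallest step tried


def quadratic(b):
    return -(b[0] - 1.0) ** 2  # toy log-likelihood with its maximum at 1.0


print(halve_step(quadratic, [0.0], [4.0], max_halving=5))
```

In the toy case, the full step (to 4.0) and the first halving (to 2.0) fail to increase the log-likelihood. The second halving lands on 1.0, which does.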
230. on of age category and gender.
adl.sav. This is a hypothetical data file that concerns efforts to determine the benefits of a proposed type of therapy for stroke patients. Physicians randomly assigned female stroke patients to one of two groups. The first received the standard physical therapy, and the second received an additional emotional therapy. Three months following the treatments, each patient's abilities to perform common activities of daily life were scored as ordinal variables.
advert.sav. This is a hypothetical data file that concerns a retailer's efforts to examine the relationship between money spent on advertising and the resulting sales. To this end, they have collected past sales figures and the associated advertising costs.
aflatoxin.sav. This is a hypothetical data file that concerns the testing of corn crops for aflatoxin, a poison whose concentration varies widely between and within crop yields. A grain processor has received 16 samples from each of 8 crop yields and measured the aflatoxin levels in parts per billion (PPB).
anorectic.sav. While working toward a standardized symptomatology of anorectic/bulimic behavior, researchers (Van der Ham, Meulman, Van Strien, and Van Engeland, 1997) made a study of 55 adolescents with known eating disorders. Each patient was seen four times over four years, for a total of 220 observations. At each observation, the patients were scored for each of 16 symptoms. Symptom scores are missing
231. on of categories defines a subpopulation. Optionally, you can specify variables to define subgroups for which statistics are produced.
Complex Samples Ratios Statistics
Figure 8-2: Ratios Statistics dialog box
Statistics. This group produces statistics associated with the ratio estimate.
- Standard error. The standard error of the estimate.
- Confidence interval. A confidence interval for the estimate, using the specified level.
- Coefficient of variation. The ratio of the standard error of the estimate to the estimate.
- Unweighted count. The number of units used to compute the estimate.
- Population size. The estimated number of units in the population.
- Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
- Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
T test. You can re
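The coefficient of variation and the design effect are simple ratios. The following minimal Python sketch shows the arithmetic. It assumes that the design-based variance and the variance under simple random sampling have already been computed, and all input values are hypothetical.

```python
import math


def coefficient_of_variation(estimate, std_error):
    """Standard error of the estimate divided by the estimate."""
    return std_error / estimate


def design_effect(var_complex, var_srs):
    """Variance under the complex design over the variance assuming SRS."""
    return var_complex / var_srs


# Hypothetical inputs: a ratio estimate of 1.25 with design-based SE 0.05,
# and an SRS variance (computed separately) of 0.00125
ratio, se, var_srs = 1.25, 0.05, 0.00125
deff = design_effect(se ** 2, var_srs)
print(coefficient_of_variation(ratio, se), deff, math.sqrt(deff))
```

A design effect of 2 would mean the complex design doubles the variance compared with a simple random sample of the same size.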
232. on probabilities and cumulative sampling weights for each stage, plus the final sampling weights for the first two stages.
- Cities with values for these variables were selected into the sample.
- Cities with system-missing values for the variables were not selected.
For each city selected, the company acquired subdivision and household unit information and placed it in demo_cs_2.sav. Use this file and the Sampling Wizard to sample the third stage of this design.
Using the Wizard to Sample from the Second Partial Frame
► To run the Complex Samples Sampling Wizard, from the menus choose:
Analyze > Complex Samples > Select a Sample...
Figure 13-27: Sampling Wizard, Welcome step
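Multistage designs usually set each stage's sampling weight to the inverse of that stage's conditional inclusion probability. Under that assumption, the cumulative weight after a stage is the product of the stage weights so far. The following minimal Python sketch uses hypothetical probabilities.

```python
def cumulative_weights(stage_probabilities):
    """Cumulative sampling weight after each stage: the product of
    1 / inclusion probability over the stages so far."""
    weights, running = [], 1.0
    for p in stage_probabilities:
        if not 0 < p <= 1:
            raise ValueError("inclusion probabilities must be in (0, 1]")
        running *= 1.0 / p
        weights.append(running)
    return weights


# Hypothetical: a city drawn with probability 0.1, then a unit within it with 0.25
print(cumulative_weights([0.1, 0.25]))
```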
233. on sampling units for the second stage and then modify the sampling plan to include the second stage.
- An analyst who doesn't have access to the sampling plan file can specify an analysis plan and refer to that plan from each Complex Samples analysis procedure.
- A designer of large-scale public use samples can publish the sampling plan file, which simplifies the instructions for analysts and avoids the need for each analyst to specify his or her own analysis plans.
Further Readings
For more information on sampling techniques, see the following texts:
Cochran, W. G. 1977. Sampling Techniques, 3rd ed. New York: John Wiley and Sons.
Kish, L. 1965. Survey Sampling. New York: John Wiley and Sons.
Kish, L. 1987. Statistical Design for Research. New York: John Wiley and Sons.
Murthy, M. N. 1967. Sampling Theory and Methods. Calcutta, India: Statistical Publishing Society.
Särndal, C., B. Swensson, and J. Wretman. 1992. Model Assisted Survey Sampling. New York: Springer-Verlag.
Chapter 2
Sampling from a Complex Design
Figure 2-1: Sampling Wizard, Welcome step
234. on the Predictors tab are included in the model, this option is not available.
Plots
Figure 12-8: Cox Regression dialog box, Plots tab
The Plots tab allows you to request plots of the hazard function, survival function, log-minus-log of the survival function, and one minus the survival function. You can also choose to plot confidence intervals along the specified functions; the confidence level is set on the Options tab.
Predictor patterns. You can specify a pattern of predictor values to be used for the requested plots and the export
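The four plot types are transformations of the same estimated survival function S(t). The sketch below assumes, as in many Cox regression implementations, that the hazard plot shows the cumulative hazard H(t) = -ln S(t). The function name is hypothetical, and S(t) must lie strictly between 0 and 1 for the log-minus-log value to exist.

```python
import math


def plot_values(survival):
    """Values plotted for one estimated survival probability S(t), 0 < S(t) < 1."""
    if not 0.0 < survival < 1.0:
        raise ValueError("S(t) must be strictly between 0 and 1")
    cumulative_hazard = -math.log(survival)
    return {
        "survival": survival,
        "one_minus_survival": 1.0 - survival,
        "hazard": cumulative_hazard,  # assumed to be the cumulative hazard
        "log_minus_log": math.log(cumulative_hazard),
    }


print(plot_values(0.8))
```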
235. ► Select Include if case satisfies condition.
► Type MISSING(date2) as the expression.
► Click Continue.
► Click OK in the Compute Variable dialog box.
► Next, to compute the time between first and second arrest, from the menus choose:
Transform > Date and Time Wizard...
Figure 22-3: Date and Time Wizard, Welcome step
► Select Calculate with dates and times.
► Click Next.
Figure 22-4: Date and Time Wizard, Do Calculations on Dates step
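The calculation the Date and Time Wizard performs here is the elapsed time between two dates. The minimal Python sketch below does the same thing. The dates are hypothetical and the result is in whole days. Returning None when the second date is missing (no second arrest) is an assumption made for illustration.

```python
from datetime import date


def days_between(first, second):
    """Whole days from the first date to the second; None if second is missing."""
    if second is None:
        return None
    return (second - first).days


print(days_between(date(2000, 1, 1), date(2000, 3, 1)))
print(days_between(date(2000, 1, 1), None))
```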
236. orrespondence analysis to categorical data in market research. Journal of Targeting, Measurement and Analysis for Marketing, 5, 56–70.
Kish, L. 1965. Survey Sampling. New York: John Wiley and Sons.
Kish, L. 1987. Statistical Design for Research. New York: John Wiley and Sons.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models, 2nd ed. London: Chapman & Hall.
McFadden, D. 1974. Conditional logit analysis of qualitative choice behavior. In: Frontiers in Economics, P. Zarembka, ed. New York: Academic Press.
Murthy, M. N. 1967. Sampling Theory and Methods. Calcutta, India: Statistical Publishing Society.
Nagelkerke, N. J. D. 1991. A note on the general definition of the coefficient of determination. Biometrika, 78:3, 691–692.
Price, R. H., and D. L. Bouffard. 1974. Behavioral appropriateness and situational constraints as dimensions of social behavior. Journal of Personality and Social Psychology, 30, 579–586.
Rickman, R., N. Mitchell, J. Dingman, and J. E. Dalen. 1974. Changes in serum cholesterol during the Stillman Diet. Journal of the American Medical Association, 228, 54–58.
Copyright SPSS Inc. 1989, 2010
Rosenberg, S., and M. P. Kim. 1975. The method of sorting as a data-gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489–502.
Särndal, C., B. Swensson, and J. Wretman. 1992. Model Assisted Survey Sampling. New York: Springer-Verlag.
Van der
237. os. Some of the confidence intervals do not overlap; thus, you can conclude that the ratios for the Western county are higher than the ratios for the Northern and Southern counties.
- Finally, as a more objective measure, note that the significance values of the tests for the Western and Southern counties are less than 0.05. Thus, you can conclude that the ratio for the Western county is greater than 1.3 and the ratio for the Southern county is less than 1.3.
Summary
Using the Complex Samples Ratios procedure, you have obtained various statistics for the ratios of Current value to Value at last appraisal. The results suggest that there may be certain inequities in the assessment of property taxes from county to county, namely:
- The ratios for the Western county are high, indicating that their records are not as up to date as other counties with respect to the appreciation of property values. Property taxes are probably too low in this county.
- The ratios for the Southern county are low, indicating that their records are more up to date than the other counties with respect to the appreciation of property values. Property taxes are probably too high in this county.
- The ratios for the Southern county are lower than those of the Western county but are still within the objective goal of 1.3. Resources used to track property values in the Southern county will be reassigned to the Western county to bring these counties' ratios in line wi
238. ► Browse to where you want to save the plan file, and type nhis2000_subset.csaplan as the name for the analysis plan file.
► Click Next.
Figure 14-2: Analysis Preparation Wizard, Design Variables step (stage 1)
239. ow types.
- varname_. Takes values P1, P2, ..., corresponding to an ordered list of all model parameters, for row types COV or CORR, with value labels corresponding to the parameter strings shown in the parameter estimates table. The cells are blank for other row types.
- P1, P2, ... These variables correspond to an ordered list of all model parameters, with variable labels corresponding to the parameter strings shown in the parameter estimates table, and take values according to the row type. For redundant parameters, all covariances are set to zero; correlations are set to the system-missing value; all parameter estimates are set at zero; and all standard errors, significance levels, and residual degrees of freedom are set to the system-missing value.
Note: This file is not immediately usable for further analyses in other procedures that read a matrix file unless those procedures accept all the row types exported here.
Export model as XML. Saves the parameter estimates and the parameter covariance matrix, if selected, in XML (PMML) format. You can use this model file to apply the model information to other data files for scoring purposes.
Figure 11-8: Ordinal Regression Options dialog box
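The correlation form of the exported matrix relates to the covariance form by corr(i, j) = cov(i, j) / (sd_i × sd_j). The Python sketch below illustrates this. Following the rule above for redundant parameters (covariances exported as zero, correlations as system-missing), parameters with zero variance get None here as a stand-in for system-missing. The matrix values are hypothetical.

```python
import math


def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) to a correlation matrix.
    Zero-variance (redundant) parameters get None, standing in for
    the system-missing value."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [
        [None if sd[i] == 0 or sd[j] == 0 else cov[i][j] / (sd[i] * sd[j])
         for j in range(k)]
        for i in range(k)
    ]


# Hypothetical 3-parameter covariance matrix; the third parameter is redundant
example_cov = [[4.0, 2.0, 0.0], [2.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
for row in cov_to_corr(example_cov):
    print(row)
```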
240. oyer = 6.99; Years at current address = 6.32; Household income in thousands = 60.1581; Debt to income ratio (x100) = 9.9341; Credit card debt in thousands = 1.9764; Other debt in thousands = 3.9164.
This table displays the odds ratios of Previously defaulted at the factor levels of Level of education. The reported values are the ratios of the odds of default for Did not complete high school through College degree, compared to the odds of default for Post-undergraduate degree. Thus, the odds ratio of 2.054 in the first row of the table means that the odds of default for a person who did not complete high school are 2.054 times the odds of default for a person who has a post-undergraduate degree.
Figure 20-11: Odds ratios for years with current employer
 | Units of Change | Previously defaulted | Odds Ratio | 95% Confidence Interval: Lower | Upper
Years with current employer | 1.000 | Yes | .798 | .758 | .840
Dependent Variable: Previously defaulted (reference category = No)
Model: (Intercept), ed, age, employ, address, income, debtinc, creddebt, othdebt
a. Factors and covariates used in the computation are fixed at the following values: Level of education = Post undergraduate degree; Age in years = 34.19; Years with current employer = 6.99; Years at current address = 6.32; Household income in thousands = 60.1581; Debt to income ratio (x100) = 9.9341; Credit card debt in thousands = 1.9764; Other debt in thousands = 3.9164.
This table displays the odds ratio of Previously defaulted for a un
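In a logit model, the odds ratio for a change of c units in a covariate is exp(c × b), where b is the covariate's coefficient. This excerpt does not show the coefficient for Years with current employer. The Python sketch below therefore backs it out from the reported odds ratio (b = ln .798), purely for illustration. The function names are hypothetical.

```python
import math


def odds(p):
    """Odds corresponding to a probability p (0 <= p < 1)."""
    return p / (1.0 - p)


def odds_ratio(b, units=1.0):
    """Odds ratio for a change of `units` in a covariate with coefficient b."""
    return math.exp(b * units)


b = math.log(0.798)  # illustrative: backed out from the reported odds ratio
print(round(odds_ratio(b), 3))           # one more year with current employer
print(round(odds_ratio(b, units=5), 3))  # five more years
```

Because the effect is multiplicative on the odds scale, five additional years multiply the odds by .798 raised to the fifth power. Confidence intervals built on the coefficient scale are typically symmetric in logs, not in odds ratios.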
241. Remove Stages. You can remove stages 2 and 3 from a multistage design. Since a plan must have at least one stage, you can edit but not remove stage 1 from the design.
Chapter 4
Complex Samples Plan
Complex Samples analysis procedures require analysis specifications from an analysis or sample plan file in order to provide valid results.
Figure 4-1: Complex Samples Plan dialog box
Plan. Specify the path of an analysis or sample plan file.
Joint Probabilities. In order to use Unequal WOR estimation for clusters drawn using a PPS WOR method, you need to specify a separate file or an open dataset containing the joint probabilities. This file or dataset is created by the Sampling Wizard during sampl
  in Complex Samples Ordinal Regression, 72
odds ratios
  in Complex Samples Crosstabs, 39, 165
  in Complex Samples Logistic Regression, 60, 193
  in Complex Samples Ordinal Regression, 70, 203
parameter convergence
  in Complex Samples Logistic Regression, 62
  in Complex Samples Ordinal Regression, 72
parameter estimates
  in Complex Samples Cox Regression, 82
  in Complex Samples General Linear Model, 48, 182
  in Complex Samples Logistic Regression, 57, 192
  in Complex Samples Ordinal Regression, 68, 201
piecewise constant time-dependent predictors
  in Complex Samples Cox Regression, 226
plan file, 2
polynomial contrasts
  in Complex Samples General Linear Model, 50
population size
  in Complex Samples Crosstabs, 39
  in Complex Samples Descriptives, 34
  in Complex Samples Frequencies, 30, 158
  in Complex Samples Ratios, 43
  in Sampling Wizard, 12
PPS sampling
  in Sampling Wizard, 8
predicted categories
  in Complex Samples Logistic Regression, 61
  in Complex Samples Ordinal Regression, 71
predicted probability
  in Complex Samples Logistic Regression, 61
  in Complex Samples Ordinal Regression, 71
predicted values
  in Complex Samples General Linear Model, 51
predictor patterns
  in Complex Samples Cox Regression, 256
proportional hazards test
  in Complex Samples Cox Regression, 222
pseudo R² statistics
  in Complex Samples Logistic Regression, 57, 190
  in Complex Samples Ordinal Regression, 68, 200, 208
public data
  in Analysis Preparation Wizard
plex sampling design. Using Complex Samples Ordinal Regression, you can fit a model for the level of support for the bill based upon voter demographics.

Data. The dependent variable is ordinal. Factors are categorical. Covariates are quantitative variables that are related to the dependent variable. Subpopulation variables can be string or numeric but should be categorical.

Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.

Obtaining Complex Samples Ordinal Regression

From the menus choose:
Analyze > Complex Samples > Ordinal Regression
> Select a plan file. Optionally, select a custom joint probabilities file.
> Click Continue.

Copyright SPSS Inc. 1989, 2010

Complex Samples Ordinal Regression
Figure 11.1 Ordinal Regression dialog box

> Select a dependent variable.
Optionally, you can:
- Select variables for fac
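As a rough picture of what an ordinal model of this kind estimates, the sketch below turns a set of increasing thresholds and a linear predictor into category probabilities under a cumulative logit link, P(Y <= j) = 1 / (1 + exp(-(theta_j - eta))). The threshold values and the sign convention are assumptions for illustration; the procedure's own documentation gives the exact parameterization it uses.

```python
import math


def cumulative_logit_probs(thresholds, eta):
    """Category probabilities for an ordinal response under a cumulative logit link.

    thresholds: increasing cut points theta_1 < ... < theta_{J-1}
    eta: linear predictor x'beta for one case
    """
    # Cumulative probabilities P(Y <= j); the last category always has 1.0
    cum = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    # Individual category probabilities are successive differences
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Four ordered responses (Strongly agree ... Strongly disagree) need three thresholds
probs = cumulative_logit_probs([-1.0, 0.2, 1.3], eta=0.0)
print([round(p, 3) for p in probs])
```

With this sign convention, a larger linear predictor shifts probability toward the higher-numbered categories.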
porated. Microsoft product screenshot(s) reprinted with permission from Microsoft Corporation.

Bibliography

Bell, E. H. 1961. Social foundations of human behavior: Introduction to the study of sociology. New York: Harper & Row.
Blake, C. L., and C. J. Merz. 1998. UCI Repository of machine learning databases. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.
Breiman, L., and J. H. Friedman. 1985. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80, 580-598.
Cochran, W. G. 1977. Sampling Techniques, 3rd ed. New York: John Wiley and Sons.
Collett, D. 2003. Modelling survival data in medical research, 2nd ed. Boca Raton: Chapman & Hall/CRC.
Cox, D. R., and E. J. Snell. 1989. The Analysis of Binary Data, 2nd ed. London: Chapman and Hall.
Green, P. E., and V. Rao. 1972. Applied multidimensional scaling. Hinsdale, Ill.: Dryden Press.
Green, P. E., and Y. Wind. 1973. Multiattribute decisions in marketing: A measurement approach. Hinsdale, Ill.: Dryden Press.
Guttman, L. 1968. A general nonmetric technique for finding the smallest coordinate space for configurations of points. Psychometrika, 33, 469-506.
Hartigan, J. A. 1975. Clustering algorithms. New York: John Wiley and Sons.
Hastie, T., and R. Tibshirani. 1990. Generalized additive models. London: Chapman and Hall.
Kennedy, R., C. Riquier, and B. Sharp. 1996. Practical applications of c
> Select Delete unselected cases.
> Click OK.

Creating a Simple Random Sampling Analysis Plan

Now you are ready to create the simple random sampling analysis plan.
> First, you need to create a sampling weight variable. From the menus choose:
Transform > Compute Variable

Figure 22.37 Cox Regression main dialog box

> Type sampleweight as the target variable.
> Type 1 as the numeric expression.
> Click OK.

You are now ready to create the analysis plan.

Note: There is an existing plan file, srs.csaplan, in the sample files directory that you can use if you want to skip the following instructions and proceed to analysis of the data.

> To create the ana
> Select County as a stratification variable.
> Select Township as a cluster variable.
> Click Next.

This design structure means that independent samples are drawn for each county. In this stage, townships are drawn as the primary sampling unit.

Figure 13.34 Sampling Wizard, Sampling Method step (stage 1)

> Select PPS as the sampling method.
> Select Count data records as the measure of size.
> Click Next.

Within each county, townships are drawn without replacement with probability proportional to the number of records for each township. Using a PPS method generates joint sampling probabilities for the townships; you will specify where to save these values in the Output Fi
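Here the record count plays the role of the measure of size. As a sketch of what that implies, the first-order inclusion probability of township i in a county from which n townships are drawn is commonly approximated as pi_i = n * M_i / sum(M), where M_i is the township's record count. The township names, counts and n below are hypothetical, and the formula breaks down when a township is large enough that pi_i would exceed 1.

```python
def pps_inclusion_probs(sizes, n):
    """First-order inclusion probabilities pi_i = n * M_i / sum(M) for a PPS
    design that draws n units; only valid when no pi_i exceeds 1."""
    total = sum(sizes.values())
    probs = {unit: n * m / total for unit, m in sizes.items()}
    if any(p > 1 for p in probs.values()):
        raise ValueError("some units would be certainty selections (pi_i > 1)")
    return probs


# Hypothetical record counts for the townships in one county
records = {"T1": 120, "T2": 80, "T3": 200, "T4": 100}
pi = pps_inclusion_probs(records, n=2)
print(pi)  # the probabilities sum to n
```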
quest tests of the estimates against a specified value.

Complex Samples Ratios Missing Values
Figure 8.3 Ratios Missing Values dialog box

Ratios. This group determines which cases are used in the analysis.
- Use all available data. Missing values are determined on a ratio-by-ratio basis. Thus, the cases used to compute statistics may vary across numerator-denominator pairs.
- Ensure consistent case base. Missing values are determined across all variables. Thus, the cases used to compute statistics are consistent.

Categorical Design Variables. This group determines whether user-missing values are valid or invalid. Cases with invalid data for any categorical design variable are excluded from the analysis.

Complex Samples Options
Figure 8.4 Options dialog box

Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.

Chapter
Complex Samples General Linear Model

The Complex Samples General Linear Model (CSGLM) procedure performs
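The difference between the two Ratios options is a difference in which cases enter each numerator-denominator pair. The function below is a hypothetical sketch of that selection logic, with None standing in for a missing value; it ignores sampling weights and categorical design variables.

```python
def cases_used(data, numerator, denominator, ratio_vars, listwise):
    """Return indices of the cases used for one numerator/denominator pair.

    listwise=False mimics "Use all available data": only the pair's own
    variables must be non-missing. listwise=True mimics "Ensure consistent
    case base": every variable used in any requested ratio must be non-missing.
    """
    required = ratio_vars if listwise else (numerator, denominator)
    return [i for i, row in enumerate(data)
            if all(row[v] is not None for v in required)]


# Hypothetical cases; None marks a missing value
data = [
    {"a": 10.0, "b": 5.0, "c": 2.0},
    {"a": None, "b": 4.0, "c": 1.0},
    {"a": 8.0, "b": 4.0, "c": None},
]
ratio_vars = ("a", "b", "c")  # ratios requested: a/b and c/b

print(cases_used(data, "a", "b", ratio_vars, listwise=False))  # [0, 2]
print(cases_used(data, "c", "b", ratio_vars, listwise=False))  # [0, 1]
print(cases_used(data, "a", "b", ratio_vars, listwise=True))   # [0]
```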
r, so we will ignore the log-minus-log plot for the reference pattern.
- Patterns 1.1 through 1.4 differ only on the value of History of myocardial infarction. A separate pattern (and a separate line in the requested plot) is created for each value of History of myocardial infarction, while the other variables are held constant.

Log Minus Log Plot
Figure 22.53 Log minus log plot (Pattern 1; one curve for each value of History of myocardial infarction: None, One, Two, Three; y axis: Log Minus Log Survival Function; x axis: Survival Time)

This plot displays the log-minus-log of the survival function, ln(-ln(survival)), versus the survival time. This particular plot displays a separate curve for each category of History of myocardial infarction, with History of ischemic stroke fixed at One and History of hemorrhagic stroke fixed at None, and is a useful visualization of the effect of History of myocardial infarction on the survival function. As seen in the parameter estimates table, it appears that the survival for patients with one or no prior MIs is distinguishable from the survival for patients with two prior MIs, which in turn is distinguishable from the survival for patients with three prior MIs.

Summary

You have fit a Cox regression model for post-stroke survival that estimates the effects of the changing post-stroke patien
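The reason this plot helps in judging the model: under proportional hazards, S(t | x) = S0(t)^exp(x'beta), so ln(-ln S(t | x)) = ln(-ln S0(t)) + x'beta, and the curves for different predictor patterns should be roughly parallel, separated by a constant. The sketch below checks that identity numerically; the baseline survival values and the value of x'beta are hypothetical.

```python
import math


def log_minus_log(s):
    """ln(-ln S) for a survival probability 0 < S < 1."""
    return math.log(-math.log(s))


def ph_survival(baseline_s, linear_predictor):
    """Survival under proportional hazards: S(t | x) = S0(t) ** exp(x'beta)."""
    return baseline_s ** math.exp(linear_predictor)


baseline = [0.95, 0.85, 0.70, 0.55]  # hypothetical S0(t) at four time points
beta_x = 0.4                         # hypothetical x'beta for one pattern

# Vertical distance between the pattern's curve and the baseline curve
gaps = [log_minus_log(ph_survival(s, beta_x)) - log_minus_log(s) for s in baseline]
print([round(g, 6) for g in gaps])  # constant, equal to x'beta
```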
r your model in this procedure.

Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the Customer effect can be said to be nested within the Store location effect.

Additionally, you can include interaction effects, such as polynomial terms involving the same covariate, or add multiple levels of nesting to the nested term.

Limitations. Nested terms have the following restrictions:
- All factors within an interaction must be unique. Thus, if A is a factor, then specifying A*A is invalid.
- All factors within a nested effect must be unique. Thus, if A is a factor, then specifying A(A) is invalid.
- No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, then specifying A(X) is invalid.

Complex Samples Ordinal Regression Statistics
Figure 11.4 Ordinal Regression Statistics dialog box (Model Fit: Pseudo R-square, Classification table; Parameters: Estimate, Exponentiated estimate, Standard error, Confidence interval, t test, Covariances of parameter estimates, Correlations of parameter estimates, Design effect, Square root of design effect)
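The three restrictions are mechanical enough to check in code. The sketch below uses a made-up representation (terms as lists of variable names, plus a set naming the covariates); it is not how the procedure stores its model, only an illustration of the rules. Covariates are exempt from the uniqueness check so that polynomial terms such as X*X remain allowed.

```python
def interaction_is_valid(terms, covariates):
    """All factors in an interaction must be unique: A*A is invalid.

    Repeated covariates are allowed, which is how polynomial terms such as X*X arise.
    """
    factors = [t for t in terms if t not in covariates]
    return len(set(factors)) == len(factors)


def nested_is_valid(effect, within, covariates):
    """Check a nested term written effect(within).

    Factors in the nested effect must be unique (A(A) is invalid), and no
    effect may be nested within a covariate (A(X) is invalid).
    """
    factors = [t for t in list(effect) + list(within) if t not in covariates]
    if len(set(factors)) != len(factors):
        return False
    return not any(w in covariates for w in within)


covariates = {"X"}
print(interaction_is_valid(["A", "A"], covariates))          # False
print(interaction_is_valid(["X", "X"], covariates))          # True
print(nested_is_valid(["A"], ["A"], covariates))             # False
print(nested_is_valid(["A"], ["X"], covariates))             # False
print(nested_is_valid(["Customer"], ["Store"], covariates))  # True
```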
rded.
dmdata.sav. This is a hypothetical data file that contains demographic and purchasing information for a direct marketing company. dmdata2.sav contains information for a subset of contacts that received a test mailing, and dmdata3.sav contains information on the remaining contacts who did not receive the test mailing.
dietstudy.sav. This hypothetical data file contains the results of a study of the "Stillman diet" (Rickman, Mitchell, Dingman, and Dalen, 1974). Each case corresponds to a separate subject and records his or her pre- and post-diet weights in pounds and triglyceride levels in mg/100 ml.
dvdplayer.sav. This is a hypothetical data file that concerns the development of a new DVD player. Using a prototype, the marketing team has collected focus group data. Each case corresponds to a separate surveyed user and records some demographic information about them and their responses to questions about the prototype.
german_credit.sav. This data file is taken from the "German credit" dataset in the Repository of Machine Learning Databases (Blake and Merz, 1998) at the University of California, Irvine.
grocery_1month.sav. This hypothetical data file is the grocery_coupons.sav data file with the weekly purchases "rolled up" so that each case corresponds to a separate customer. Some of the variables that changed weekly disappear as a result, and the amount spent recorded is now the sum of the amounts spent during the four weeks of th
Fifteen out of one hundred bank branches were selected without replacement in the first stage; thus, the probability that a given bank was selected is 15/100 = 0.15.
> Type inclprob_s1 as the target variable.
> Type 0.15 as the numeric expression.
> Click OK.

Figure 14.5 Complex Samples Analysis Preparation Wizard, Compute Variable dialog box (numeric expression: 100/ncust, where ncust is Number of customers)

One hundred customers were selected from each branch in the second stage; thus, the stage 2 inclusion probability for a given customer at a given bank is 100
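The arithmetic for the two stages can be written out directly. The sketch below assumes, as is standard when stages are drawn independently, that a customer's overall inclusion probability is the product of the stage probabilities and that the sampling weight is its inverse; the function name and the value of ncust (number of customers at a branch) are hypothetical.

```python
def stage_probabilities(ncust, branches_sampled=15, branches_total=100,
                        customers_sampled=100):
    """Return (inclprob_s1, inclprob_s2, overall weight) for one customer.

    Stage 1: 15 of 100 branches drawn without replacement.
    Stage 2: 100 customers drawn from a branch with ncust customers.
    """
    inclprob_s1 = branches_sampled / branches_total
    inclprob_s2 = customers_sampled / ncust
    # Assumption: overall probability is the product, and the weight is its inverse
    weight = 1.0 / (inclprob_s1 * inclprob_s2)
    return inclprob_s1, inclprob_s2, weight


p1, p2, w = stage_probabilities(ncust=500)  # hypothetical branch with 500 customers
print(p1, p2, round(w, 2))  # 0.15 0.2 33.33
```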
rom their first arrest during the month of June 2003, and records their demographic information, some details of their first crime, and the date of their second arrest, if it occurred by the end of June 2006. Offenders were selected from sampled departments according to the sampling plan specified in recidivism_cs.csplan; because it makes use of a probability-proportional-to-size (PPS) method, there is also a file containing the joint selection probabilities (recidivism_cs_jointprob.sav).
rfm_transactions.sav. A hypothetical data file containing purchase transaction data, including date of purchase, item(s) purchased, and monetary amount of each transaction.
salesperformance.sav. This is a hypothetical data file that concerns the evaluation of two new sales training courses. Sixty employees, divided into three groups, all receive standard training. In addition, group 2 gets technical training; group 3, a hands-on tutorial. Each employee was tested at the end of the training course and their score recorded. Each case in the data file represents a separate trainee and records the group to which they were assigned and the score they received on the exam.
satisf.sav. This is a hypothetical data file that concerns a satisfaction survey conducted by a retail company at 4 store locations. 582 customers were surveyed in all, and each case represents the responses from a single customer.
screws.sav. This data file contains information on the characteristics
rrors, significance levels, and residual degrees of freedom are set to the system-missing value.

Note: This file is not immediately usable for further analyses in other procedures that read a matrix file unless those procedures accept all the row types exported here.

Export survival function as SPSS Statistics data. Writes a dataset in SPSS Statistics format containing the survival function, standard error of the survival function, upper and lower bounds of the confidence interval of the survival function, and the cumulative hazards function for each failure or event time, evaluated at the baseline and at the predictor patterns specified on the Plots tab. The order of variables in the dataset is as follows:
- Baseline strata variable. Separate survival tables are produced for each value of the strata variable.
- Survival time variable. The event time; a separate case is created for each unique event time.
- Sur_0, LCL_Sur_0, UCL_Sur_0. Baseline survival function and the upper and lower bounds of its confidence interval.
- Sur_R, LCL_Sur_R, UCL_Sur_R. Survival function evaluated at the reference pattern (see the pattern values table in the output) and the upper and lower bounds of its confidence interval.
- Sur_, LCL_Sur_, UCL_Sur_. Survival function evaluated at each of the predictor patterns specified on the Plots tab and the upper and lower bounds of their confidence intervals. See the pattern values table in the output to matc
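The survival function and the cumulative hazard function in this export are linked, in continuous time, by S(t) = exp(-H(t)). Whether the procedure's estimates satisfy that identity exactly is an assumption here, but it gives a quick consistency check on an exported dataset. Because exp(-H) is decreasing, transforming a confidence interval for H swaps its ends: the upper bound for H gives the lower bound for S. The function name and input values below are hypothetical.

```python
import math


def survival_from_cumhaz(h, h_lower, h_upper):
    """Map a cumulative hazard estimate and its interval onto the survival scale.

    Returns (S, lower bound of S, upper bound of S), using S = exp(-H).
    """
    s = math.exp(-h)
    # exp(-H) decreases in H, so the interval ends trade places
    return s, math.exp(-h_upper), math.exp(-h_lower)


s, lcl, ucl = survival_from_cumhaz(math.log(2), 0.5, 0.9)
print(round(s, 4), round(lcl, 4), round(ucl, 4))
```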
rty taxes are fairly assessed from county to county. Taxes are based on the appraised value of the property, so the agency wants to track property values across counties to be sure that each county's records are equally up to date. Since resources for obtaining current appraisals are limited, the agency chose to employ complex sampling methodology to select properties. The sample of properties selected and their current appraisal information is collected in property_assess_cs_sample.sav. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.

Use Complex Samples Ratios to assess the change in property values across the five counties since the last appraisal.

Running the Analysis

> To run a Complex Samples Ratios analysis, from the menus choose:
Analyze > Complex Samples > Ratios

Figure 18.1 Complex Samples Plan dialog box (plan file: property_assess.csplan; joint probabilities: open dataset property_assess_cs_sample.sav [DataSet3])
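A ratio such as current value over last appraised value can be estimated for each county with the usual weighted ratio estimator, sum(w*y) / sum(w*x), taken over the sampled properties in that county. The sketch below shows only that point estimate; the records, weights and county labels are hypothetical, and the design-based standard errors the procedure reports are not reproduced.

```python
from collections import defaultdict


def ratio_by_domain(rows):
    """Weighted ratio estimate sum(w*y) / sum(w*x), computed separately per domain.

    rows: iterable of (domain, weight, numerator value y, denominator value x)
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for domain, weight, y, x in rows:
        num[domain] += weight * y
        den[domain] += weight * x
    return {d: num[d] / den[d] for d in num}


rows = [  # hypothetical: (county, sampling weight, current value, last appraised value)
    ("A", 10.0, 120.0, 100.0),
    ("A", 20.0, 130.0, 110.0),
    ("B", 15.0, 90.0, 100.0),
]
ratios = ratio_by_domain(rows)
print(ratios)  # a ratio above 1 means values rose since the last appraisal
```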
  in Complex Samples Crosstabs, 39
  in Complex Samples Descriptives, 34
  in Complex Samples Frequencies, 30
  in Complex Samples Ratios, 43
column percentages
  in Complex Samples Crosstabs, 39
Complex Samples
  hypothesis tests, 49, 59, 69
  missing values, 31, 40
  options, 32, 36, 41, 44
Complex Samples Analysis Preparation Wizard, 140
  public data, 140
  related procedures, 154
  sampling weights not available, 143
  summary, 143, 154
Complex Samples Cox Regression, 210
  date and time variables, 74
  define event, 77
  hypothesis tests, 85
  Kaplan-Meier analysis, 74
  log-minus-log plot, 257
  model, 81
  model export, 88
  options, 90
  parameter estimates, 226, 255
  pattern values, 256
  piecewise constant time-dependent predictors, 226
  plots, 84
  predictors, 78
  sample design information, 221, 254
  save variables, 86
  statistics, 82
  subgroups, 80
  test of proportional hazards, 222
  tests of model effects, 222, 225, 255
  time-dependent predictor, 79, 210
Complex Samples Crosstabs, 37, 165
  crosstabulation table, 168
  related procedures, 170
  relative risk, 165, 169, 170
  statistics, 39
Complex Samples Descriptives, 33, 160
  missing values, 35
  public data, 160
  related procedures, 164
  statistics, 34, 163
  statistics by subpopulation, 163
Complex Samples Frequencies, 29, 155
  frequency table, 158
  frequency table by subpopulation, 158
  related procedures, 159
  statistics, 30
Complex Samples General Linear Model, 45, 176
  command additional features, 53
Cox Regression dialog box, Time and Event tab (Start of Interval (Onset of Risk): Time 0 or Start Variable; End of Interval: End Variable; Subject Identifier)

> Select Time to second arrest (time_to_event) as the variable defining the end of the interval.
> Select Second arrest (arrest2) as the variable defining whether the event has occurred.
> Click Define Event.

Figure 22.9 Define Event dialog box
Values Indicating that Event Has Occurred
Individual values s
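The end-of-interval time variable and the event indicator together define each case's survival record. For intuition about how such (time, event) pairs are used, the sketch below computes a weighted product-limit (Kaplan-Meier) estimate. It is not the Cox model that the procedure fits and it has no design-based variance; the data and weights are hypothetical. In the recidivism example, the times would come from time_to_event and the event flags from whether arrest2 takes a value defined as indicating the event.

```python
def weighted_kaplan_meier(times, events, weights):
    """Weighted product-limit estimate of the survival function.

    At each distinct time with at least one event, S(t) is multiplied by
    (1 - d_w / n_w), where d_w is the weighted number of events at t and
    n_w is the weighted number of cases still at risk just before t.
    Returns a list of (time, survival) pairs at the event times.
    """
    data = sorted(zip(times, events, weights))
    at_risk = float(sum(weights))
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_w = 0.0
        leaving_w = 0.0
        # Gather every case (event or censored) that ends at time t
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            if event:
                events_w += w
            leaving_w += w
            i += 1
        if events_w > 0:
            surv *= 1.0 - events_w / at_risk
            curve.append((t, surv))
        at_risk -= leaving_w  # these cases are no longer at risk after t
    return curve


curve = weighted_kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1], [1.0, 1.0, 1.0, 1.0])
print(curve)
```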
s a cluster variable.
> Click Next, and then click Next in the Sampling Method step.

This design structure means that independent samples are drawn for each region. In this stage, provinces are drawn as the primary sampling unit using the default method, simple random sampling.

Figure 13.16 Sampling Wizard, Sample Size step (stage 1)

> Select Counts from the Units drop-down list.
> Type 3 as the value for the number of units to select in this stage.
> Click Next, and then click Next in the Output Variables step.

Figure 13.17 Sampling Wizard, Plan Summary step (stage 1)
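The design just specified (independent draws of 3 provinces within each region by simple random sampling) can be mimicked in a few lines. This sketch uses Python's random module and a hypothetical frame; it will not reproduce the Sampling Wizard's selections, and taking every unit when a stratum has fewer than 3 is an assumption of the sketch, not documented wizard behavior.

```python
import random


def draw_clusters_per_stratum(frame, n_per_stratum, seed):
    """Draw n clusters per stratum by simple random sampling without replacement.

    frame maps each stratum to a list of its cluster identifiers. Strata are
    sampled independently; a stratum with fewer than n clusters is taken whole.
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, clusters in frame.items():
        k = min(n_per_stratum, len(clusters))
        sample[stratum] = sorted(rng.sample(clusters, k))
    return sample


# Hypothetical frame: regions as strata, provinces as clusters
frame = {
    "East": ["P1", "P2", "P3", "P4", "P5"],
    "West": ["P6", "P7", "P8", "P9"],
    "North": ["P10", "P11"],
}
print(draw_clusters_per_stratum(frame, 3, seed=1))
```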
s a factor.
> Select Age in years through Other debt in thousands as covariates.
> Select Previously defaulted and click Reference Category.

Figure 20.3 Logistic Regression Reference Category dialog box

> Select Lowest value as the reference category.

This sets the "did not default" category as the reference category; thus, the odds ratios reported in the output will have the property that increasing odds ratios correspond to increasing probability of default.

> Click Continue.
> Click Statistics in the Logistic Regression dialog box.

Figure 20.4 Logistic Regression Statistics dialog box

> Select Classification table in the Model Fit group.
> Select Estimate, Exponentiated estimate, Standard error, Confidence interval, and Design effect in the
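Switching the reference category of a binary dependent variable does not change the fitted probabilities; it only inverts the odds ratios, because the odds of No are the reciprocal of the odds of Yes. The sketch below checks that with two hypothetical default probabilities; the helper name is made up.

```python
def odds_ratio(p_a, p_b):
    """Odds ratio comparing probability p_a with probability p_b."""
    return (p_a / (1.0 - p_a)) / (p_b / (1.0 - p_b))


# Hypothetical probabilities of default at two covariate values
p_default_high, p_default_low = 0.3, 0.2

or_default = odds_ratio(p_default_high, p_default_low)             # modelling "Yes"
or_no_default = odds_ratio(1 - p_default_high, 1 - p_default_low)  # modelling "No"
print(round(or_default, 4), round(or_no_default, 4))
```

With Lowest value as the reference, the odds ratios describe the odds of default, so values above 1 go with a higher probability of default, as the text states.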
s executed, and request a case processing summary.
- Choose a subset of variables in the active dataset to write to an external sample file or to a different dataset.

See the Command Syntax Reference for complete syntax information.

Chapter
Preparing a Complex Sample for Analysis

Figure 3.1 Analysis Preparation Wizard, Welcome step

The Analysis Preparation Wizard guides you through the steps for creating or modifying an analysis plan for use with the various Complex Samples analysis procedures. Before using the Wizard, you should have a sample dra
s from 1 indicate that some of the standard errors computed for these parameter estimates are larger than those you would obtain if you assumed that these observations came from a simple random sample, while others are smaller. It is vitally important to incorporate the sampling design information in your analysis because you might otherwise infer, for example, that the usecoup=3 coefficient is not different from 0.

The parameter estimates are useful for quantifying the effect of each model term, but the estimated marginal means tables can make it easier to interpret the model results.

Estimated Marginal Means
Figure 19.9 Estimated marginal means by levels of Who shopping for

                                                     95% Confidence Interval
Who shopping for    Mean        Std. Error    Lower        Upper
Self                308.5326    3.94286       300.0145     317.0506
Self and spouse     370.3361    4.87908       359.7955     380.8767
Self and family     459.4392    7.19769       443.8895     474.9888

This table displays the model-estimated marginal means and standard errors of Amount spent at the factor levels of Who shopping for. This table is useful for exploring the differences between the levels of this factor. In this example, a customer who shops for him- or herself is expected to spend about 308.53, while a customer with a spouse is expected to spend 370.34, and a customer with dependents will spend 459.44. To see whether this represents a real difference or is due to chance, var
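The design effect is the ratio of the variance of an estimate under the actual sampling design to its variance under a simple random sample of the same size, so the design-based standard error is the SRS standard error times the square root of the design effect. The sketch below, with a hypothetical coefficient and SRS standard error, shows how a design effect below 1 can turn a non-significant-looking t statistic into a significant one; that is the kind of wrong inference the text warns about when the design is ignored.

```python
import math


def design_effect(var_design, var_srs):
    """Design-based variance divided by the variance assuming simple random sampling."""
    return var_design / var_srs


def design_se(se_srs, deff):
    """Standard error under the design, obtained by scaling the SRS standard error."""
    return se_srs * math.sqrt(deff)


coef, se_srs = 0.50, 0.30  # hypothetical estimate and SRS standard error
for deff in (0.5, 1.0, 2.0):
    se = design_se(se_srs, deff)
    print(deff, round(se, 3), round(coef / se, 2))  # deff, SE, t statistic
```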
s parameter is redundant.

You can make the following interpretations based on the parameter estimates:
- Those in lower age categories show greater support for the bill than those in the highest age category.
- Those who drive less frequently show greater support for the bill than those who drive more frequently.
- The coefficients for the variables gender and votelast, in addition to not being statistically significant, appear to be small compared to other coefficients.

The design effects indicate that some of the standard errors computed for these parameter estimates are larger than those you would obtain if you used a simple random sample, while others are smaller. It is vitally important to incorporate the sampling design information in your analysis because you might otherwise infer, for example, that the coefficient for the third level of Age category, agecat=3, is significantly different from 0.

Classification
Figure 21.9 Categorical variable information

Variable                                  Category             Weighted Count   Weighted Percent
The legislature should enact a gas tax    Strongly agree       25132.955        21.3%
                                          Agree                32261.425        27.3%
                                          Disagree             29477.417        24.9%
                                          Strongly disagree    31314.203        26.5%
Age category                              18-30                20509.504        17.4%
                                          31-45                35380.506        29.9%
                                          46-60                34865.792        29.5%
                                          >60                  27430.198        23.2%
Gender                                    Male                 61424.547        52.0%
                                          Female               56761.453        48.0%
Voted in last election                    No                   70607.216        59.7%
                                          Yes                  47578.784        40.3%
Driving frequency                         Do not own car       3437.137         2.9%
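The Weighted Percent column is each category's weighted count divided by the total weighted count. The sketch below recomputes it from the weighted counts for the gas tax question in the table; only those counts come from the text.

```python
weighted_counts = {
    "Strongly agree": 25132.955,
    "Agree": 32261.425,
    "Disagree": 29477.417,
    "Strongly disagree": 31314.203,
}

total = sum(weighted_counts.values())  # weighted number of cases
percents = {k: round(100 * v / total, 1) for k, v in weighted_counts.items()}
print(round(total, 3), percents)  # matches the Weighted Percent column: 21.3, 27.3, 24.9, 26.5
```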
se in diagnostics and reporting of results. Note that none of these are available when time-dependent predictors are included in the model.

Save Variables dialog box. Items that can be saved: Survival function; Lower and Upper bound of confidence interval for survival function; Cumulative hazard function; Lower and Upper bound of confidence interval for cumulative hazard function; Predicted value of linear predictor; Schoenfeld residual (one variable per model parameter); Martingale residual; Deviance residual; Cox-Snell residual; Score residual (one variable per model parameter). Names of saved variables: Automatically generate unique names (select this option if you want to add a new set of model variables to your dataset each time you do an analysis) or Custom names (specify names in the Variables list; any existing variables with the same name or rootname are replaced each time you do an analysis).

Survival function. Saves the probability of survival (the value of the survival function) at the observed
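Three of the saved residuals are tied together by standard textbook definitions for a model without time-dependent predictors: the Cox-Snell residual is the estimated cumulative hazard at the case's time, H0(t) * exp(x'beta); the martingale residual is the event indicator minus the Cox-Snell residual; and the deviance residual is a symmetrized transform of the martingale residual. Whether the procedure uses exactly these formulas is an assumption of this sketch, and the inputs below are hypothetical.

```python
import math


def cox_residuals(event, baseline_cum_hazard, linear_predictor):
    """Cox-Snell, martingale and deviance residuals for one case.

    event: 1 if the event occurred at the case's time, 0 if censored
    baseline_cum_hazard: estimated baseline cumulative hazard at that time
    linear_predictor: x'beta for the case (no time-dependent predictors)
    """
    cox_snell = baseline_cum_hazard * math.exp(linear_predictor)
    martingale = event - cox_snell
    # event * ln(event - martingale) reduces to event * ln(cox_snell); it is 0 when censored
    inner = martingale + (event * math.log(cox_snell) if event else 0.0)
    # max() guards against a tiny negative value caused by rounding
    deviance = math.copysign(math.sqrt(max(0.0, -2.0 * inner)), martingale)
    return cox_snell, martingale, deviance


print(cox_residuals(1, 1.0, 0.0))  # (1.0, 0.0, 0.0)
print(cox_residuals(0, 0.5, 0.0))  # (0.5, -0.5, -1.0)
```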
seasfac.sav. This data file is the same as catalog.sav except for the addition of a set of seasonal factors calculated from the Seasonal Decomposition procedure along with the accompanying date variables.
cellular.sav. This is a hypothetical data file that concerns a cellular phone company's efforts to reduce churn. Churn propensity scores are applied to accounts, ranging from 0 to 100. Accounts scoring 50 or above may be looking to change providers.
ceramics.sav. This is a hypothetical data file that concerns a manufacturer's efforts to determine whether a new premium alloy has a greater heat resistance than a standard alloy. Each case represents a separate test of one of the alloys; the heat at which the bearing failed is recorded.
cereal.sav. This is a hypothetical data file that concerns a poll of 880 people about their breakfast preferences, also noting their age, gender, marital status, and whether or not they have an active lifestyle (based on whether they exercise at least twice a week). Each case represents a separate respondent.
clothing_defects.sav. This is a hypothetical data file that concerns the quality control process at a clothing factory. From each lot produced at the factory, the inspectors take a sample of clothes and count the number of clothes that are unacceptable.
coffee.sav. This data file pertains to perceived images of six iced-coffee brands (Kennedy, Riquier, and Sharp, 1996). For each of 23 iced-coffee image attrib
seed value if you want to reproduce the sample later. The same panel offers options to include in the sample frame cases with user-missing values of stratification or clustering variables, and to indicate that working data are sorted by stratification variables (presorted data may speed processing).

> Select Custom value for the type of random seed to use, and type 4231946 as the value.
> Click Next, and then click Next in the Draw Sample Output Files step.

Figure 13.30 Sampling Wizard, Finish step

> Select Paste the syntax generated by the Wizard into a syntax window.
> Click Finish.

The following syntax is generated:

  * Sampling Wizard.
  CSSELECT
    /PLAN FILE='demo.csplan'
    /CRITERIA STAGES=3 SEED=4231946
    /CLASSMISSING EXCLUDE
    /DATA RE
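The custom seed is what makes the draw repeatable. The same idea in Python: a generator seeded with a fixed value returns the same sample every time. This is only an analogy using Python's own generator and a hypothetical frame of 100 units; feeding 4231946 to Python will not reproduce the sample that CSSELECT draws.

```python
import random


def draw_sample(units, n, seed):
    """Simple random sample of n units without replacement, reproducible from seed."""
    rng = random.Random(seed)  # a private generator, so the global state is untouched
    return rng.sample(units, n)


frame = list(range(1, 101))  # hypothetical frame of 100 unit IDs
first = draw_sample(frame, 10, seed=4231946)
second = draw_sample(frame, 10, seed=4231946)
print(first == second)  # True
```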
ses to a survey from attendees of a political debate before and after the debate. Each case corresponds to a separate respondent.
debate_aggregate.sav. This is a hypothetical data file that aggregates the responses in debate.sav. Each case corresponds to a cross-classification of preference before and after the debate.
demo.sav. This is a hypothetical data file that concerns a purchased customer database, for the purpose of mailing monthly offers. Whether or not the customer responded to the offer is recorded, along with various demographic information.
demo_cs_1.sav. This is a hypothetical data file that concerns the first step of a company's efforts to compile a database of survey information. Each case corresponds to a different city, and the region, province, district, and city identification are recorded.
demo_cs_2.sav. This is a hypothetical data file that concerns the second step of a company's efforts to compile a database of survey information. Each case corresponds to a different household unit from cities selected in the first step, and the region, province, district, city, subdivision, and unit identification are recorded. The sampling information from the first two stages of the design is also included.
demo_cs.sav. This is a hypothetical data file that contains survey information collected using a complex sampling design. Each case corresponds to a different household unit, and various demographic and sampling information is reco
► Select If condition is satisfied.
► Click If.
Figure 22-35 Select Cases If dialog box
► Type event>0 as the conditional expression.
► Click Continue.
Figure 22-36 Select Cases dialog box
… stage inclusion probabilities.
► Click Next.
Figure 14-11 Analysis Preparation Wizard, Plan Summary step (stage 1)
(The summary lists stage 1 with no strata, branch as the cluster variable, finalweight as the weight, size read from inclprob_s1, and the Equal WOR method.)
► Select Yes, add stage 2 now.
► Click Next, and then click Next in the Design Variables step.
Figure 14-12 Analysis Preparation Wizard, Estimation Method step (stage 2)
► Select Branch as a cluster variable.
► Select finalweight as the sample weight variable.
► Click Next.
Figure 14-9 Analysis Preparation Wizard, Estimation Method step (stage 1)
… paste the syntax into the syntax window.
► Type time_to_event as the name of the variable representing the time between the two dates.
► Type Time to second arrest as the variable label.
► Click Finish.
Running the Analysis
► To run a Complex Samples Cox Regression analysis, from the menus choose:
Analyze > Complex Samples > Cox Regression...
Figure 22-7 Complex Samples Plan for Cox Regression dialog box
► Browse to the sample files directory, and select recidivism_cs.csplan as the plan file.
► Select Custom file in the Joint Probabilities group, browse to the sample files directory, and select recidivism_cs_jointprob.sav.
► Click Continue.
Figure 22-8 Cox Regression dialog box, Time and Event tab
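The time_to_event variable is the elapsed time between two dates. The sketch below shows only that arithmetic, not the Date and Time Wizard itself. It assumes the interval is measured in whole days, and the arrest dates and the helper name are hypothetical.

```python
from datetime import date


def time_to_event(first_date, second_date):
    """Elapsed whole days between two dates (assumed unit: days)."""
    return (second_date - first_date).days


# Hypothetical dates of a first and a second arrest.
print(time_to_event(date(2010, 1, 15), date(2010, 3, 1)))  # 45
```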
► Select Test of proportional hazards, and then select Log as the time function in the Model Assumptions group.
► Select Parameter estimates for alternative model.
► Click OK.
Sample Design Information
Figure 22-12 Sample design information
This table contains information on the sample design pertinent to the estimation of the model.
■ There is one case per subject, and all 5,687 cases are used in the analysis.
■ The sample represents less than 2% of the entire estimated population.
■ The design requested 4 strata and 5 units per stratum, for a total of 20 units in the first stage of the design. The sampling design degrees of freedom are estimated by 20 - 4 = 16.
Tests of Model Effects
Figure 22-13 Tests of model effects
Source   df1     df2   Wald F    Sig.
age      1.000   16    504.787   1.580E-13
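In the last bullet, the degrees of freedom equal the number of first-stage units minus the number of first-stage strata. The small sketch below applies that rule. The stratum labels and the helper are hypothetical stand-ins for the 4 strata with 5 units each.

```python
def design_degrees_of_freedom(units_per_stratum):
    """First-stage design df: total first-stage units minus number of strata.

    units_per_stratum maps each stratum label to its number of sampled units.
    """
    n_units = sum(units_per_stratum.values())
    n_strata = len(units_per_stratum)
    return n_units - n_strata


# Hypothetical labels: 4 strata with 5 first-stage units each (20 units in total).
design = {"stratum_1": 5, "stratum_2": 5, "stratum_3": 5, "stratum_4": 5}
print(design_degrees_of_freedom(design))  # 16
```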
… existing plan file. If you want to save the plan to a new file, choose to paste the syntax generated by the Wizard into a syntax window, and change the filename in the syntax commands.
Modifying an Existing Analysis Plan
► From the menus choose:
Analyze > Complex Samples > Prepare for Analysis...
► Select Edit a plan file, and choose a plan filename to which you will save the analysis plan.
► Click Next to continue through the Wizard.
► Review the analysis plan in the Plan Summary step, and then click Next. Subsequent steps are largely the same as for a new design. For more information, see the Help for individual steps.
► Navigate to the Finish step, and specify a new name for the edited plan file, or choose to overwrite the existing plan file.
Optionally, you can remove stages from the plan.
Analysis Preparation Wizard: Plan Summary
Figure 3-8 Analysis Preparation Wizard, Plan Summary step
► Type start_time as the target variable.
► Select Length of stay for rehabilitation [los_rehab], start_time2, and start_time3 as variables to be transposed. Time to first event post-attack [time1] and Time to second event post-attack [time2] will be used to create the end times, and each variable can appear in only one list of variables to be transposed. For this reason, start_time2 and start_time3 were necessary.
► Select trans3 from the target variable list.
Figure 22-27 Restructure Data Wizard (Variables to Cases), Select Variables step
… results of the National Health Interview Survey (NHIS), and with an appropriate analysis plan for this public-use data.
Statistics. The procedure produces means and sums, plus t tests, standard errors, confidence intervals, coefficients of variation, unweighted counts, population sizes, design effects, and square roots of design effects for each estimate.
Data. Measures should be scale variables. Subpopulation variables can be string or numeric but should be categorical.
Assumptions. The cases in the data file represent a sample from a complex design that should be analyzed according to the specifications in the file selected in the Complex Samples Plan dialog box.
Obtaining Complex Samples Descriptives
► From the menus choose:
Analyze > Complex Samples > Descriptives...
► Select a plan file. Optionally, select a custom joint probabilities file.
► Click Continue.
Copyright SPSS Inc. 1989, 2010
Figure 6-1 Descriptives dialog box
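Two of the listed statistics are simple functions of other quantities. The coefficient of variation is the standard error divided by the estimate. The design effect compares the variance under the complex design with the variance a simple random sample of the same size would give. The sketch below uses those standard definitions with made-up inputs; the helper names are illustrative.

```python
import math


def coefficient_of_variation(estimate, standard_error):
    """Standard error relative to the size of the estimate."""
    return standard_error / estimate


def design_effect(var_complex, var_srs):
    """Variance under the complex design divided by the SRS variance."""
    return var_complex / var_srs


cv = coefficient_of_variation(50.0, 2.0)  # 0.04
deff = design_effect(9.0, 4.0)            # 2.25
deft = math.sqrt(deff)                    # 1.5, the square root of the design effect
print(cv, deff, deft)
```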
This step allows you to identify the stratification and clustering variables and define sample weights. You can also provide a label for the stage.
Strata. The cross-classification of stratification variables defines distinct subpopulations, or strata. Your total sample represents the combination of independent samples from each stratum.
Clusters. Cluster variables define groups of observational units, or clusters. Samples drawn in multiple stages select clusters in the earlier stages and then subsample units from the selected clusters. When analyzing a data file obtained by sampling clusters with replacement, you should include the duplication index as …
… history. This is just a beginning, as researchers would undoubtedly want to include other potential predictors in the model. Moreover, in further analysis of this dataset, you might consider more significant changes to the model structure. For example, the current model assumes that the effect of a patient-history-altering event can be quantified by a multiplier to the baseline hazard. Instead, it may be reasonable to assume that the shape of the baseline hazard is altered by the occurrence of a nondeath event. To accomplish this, you could stratify the analysis based on Event index.
Appendix A
Sample Files
The sample files installed with the product can be found in the Samples subdirectory of the installation directory. There is a separate folder within the Samples subdirectory for each of the following languages: English, French, German, Italian, Japanese, Korean, Polish, Russian, Simplified Chinese, Spanish, and Traditional Chinese.
Not all sample files are available in all languages. If a sample file is not available in a language, that language folder contains an English version of the sample file.
Descriptions
Following are brief descriptions of the sample files used in various examples throughout the documentation.
accidents.sav. This is a hypothetical data file that concerns an insurance company that is studying age and gender risk factors for automobile accidents in a given region. Each case corresponds to a cross-classification …
… table reviews your sampling plan and is useful for making sure that the plan represents your intentions.
Sampling Summary
Figure 13-44 Stage summary
This summary table reviews the first stage of sampling and is useful for checking that the sampling went according to plan. Recall that you requested a 30% sample of townships by county. The actual proportions sampled are close to 30%, except in the Western and Southern counties. These counties each have only six townships, and you also specified that a minimum of three townships should be selected per county.
Figure 13-45 Stage summary
This summary table, the top part of which is shown here, reviews the second stage of sampling. It is also useful for checking that the sampling went according to plan. Approximately 20% of the voters were sampled from each neighborhood from each township sampled in the first stage, as requested.
Sample Results
Figure 13-46 Data Editor with sample results
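The departure from 30% in the Western and Southern counties comes from combining a proportion with a minimum count. The sketch below illustrates that interaction with a hypothetical helper. It assumes the proportion is rounded to the nearest whole number before the minimum is applied; SPSS's actual rounding rule may differ. For six townships, any of the usual rounding rules gives fewer than three units, so the minimum governs.

```python
def units_to_sample(n_units, proportion=0.30, minimum=3):
    """Hypothetical rule: apply the proportion, then enforce the minimum.

    Rounding to the nearest integer is an assumption; SPSS's exact
    rounding rule may differ.
    """
    n = round(proportion * n_units)
    n = max(n, minimum)     # at least `minimum` units per stratum
    return min(n, n_units)  # never more units than the stratum has


for size in (6, 20):
    n = units_to_sample(size)
    print(size, n, n / size)
# 6 townships: 0.3 * 6 = 1.8 rounds to 2, raised to the minimum of 3 (50%)
```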
Sample design information. Displays summary information about the sample, including the unweighted count and the population size.
Event and censoring summary. Displays summary information about the number and percentage of censored cases.
Risk set at event times. Displays number of events and number at risk for each event time in each baseline stratum.
Parameters. This group allows you to control the display of statistics related to the model parameters.
■ Estimate. Displays estimates of the coefficients.
■ Exponentiated estimate. Displays the base of the natural logarithm raised to the power of the estimates of the coefficients. While the estimate has nice properties for statistical testing, the exponentiated estimate, or exp(B), is easier to interpret.
■ Standard error. Displays the standard error for each coefficient estimate.
■ Confidence interval. Displays a confidence interval for each coefficient estimate. The confidence level for the interval is set in the Options dialog box.
■ t test. Displays a test of each coefficient estimate. The null hypothesis for each test is that the value of the coefficient is 0.
■ Covariances of parameter estimates. Displays an estimate of the covariance matrix for the model coefficients.
■ Correlations of parameter estimates. Displays an estimate of the correlation matrix for the model coefficients.
■ Design effect …
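The exponentiated estimate and the t test are both simple functions of a coefficient and its standard error. The sketch below uses a hypothetical estimate and standard error. In Cox regression, exp(B) is commonly read as a hazard ratio. The sketch does not compute a p-value, because that would need the design degrees of freedom.

```python
import math


def exponentiated_estimate(b):
    """exp(B): e raised to the power of the coefficient estimate."""
    return math.exp(b)


def t_statistic(b, standard_error):
    """t statistic for the null hypothesis that the coefficient is 0."""
    return b / standard_error


b, se = 0.5, 0.25  # hypothetical estimate and standard error
print(round(exponentiated_estimate(b), 4))  # 1.6487
print(t_statistic(b, se))                   # 2.0
```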
… with the others, and with the goal of 1.3.
Related Procedures
The Complex Samples Ratios procedure is a useful tool for obtaining univariate descriptive statistics of the ratio of scale measures for observations obtained via a complex sampling design.
■ The Complex Samples Sampling Wizard is used to specify complex sampling design specifications and obtain a sample. The sampling plan file created by the Sampling Wizard contains a default analysis plan and can be specified in the Plan dialog box when you are analyzing the sample obtained according to that plan.
■ The Complex Samples Analysis Preparation Wizard is used to set analysis specifications for an existing complex sample. The analysis plan file created by the Analysis Preparation Wizard can be specified in the Plan dialog box when you are analyzing the sample corresponding to that plan.
■ The Complex Samples Descriptives procedure provides descriptive statistics for scale variables.
Complex Samples General Linear Model
The Complex Samples General Linear Model (CSGLM) procedure performs linear regression analysis, as well as analysis of variance and covariance, for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.
Using Complex Samples General Linear Model to Fit a Two-Factor ANOVA
A grocery store chain surveyed a set of customers concerning their purchasing habits, according to a complex design. Given the survey results and how much …
… the number of customers at that bank.
► Recall the Compute Variable dialog box.
► Type inclprob_s2 as the target variable.
► Type 100/ncust as the numeric expression.
► Click OK.
Now that you have the inclusion probabilities for each stage, it's easy to compute the final sampling weights.
► Recall the Compute Variable dialog box.
► Type finalweight as the target variable.
► Type 1/(inclprob_s1*inclprob_s2) as the numeric expression.
Figure 14-6 Compute Variable dialog box
► Click OK.
You are now ready to create the analysis plan.
Using the Wizard
► To prepare a sample using the Complex Samples Analysis Preparation Wizard, from the menus choose:
Analyze > Complex Samples > Prepare for Analysis...
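The two Compute Variable steps amount to simple arithmetic. The second-stage inclusion probability is 100/ncust, and the final weight is the reciprocal of the product of the two stagewise inclusion probabilities. The sketch below assumes, as the expression 100/ncust suggests, that 100 customers were drawn per branch. The helper names and input values are hypothetical.

```python
def stage2_inclusion_probability(ncust, sampled_per_branch=100):
    """Mirrors 100/ncust (assumes 100 customers drawn per branch)."""
    return sampled_per_branch / ncust


def final_weight(inclprob_s1, inclprob_s2):
    """Mirrors 1/(inclprob_s1*inclprob_s2): inverse overall inclusion probability."""
    return 1 / (inclprob_s1 * inclprob_s2)


# Hypothetical branch: selected with probability 0.25, with 2,000 customers.
p2 = stage2_inclusion_probability(2000)  # 0.05
w = final_weight(0.25, p2)               # about 80 customers represented by each sampled one
print(p2, w)
```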
… the potential for multiple cases per subject, since patient histories change as the occurrences of significant nondeath events are noted and the times of these events recorded. The sample is also left-truncated in the sense that the observed survival times are "inflated" by the length of rehabilitation. Although the onset of risk starts at the time of the ischemic stroke, only patients who survive past the rehabilitation program are in the sample.
Survival Time
The procedure applies Cox regression to analysis of survival times, that is, the length of time before the occurrence of an event. There are two ways to specify the survival time, depending upon the start time of the interval:
■ Time = 0. Commonly, you will have complete information on the start of the interval for each subject and will simply have a variable containing end times (or create a single variable with end times from Date & Time variables; see below).
■ Varies by subject. This is appropriate when you have left truncation, also called delayed entry. For example, if you are analyzing survival times for patients exiting a rehabilitation program post-stroke, you might consider that their onset of risk starts at the time of the stroke. However, if your sample only includes patients who have survived the rehabilitation program, then your sample is left-truncated in the sense that the observed survival times are "inflated" by the length of rehabilitation. You can account for …
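Delayed entry changes who counts in the risk set: a subject is at risk at time t only after entering observation. The minimal sketch below shows that counting rule. Each subject is described by a hypothetical (entry, exit) pair measured in days since the stroke, where entry is the end of rehabilitation. This illustrates the idea only and is not the procedure's estimation code.

```python
def risk_set_size(subjects, t):
    """Number of subjects at risk at time t under left truncation.

    A subject with (entry, exit) is at risk at t when entry < t <= exit:
    it has entered observation and has not yet had the event or been censored.
    """
    return sum(1 for entry, exit_time in subjects if entry < t <= exit_time)


# Hypothetical (entry, exit) times in days since the stroke.
patients = [(10, 50), (30, 40), (5, 20)]
print(risk_set_size(patients, 25))  # 1: only the first patient
print(risk_set_size(patients, 35))  # 2: the second patient has now entered
```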
… the strata, cluster, subpopulation, and factor variables.
Confidence Interval. This is the confidence interval level for coefficient estimates, exponentiated coefficient estimates, and odds ratios. Specify a value greater than or equal to 50 and less than 100.
CSLOGISTIC Command Additional Features
The command syntax language also allows you to:
■ Specify custom tests of effects versus a linear combination of effects or a value (using the CUSTOM subcommand).
■ Fix values of other model variables when computing odds ratios for factors and covariates (using the ODDSRATIOS subcommand).
■ Specify a tolerance value for checking singularity (using the CRITERIA subcommand).
■ Create user-specified names for saved variables (using the SAVE subcommand).
■ Produce a general estimable function table (using the PRINT subcommand).
See the Command Syntax Reference for complete syntax information.
Complex Samples Ordinal Regression
The Complex Samples Ordinal Regression procedure performs regression analysis on a binary or ordinal dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.
Example. Representatives considering a bill before the legislature are interested in whether there is public support for the bill and how support for the bill is related to voter demographics. Pollsters design and conduct interviews according to a complex …
By default, covariates in the model are evaluated at their means, and factors in the model are evaluated at their highest levels. You can change the value at which any model predictor is evaluated and plot separate lines for each level of one factor variable.
► Select Log minus log of survival function.
► Check Separate Lines for History of myocardial infarction.
► Select 1.0 as the level for History of ischemic stroke.
► Select 0.0 as the level for History of hemorrhagic stroke.
► Click the Options tab.
Figure 22-48 Cox Regression dialog box, Options tab
… Optionally, select a custom joint probabilities file.
► Click Continue.
Figure 5-1 Frequencies dialog box
► Select at least one frequency variable.
Optionally, you can specify variables to define subpopulations. Statistics are computed separately for each subpopulation.
Complex Samples Frequencies Statistics
Figure 5-2 Frequencies Statistics dialog box
Cells. This group allows you to request estimates …
… are set to the system-missing value, all parameter estimates are set at zero, and all standard errors, significance levels, and residual degrees of freedom are set to the system-missing value.
Note: This file is not immediately usable for further analyses in other procedures that read a matrix file unless those procedures accept all the row types exported here.
Export Model as XML. Saves the parameter estimates and the parameter covariance matrix, if selected, in XML (PMML) format. You can use this model file to apply the model information to other data files for scoring purposes.
Complex Samples Logistic Regression Options
Figure 10-8 Logistic Regression Options dialog box
Estimation. This group gives you control of various criteria used in the model estimation.
Maximum Iterations. The maximum number of iterations the algorithm will execute. Specify a non-negative integer.
► Browse to and select property_assess.csplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
► Click Continue.
Figure 18-2 Ratios dialog box
► Select Current value as a numerator variable.
► Select Value at last appraisal as the denominator variable.
► Select County as a subpopulation variable.
► Click Statistics.
Figure 18-3 Ratios Statistics dialog box
► Select Confidence interval, Unweighted count, and Population size in the Statistics group.
► Select t test, and enter 1.3 as the test value.
► Click Continue.
► Click OK in the Complex Samples Ratios dialog box.
Ratios
Figure 18-4 Ratios table
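The ratio estimated for each county is a ratio of two weighted totals, and the requested t test compares it with the test value 1.3. The sketch below uses made-up property values and sampling weights, with hypothetical helper names. It takes the standard error as an input; the procedure derives the standard error from the complex design, which this sketch does not attempt.

```python
def ratio_estimate(numerators, denominators, weights):
    """Weighted total of the numerator divided by weighted total of the denominator."""
    num = sum(w * y for w, y in zip(weights, numerators))
    den = sum(w * x for w, x in zip(weights, denominators))
    return num / den


def t_vs_test_value(ratio, standard_error, test_value=1.3):
    """t statistic for the null hypothesis that the ratio equals test_value."""
    return (ratio - test_value) / standard_error


current_value = [130.0, 150.0]   # hypothetical numerator values
last_appraisal = [100.0, 110.0]  # hypothetical denominator values
weights = [2.0, 1.0]             # hypothetical sampling weights
r = ratio_estimate(current_value, last_appraisal, weights)  # 410 / 310
print(round(r, 4), round(t_vs_test_value(r, 0.05), 3))
```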
… for predictors that are not part of interaction terms. For example, Exp(B) for employ is equal to 0.798. This means that the odds of default for people who have been with their current employer for two years are 0.798 times the odds of default for those who have been with their current employer for one year, all other things being equal.
The design effects indicate that some of the standard errors computed for these parameter estimates are larger than those you would obtain if you assumed that these observations came from a simple random sample, while others are smaller. It is vitally important to incorporate the sampling design information in your analysis, because you might otherwise infer, for example, that the age coefficient is no different from 0.
Odds Ratios
Figure 20-10 Odds ratios for level of education
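Exp(B) multiplies the odds for each one-unit increase in a predictor that is not part of an interaction, so the effect of several additional years compounds. The sketch below uses the 0.798 reported for employ. The helper name and the two-year extension are illustrative.

```python
def odds_multiplier(exp_b, units):
    """Factor by which the odds change for `units` one-unit increases in the predictor."""
    return exp_b ** units


print(odds_multiplier(0.798, 1))            # 0.798: one more year with the employer
print(round(odds_multiplier(0.798, 2), 6))  # 0.636804: two more years
```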
■ … factors and covariates, as appropriate for your data.
■ Specify a variable to define a subpopulation. The analysis is performed only for the selected category of the subpopulation variable, although variances are still properly estimated based on the entire dataset.
■ Select a link function.
Link function. The link function is a transformation of the cumulative probabilities that allows estimation of the model. Five link functions are available, summarized in the following table, where ξ denotes a cumulative probability.
Function                   Form                               Typical application
Logit                      $\log\left(\xi/(1-\xi)\right)$     Evenly distributed categories
Complementary log-log      $\log\left(-\log(1-\xi)\right)$    Higher categories more probable
Negative log-log           $-\log\left(-\log(\xi)\right)$     Lower categories more probable
Probit                     $\Phi^{-1}(\xi)$                   Latent variable is normally distributed
Cauchit (inverse Cauchy)   $\tan\left(\pi(\xi-0.5)\right)$    Latent variable has many extreme values
Complex Samples Ordinal Regression Response Probabilities
Figure 11-2 Ordinal Regression Response Probabilities dialog box
The Response Probabilities dialog box allows you to specify whether the cumulative probability of a response, that is, the probability of belonging up to and including …
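Each link in the table maps a cumulative probability ξ in (0, 1) onto the real line. The sketch below implements the five forms with the Python standard library; statistics.NormalDist supplies the inverse normal CDF for the probit. It only evaluates the transformations and does not fit a model. The function names are illustrative.

```python
import math
from statistics import NormalDist


def logit(xi):
    return math.log(xi / (1 - xi))


def complementary_log_log(xi):
    return math.log(-math.log(1 - xi))


def negative_log_log(xi):
    return -math.log(-math.log(xi))


def probit(xi):
    return NormalDist().inv_cdf(xi)  # inverse standard normal CDF


def cauchit(xi):
    return math.tan(math.pi * (xi - 0.5))  # inverse Cauchy CDF


LINKS = {
    "logit": logit,
    "complementary log-log": complementary_log_log,
    "negative log-log": negative_log_log,
    "probit": probit,
    "cauchit": cauchit,
}

for name, link in LINKS.items():
    print(f"{name:>22}: {link(0.25):+.4f}")
```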
… trademarks of IBM Corporation, registered in many jurisdictions worldwide. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shmtl.
SPSS is a trademark of SPSS Inc., an IBM Company, registered in many jurisdictions worldwide.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
This product uses WinWrap Basic, Copyright 1993-2007, Polar Engineering and Consulting, http://www.winwrap.com.
Other product and service names might be trademarks of IBM, SPSS, or other companies.
Adobe product screenshot(s) reprinted with permission from Adobe Systems Incorporated.
► Click Next, and then click Next in the Output Variables step.
Figure 13-39 Sampling Wizard, Plan Summary step (stage 2)
(The summary lists stage 1, stratified by county with town as the cluster and sampling 0.3 per stratum by PPS WOR, and stage 2, stratified by nbrhood and sampling 0.2 per stratum by simple random sampling WOR.)
► Look over the sampling design, and then click Next.
Figure 13-40 Sampling Wizard, Draw Sample Selection Options step
290. ty with which a customer defaults is related to age, employment history, and amount of credit debt, incorporating the sampling design.
Running the Analysis
> To create the logistic regression model, from the menus choose:
Analyze > Complex Samples > Logistic Regression
Copyright SPSS Inc. 1989, 2010
Figure 20-1 Complex Samples Plan dialog box
> Browse to and select bankloan.csaplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
> Click Continue.
Figure 20-2 Logistic Regression dialog box
> Select Previously defaulted as the dependent variable.
> Select Level of education a
291. u need to add or modify stages. After all the stages have been sampled, you can use the plan file in any Complex Samples analysis procedure to indicate how the sample was drawn.
> Click Finish.
These selections produce the sampling plan file property_assess.csplan and draw a sample according to that plan.
Plan Summary
Figure 13-10 Plan summary
292. uding a particular category of the dependent variable increases with increasing or decreasing values of the dependent variable.
Complex Samples Ordinal Regression Model
Figure 11-3 Ordinal Regression Model dialog box
Specify Model Effects. By default, the procedure builds a main-effects model using the factors and covariates specified in the main dialog box. Alternatively, you can build a custom model that includes interaction effects and nested terms.
Non-Nested Terms
For the selected factors and covariates:
• Interaction. Creates the highest-level interaction term for all selected variables.
• Main effects. Creates a main-effects term for each variable selected.
• All 2-way. Creates all possible two-way interactions of the selected variables.
• All 3-way. Creates all possible three-way interactions of the selected variables.
• All 4-way. Creates all possible four-way interactions of the selected variables.
• All 5-way. Creates all possible five-way interactions of the selected variables.
Nested Terms
You can build nested terms fo
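As a rough illustration of what the Non-Nested Terms options generate, the following Python sketch builds term lists the way the descriptions above read. The helper names are hypothetical, the variable names are illustrative only, and rejecting a k-way request that exceeds the number of selected variables is an assumption of the sketch, not documented dialog behavior.

```python
from itertools import combinations


def main_effects(variables):
    # One main-effects term for each selected variable.
    return list(variables)


def highest_interaction(variables):
    # A single term crossing all selected variables, e.g. "a*b*c".
    return ["*".join(variables)]


def all_k_way(variables, k):
    # Every possible k-way interaction among the selected variables.
    # Rejecting k larger than the number of variables is a choice of this sketch.
    if not 2 <= k <= len(variables):
        raise ValueError("k must be between 2 and the number of selected variables")
    return ["*".join(combo) for combo in combinations(variables, k)]


selected = ["agecat", "drivefreq", "gender", "votelast"]  # illustrative names
print(main_effects(selected))
print(highest_interaction(selected))
print(all_k_way(selected, 2))  # 6 terms
print(all_k_way(selected, 3))  # 4 terms
```

With four selected variables there are C(4,2) = 6 two-way and C(4,3) = 4 three-way terms, which is why the higher "All k-way" options quickly produce large models.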
293. ue is the difference between the number of primary sampling units and the number of strata in the first stage of sampling. Alternatively, you can set a custom number of degrees of freedom by specifying a positive integer.
Adjustment for Multiple Comparisons. When performing hypothesis tests with multiple contrasts, the overall significance level can be adjusted from the significance levels for the included contrasts. This group allows you to choose the adjustment method.
• Least significant difference. This method does not control the overall probability of rejecting the hypotheses that some linear contrasts are different from the null hypothesis values.
• Bonferroni. This method adjusts the observed significance level for the fact that multiple contrasts are being tested.
• Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni procedure that is much less conservative than the Bonferroni method in terms of rejecting individual hypotheses but maintains the same overall significance level.
• Sidak. This method provides tighter bounds than the Bonferroni approach.
• Sequential Sidak. This is a sequentially step-down rejective Sidak procedure that is much less conservative than the Sidak method in terms of rejecting individual hypotheses but maintains the same overall significance level.
Save
Figure 12-10 Cox Regression dialog box, Save tab
Save Variables. This group allows you to save model-related variables to the active dataset for further u
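The following Python sketch shows the default degrees-of-freedom rule and textbook versions of the adjustment methods listed above, with the sequential methods implemented as Holm-type step-down procedures. The function names are hypothetical, and it is an assumption that the procedure reports adjusted significance values in exactly this form.

```python
def default_df(n_first_stage_psus, n_first_stage_strata):
    # Default degrees of freedom: first-stage PSUs minus first-stage strata.
    return n_first_stage_psus - n_first_stage_strata


def bonferroni(pvals):
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def sidak(pvals):
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def _step_down(pvals, adjust):
    # Step-down scheme: the p-value with 0-based rank i (smallest first) is
    # adjusted for the m - i hypotheses still being tested, and the adjusted
    # values are forced to be non-decreasing in that order.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, adjust(pvals[i], m - rank)))
        adjusted[i] = running_max
    return adjusted


def sequential_bonferroni(pvals):
    return _step_down(pvals, lambda p, k: p * k)


def sequential_sidak(pvals):
    return _step_down(pvals, lambda p, k: 1.0 - (1.0 - p) ** k)


pvals = [0.01, 0.04, 0.03]
print(bonferroni(pvals))
print(sequential_bonferroni(pvals))
print(sidak(pvals))
print(sequential_sidak(pvals))
```

Because 1 - (1 - p)^m never exceeds m·p, the Sidak values are never larger than the Bonferroni ones, which is the "tighter bounds" described above.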
294. ue corresponding to the indicated day, month, and year. The arguments must resolve to integers, with day between 1 and 31, month between 1 and 13, and year a four-digit integer greater than 1582. To display the result as a date, assign a date format to the result variable.
> Type date2 as the target variable.
> Type DATE.DMY(30,6,2006) as the expression.
> Click If.
Figure 22-2 Compute Variable If Cases dialog box
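SPSS stores date values as the number of seconds since midnight, 14 October 1582, so the result of DATE.DMY is an ordinary number until a date format is assigned. The Python sketch below mimics that calculation with the range checks stated above; the function name is hypothetical, and it deliberately does not model the month value 13 or any rollover of impossible dates such as 31 June.

```python
from datetime import date

# SPSS date values count seconds from midnight, 14 October 1582.
SPSS_EPOCH = date(1582, 10, 14)
SECONDS_PER_DAY = 86_400


def date_dmy(day: int, month: int, year: int) -> int:
    """Rough Python analogue of DATE.DMY(day, month, year) (a sketch)."""
    if not 1 <= day <= 31:
        raise ValueError("day must be between 1 and 31")
    # The procedure also accepts month 13; this sketch does not model that case.
    if not 1 <= month <= 12:
        raise ValueError("this sketch only handles months 1-12")
    if not 1582 < year <= 9999:
        raise ValueError("year must be a four-digit integer greater than 1582")
    # datetime.date rejects impossible combinations such as 31 June.
    days = (date(year, month, day) - SPSS_EPOCH).days
    return days * SECONDS_PER_DAY


date2 = date_dmy(30, 6, 2006)
print(date2)
```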
295. In the proportional hazards model, the significance value for the predictor age is less than 0.05, so age appears to contribute to the model.
Test of Proportional Hazards
Figure 22-14 Overall test of proportional hazards
df1      df2      Wald F   Sig.
1.000    16.000   29.924   5.136E-5
Figure 22-15 Parameter estimates for alternative model
a. Time function: Log
The significance value for the overall test of proportional hazards is less than 0.05, indicating that the proportional hazards assumption is violated. The log time function is used for the alternative model, so it will be easy to replicate this time-dependent predictor.
Adding a Time-Dependent Predictor
> Recall the Complex Samples Cox Regression dialog box, and click the Predictors tab.
> Click New.
Figure 22-16 Cox Regression Define Time-Dependent Predictor dialog box
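With the Log time function, the time-dependent predictor is the product of age and the logarithm of time, so its value changes with follow-up time instead of staying fixed. The Python sketch below computes that product, assuming the natural logarithm; the function name is hypothetical, and the sketch only evaluates the covariate, it does not fit the Cox model.

```python
import math


def age_by_log_time(age, t):
    """Value of the age-by-log-time predictor at follow-up time t (t > 0)."""
    if t <= 0:
        raise ValueError("the logarithm is only defined for t > 0")
    return age * math.log(t)  # natural logarithm assumed


# The same 40-year-old contributes different covariate values at different
# event times, which is what makes the predictor time dependent.
for t in (1.0, 30.0, 365.0):
    print(t, age_by_log_time(40, t))
```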
296. us, if A is a factor, then specifying A*A is invalid.
• All factors within a nested effect must be unique. Thus, if A is a factor, then specifying A(A) is invalid.
• No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, then specifying A(X) is invalid.
Intercept. The intercept is usually included in the model. If you can assume the data pass through the origin, you can exclude the intercept. Even if you include the intercept in the model, you can choose to suppress statistics related to it.
Complex Samples General Linear Model Statistics
Figure 9-3 General Linear Model Statistics dialog box
Model Parameters. This group allows you to control the display of statistics related to the model parameters.
• Estimate. Displays estimates of the coefficients.
• Standard error. Displays the standard error for each coefficient estimate.
• Confidence interval. Displays a confidence interval for each coefficient estimate. The confidence level for the interval is set in the Options dialog box.
• t test. Displays a test of each coe
297. ustomer effect can be said to be nested within the Store location effect. Additionally, you can include interaction effects, such as polynomial terms involving the same covariate, or add multiple levels of nesting to the nested term.
Limitations. Nested terms have the following restrictions:
• All factors within an interaction must be unique. Thus, if A is a factor, then specifying A*A is invalid.
• All factors within a nested effect must be unique. Thus, if A is a factor, then specifying A(A) is invalid.
• No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, then specifying A(X) is invalid.
Statistics
Figure 12-7 Cox Regression dialog box, Statistics tab
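The three limitations above can be expressed as a small validator. This Python sketch is hypothetical: it represents a term as a list of effect variables plus the list of variables it is nested within, and it treats a repeated covariate (a polynomial term such as X*X) as allowed, following the note that interaction effects may include polynomial terms involving the same covariate.

```python
def validate_term(effect, within, factors, covariates):
    """Check a term such as A*B or A(B) against the nesting limitations.

    effect -- variables crossed in the effect, e.g. ["A", "B"] for A*B
    within -- variables the effect is nested within, e.g. ["B"] for A(B);
              use [] for a term that is not nested
    """
    effect_factors = [v for v in effect if v in factors]
    # 1. All factors within an interaction must be unique (A*A is invalid).
    #    Repeated covariates (X*X) are allowed as polynomial terms.
    if len(effect_factors) != len(set(effect_factors)):
        raise ValueError("a factor appears more than once in the interaction")
    # 2. All factors within a nested effect must be unique (A(A) is invalid).
    nested_factors = effect_factors + [v for v in within if v in factors]
    if len(nested_factors) != len(set(nested_factors)):
        raise ValueError("a factor appears more than once in the nested effect")
    # 3. No effect can be nested within a covariate (A(X) is invalid).
    if any(v in covariates for v in within):
        raise ValueError("an effect cannot be nested within a covariate")


def is_valid(effect, within, factors, covariates):
    try:
        validate_term(effect, within, factors, covariates)
    except ValueError:
        return False
    return True


FACTORS = {"A", "B"}
COVARIATES = {"X"}
for effect, within in [(["A", "A"], []), (["X", "X"], []), (["A"], ["A"]),
                       (["A"], ["X"]), (["A"], ["B"])]:
    print(effect, within, is_valid(effect, within, FACTORS, COVARIATES))
```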
298. utes people selected all brands that were described by the attribute. The six brands are denoted AA, BB, CC, DD, EE, and FF to preserve confidentiality.
contacts.sav. This is a hypothetical data file that concerns the contact lists for a group of corporate computer sales representatives. Each contact is categorized by the department of the company in which they work and their company ranks. Also recorded are the amount of the last sale made, the time since the last sale, and the size of the contact's company.
creditpromo.sav. This is a hypothetical data file that concerns a department store's efforts to evaluate the effectiveness of a recent credit card promotion. To this end, 500 cardholders were randomly selected. Half received an ad promoting a reduced interest rate on purchases made over the next three months. Half received a standard seasonal ad.
customer_dbase.sav. This is a hypothetical data file that concerns a company's efforts to use the information in its data warehouse to make special offers to customers who are most likely to reply. A subset of the customer base was selected at random and given the special offers, and their responses were recorded.
customer_information.sav. A hypothetical data file containing customer mailing information, such as name and address.
customer_subset.sav. A subset of 80 cases from customer_dbase.sav.
debate.sav. This is a hypothetical data file that concerns paired respon
299. wn according to a complex design. Creating a new plan is most useful when you do not have access to the sampling plan file used to draw the sample (recall that the sampling plan contains a default analysis plan). If you do have access to the sampling plan file used to draw the sample, you can use the default analysis plan contained in the sampling plan file, or override the default analysis specifications and save your changes to a new file.
Creating a New Analysis Plan
> From the menus choose:
Analyze > Complex Samples > Prepare for Analysis
> Select Create a plan file, and choose a plan filename to which you will save the analysis plan.
> Click Next to continue through the Wizard.
> Specify the variable containing sample weights in the Design Variables step, optionally defining strata and clusters.
> You can now click Finish to save the plan.
Optionally, in further steps you can:
• Select the method for estimating standard errors in the Estimation Method step.
• Specify the number of units sampled or the inclusion probability per unit in the Size step.
• Add a second or third stage to the design.
• Paste your selections as command syntax.
Analysis Preparation Wizard: Design Variables
Figure 3-2 Analysis Preparation Wizard, Design Variables step
300. The Sampling Wizard guides you through the steps for creating, modifying, or executing a sampling plan file. Before using the Wizard, you should have a well-defined target population, a list of sampling units, and an appropriate sample design in mind.
Creating a New Sample Plan
> From the menus choose:
Analyze > Complex Samples > Select a Sample
> Select Design a sample, and choose a plan filename to save the sample plan.
> Click Next to continue through the Wizard.
> Optionally, in the Design Variables step, you can define strata, clusters, and input sample weights. After you define these, click Next.
> Optionally, in the Sampling Method step, you can choose a method for selecting items.
If you select PPS Brewer or PPS Murthy, you can click Finish to draw the sample (both methods select two units from each stratum, so there is no sample size to specify). Otherwise, click Next and then:
> In the Sample Size step, specify the number or proportion of units to sample.
> You can now click Finish to draw the sample.
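For intuition about the Sample Size step, the following Python sketch draws a simple random sample without replacement within each stratum, given either a count or a proportion per stratum. It is a stand-in for one sampling method only, not the wizard's implementation; the frame, the function name, and the rounding rule for proportions are all assumptions.

```python
import random
from collections import defaultdict


def stratified_srswor(units, stratum_of, size, seed=None):
    """Simple random sampling without replacement within each stratum.

    size is either an int (number of units per stratum) or a float in (0, 1]
    (proportion of units per stratum). Rounding size * N with round() is an
    assumption of this sketch, not documented SPSS behavior.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for stratum, members in strata.items():
        if isinstance(size, int):
            n = min(size, len(members))
        else:
            n = round(size * len(members))
        sample[stratum] = rng.sample(members, n)  # no unit is drawn twice
    return sample


# Hypothetical frame: (county, household id) pairs, 10 households per county.
frame = [(county, hh) for county in ("North", "South", "East") for hh in range(10)]
by_count = stratified_srswor(frame, lambda u: u[0], 3, seed=1)
by_prop = stratified_srswor(frame, lambda u: u[0], 0.2, seed=1)
print({k: len(v) for k, v in by_count.items()})
print({k: len(v) for k, v in by_prop.items()})
```

Seeding a dedicated random.Random instance mirrors the idea of the wizard's random-number seed option: the same seed reproduces the same sample.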
301. The classification table shows the practical results of using the logistic regression model. For each case, the predicted response is Yes if that case's model-predicted logit is greater than 0. Cases are weighted by finalweight, so that the classification table reports the expected model performance in the population.
• Cells on the diagonal are correct predictions.
• Cells off the diagonal are incorrect predictions.
Based upon the cases used to create the model, you can expect to correctly classify 85.5% of the nondefaulters in the population using this model. Likewise, you can expect to correctly classify 60.9% of the defaulters. Overall, you can expect to classify 76.5% of the cases correctly. However, because this table was constructed with the cases used to create the model, these estimates are likely to be overly optimistic.
Tests of Model Effects
Figure 20-8 Tests of between-subjects effects
Source            df
Corrected Model   11.000
Intercept         1.000
ed                4.000
age               1.000
employ            1.000
address           1.000
income            1.000
debtinc           1.000
creddebt          1.000
othdebt           1.000
Dependent Variable: Previously defaulted (reference category = No)
Model: (Intercept), ed, age, employ, address, income, debtinc, creddebt, othdebt
Each term in the model, plus the model as a whole, is tested for whether its effect equals 0. Terms with si
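The weighted classification table described above can be sketched in a few lines of Python. The sketch assumes predicted logits, observed outcomes, and case weights (such as finalweight) are already available as lists; the function name is hypothetical, and the values in the usage lines are made up for illustration, not taken from the example data.

```python
def weighted_classification(logits, observed, weights):
    """Weighted 2x2 classification table and percent correct.

    observed -- True for "Yes" (defaulted), False for "No"
    A case is predicted "Yes" when its predicted logit is greater than 0.
    """
    table = {(obs, pred): 0.0 for obs in (False, True) for pred in (False, True)}
    for logit, obs, w in zip(logits, observed, weights):
        table[(obs, logit > 0)] += w

    def percent_correct(rows):
        # Diagonal cells (observed == predicted) are correct predictions.
        total = sum(table[(r, p)] for r in rows for p in (False, True))
        correct = sum(table[(r, r)] for r in rows)
        return 100.0 * correct / total  # assumes each row has positive weight

    return {
        "table": table,
        "no_correct": percent_correct([False]),
        "yes_correct": percent_correct([True]),
        "overall": percent_correct([False, True]),
    }


result = weighted_classification(
    logits=[1.0, -2.0, 0.5, -0.1],
    observed=[True, False, False, True],
    weights=[2, 3, 1, 4],
)
print(result["no_correct"], result["yes_correct"], result["overall"])
```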
302. you are analyzing the sample obtained according to that plan.
• The Complex Samples Analysis Preparation Wizard is used to set analysis specifications for an existing complex sample. The analysis plan file created by the Analysis Preparation Wizard can be specified in the Plan dialog box when you are analyzing the sample corresponding to that plan.
• The Complex Samples Crosstabs procedure provides descriptive statistics for the crosstabulation of categorical variables.
• The Complex Samples Descriptives procedure provides univariate descriptive statistics for scale variables.
Complex Samples Descriptives
The Complex Samples Descriptives procedure displays univariate summary statistics for several variables. Optionally, you can request statistics by subgroups defined by one or more categorical variables.
Using Complex Samples Descriptives to Analyze Activity Levels
A researcher wants to study the activity levels of U.S. citizens using the results of the National Health Interview Survey (NHIS) and a previously created analysis plan. For more information, see the topic Using the Complex Samples Analysis Preparation Wizard to Ready NHIS Public Data in Chapter 14 on p. 140. A subset of the 2000 survey is collected in nhis2000_subset.sav. The analysis plan is stored in nhis2000_subset.csaplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
Use Complex Samples Descriptives to produce univariate
303. your estimates, units within strata should be as homogeneous as possible for the characteristics of interest.
Clusters. Cluster variables define groups of observational units, or clusters. Clusters are useful when directly sampling observational units from the population is expensive or impossible; instead, you can sample clusters from the population and then sample observational units from the selected clusters. However, the use of clusters can introduce correlations among sampling units, resulting in a loss of precision. To minimize this effect, units within clusters should be as heterogeneous as possible for the characteristics of interest. You must define at least one cluster variable in order to plan a multistage design. Clusters are also necessary in the use of several different sampling methods. For more information, see the topic Sampling Wizard: Sampling Method on p. 8.
Input Sample Weight. If the current sample design is part of a larger sample design, you may have sample weights from a previous stage of the larger design. You can specify a numeric variable containing these weights in the first stage of the current design. Sample weights are computed automatically for subsequent stages of the current design.
Stage Label. You can specify an optional string label for each stage. This is used in the output to help identify stagewise information.
Note: The source variable list has the same content across
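In a multistage design, the weight for a selected unit is conventionally the inverse of its overall inclusion probability, which is the product of the stagewise probabilities. The Python sketch below accumulates weights stage by stage, starting from an optional input sample weight; that the Sampling Wizard computes its cumulative weights exactly this way is an assumption, and the function name is hypothetical.

```python
def cumulative_weights(stage_probs, input_weight=1.0):
    """Cumulative sample weight of one unit after each stage.

    stage_probs  -- the unit's inclusion probability at each stage, given that
                    it was selected at the earlier stages
    input_weight -- weight carried in from a previous, larger design (if any)
    """
    weights = []
    w = input_weight
    for p in stage_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        w /= p  # inverse-probability weighting, stage by stage
        weights.append(w)
    return weights


print(cumulative_weights([0.5, 0.25]))
print(cumulative_weights([0.5, 0.25], input_weight=3.0))
```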
304. ysis specifications for an existing complex sample. The analysis plan file created by the Analysis Preparation Wizard can be specified in the Plan dialog box when you are analyzing the sample corresponding to that plan.
• The Complex Samples General Linear Model procedure allows you to model a scale response.
• The Complex Samples Logistic Regression procedure allows you to model a categorical response.
Complex Samples Cox Regression
The Complex Samples Cox Regression procedure performs survival analysis for samples drawn by complex sampling methods.
Using a Time-Dependent Predictor in Complex Samples Cox Regression
A government law enforcement agency is concerned about recidivism rates in their area of jurisdiction. One of the measures of recidivism is the time until second arrest for offenders. The agency would like to model time to rearrest using Cox Regression on a sample drawn by complex sampling methods, but they are worried that the proportional hazards assumption is invalid across age categories.
Persons released from their first arrest during the month of June 2003 were selected from sampled departments, and their case histories were inspected through the end of June 2006. The sample is collected in recidivism_cs_sample.sav. The sampling plan used is contained in recidivism_cs.csplan; because it makes use of a probability proportional to size (PPS) method, there is also a file containing the joint selection probabilities, recidivism_cs_jointpro
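Under a probability-proportional-to-size design, a unit's first-order inclusion probability is commonly n times its size divided by the total size within the stratum; the joint probabilities in the separate file describe pairs of units and depend on the particular PPS method. The Python sketch below computes only the first-order probabilities, under the stated assumption that no unit is large enough to need certainty selection; the function name is hypothetical.

```python
def pps_inclusion_probabilities(sizes, n):
    """First-order inclusion probabilities pi_i = n * x_i / sum(x) in one stratum.

    sizes -- size measure x_i for every unit in the stratum
    n     -- number of units to select
    """
    total = sum(sizes)
    probs = [n * x / total for x in sizes]
    # A probability above 1 means the unit would need certainty selection,
    # which this sketch does not handle.
    if any(p > 1.0 for p in probs):
        raise ValueError("a unit is too large for this simple formula")
    return probs


probs = pps_inclusion_probabilities([10, 20, 30, 40], n=2)
print(probs)
print(sum(probs))  # equals n, up to floating-point rounding
```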
