Read Manual - Statistical Solutions

1. [Screenshot: Imputation Report and Output Log windows for MMI_TRIAL. Analysis: Multiple Imputation, Propensity Method; date of analysis: Wednesday, November 24, 1999, 11:18:20. The SPECIFICATION section lists, for the non-monotone pattern, the imputation variables (MeasA_1, MeasA_2, MeasA_3, MeasB_1, MeasB_3) with their selected covariates (SYMPDUR, AGE, MeasA_0, MeasB_0 and the other imputation variables), each flagged as Complete or Variable to impute, and whether it is forced. The Output Log shows the imputation of MeasB_1 in non-monotone local pattern 1, with the regression pool MeasA_1, MeasA_2, MeasA_3, MeasB_3, SYMPDUR, AGE, MeasA_0 and the case numbers containing observed or previously imputed values.]
2. [Screenshot: Non-Monotone tab with columns Variable(s), Covariate(s) and Forced, listing variables such as SYMPDUR, AGE, MeasA_2, MeasB_2 and MeasB_3.] Click on the + sign in front of a variable name to expand or contract it. To add additional covariates to a variable's regression pool, drag the covariate into the list of covariates column beside the variable. To add a covariate to all of the regression pools, drag a variable name onto the title of the Covariate(s) column. To toggle all of the selections in the Forced column, click on the column title.
Again, you select the + or - signs to expand or contract the list of covariates for each imputation variable. The list of covariates for each imputation variable is made up of the variables specified as Fixed Covariates in the Base Setup tab and all of the other imputation variables. Variables can be added to and removed from this list by dragging and dropping the variable from the list of covariates to the variables field, or vice versa. Even though a variable appears in the list of covariates for a particular imputation variable, it might not be used in the final model. The program first sorts the variables so that the missing data pattern is as close as possible to monotone, and then uses only the variables that are to the left of the imputation varia
3. Rubin, D. B. (1980). Handling Non-response in Sample Surveys by Multiple Imputations. Monograph, U.S. Department of Commerce, Bureau of the Census.
Rubin, D. B. (1981). The Bayesian Bootstrap. The Annals of Statistics, 9, 130-134.
Rubin, D. B. (1983). Progress Report on Project for Multiple Imputation of 1980 Codes. Manuscript distributed to the U.S. Bureau of the Census, the U.S. National Science Foundation and the Social Science Research Foundation.
Rubin, D. B. (1984). Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician. The Annals of Statistics, 12, 1151-1172.
Rubin, D. B. (1988). Using the SIR Algorithm to Simulate Posterior Distributions (with discussion). In Bayesian Statistics 3, eds. J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith. New York: Oxford University Press, pp. 395-402.
Rubin, D. B. (1990). Imputation Procedures and Inferential Versus Evaluative Statistical Statements. In Proceedings, U.S. Census Bureau Sixth Annual Research Conference, pp. 676-679.
Rubin, D. B. and Schenker, N. (1991). Analyzing Multiple Imputed Data sets.
Rubin, D. B. (1993). Satisfying Confidentiality Constraints Through the Use of Synthetic Multiple Imputed Micro Data. Journal of Official Statistics, 9, 461-468.
Rubin, D. B. (1994). Comments on "Missing Data, Imputation, and the Bootstrap" by B. Efron. Journal of the American Statistical Association, 89, 485-8.
Rubin, D. B. and Schenker, N. (1986). Multiple Imputa
4. Let $c$ be the number of incompletely observed covariates and let $i_1, \ldots, i_c$ be the index numbers of these covariates. Let $X$ be the adjusted data matrix, constructed as follows:
1. The first column of $X$ consists of 1s.
2. The $(j+1)$-th column of $X$, with $1 \le j \le c$, consists of 1s and 0s such that the $v$-th entry of this column equals 0 when the $v$-th data entry of $x_{i_j}$ is observed and equals 1 when this entry is missing.
3. In the $(c+1+j)$-th column of $X$, the $v$-th entry is equal to the $v$-th entry of $x_{i_j}$ when this entry is observed and is equal to 0 when this entry is missing.
Let $Y_{obs}$ and $Y_{mis}$ be the observed and missing data for $y$, respectively. Let $X_{obs}$ and $X_{mis}$ be the rows of $X$ corresponding to $Y_{obs}$ and $Y_{mis}$, respectively. Each imputation $Y^*_{mis}$ for $Y_{mis}$ is independently generated according to the same algorithm described in Completely Observed Covariates above.
Appendix D: Discriminant Multiple Imputation
This appendix describes the method used to impute binary and categorical variables for Discriminant Multiple Imputation. Discriminant Multiple Imputation is a model-based method for binary or categorical variables. The detailed imputation method is as follows. Let $1, \ldots, s$ be the categories of the categorical imputation variable $y$. By applying Bayes' Theorem, the statistical model of discriminant
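The adjusted data matrix for incompletely observed covariates, described at the start of the excerpt above, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SOLAS implementation: the function name `adjusted_matrix`, the use of `None` to mark a missing entry and the column-wise input layout are all hypothetical choices made for the example.

```python
def adjusted_matrix(covariates):
    """Build the adjusted data matrix for incompletely observed covariates.

    covariates: list of c columns (hypothetical layout); each column is a
    list of n entries, with None marking a missing entry.
    Returns a list of n rows, each laid out as
    [1, c missingness indicators, c zero-filled covariate values].
    """
    n = len(covariates[0])
    rows = []
    for v in range(n):
        # Indicator is 1 when the entry is missing, 0 when it is observed.
        indicators = [1 if column[v] is None else 0 for column in covariates]
        # Missing entries are replaced by 0; observed entries are kept.
        values = [0 if column[v] is None else column[v] for column in covariates]
        rows.append([1] + indicators + values)
    return rows
```

With two covariates and two cases, `adjusted_matrix([[2.0, None], [None, 5.0]])` yields one row per case with five columns: the constant, two indicators and two zero-filled values.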
5. (1959). Testing Statistical Hypotheses. New York: John Wiley.
Li, K. H., Meng, X. L., Raghunathan, T. E. and Rubin, D. B. (1991). Significance Levels from Repeated p-Values With Multiple Imputed Data. Statistica Sinica, 1, 65-92.
Li, K. H., Raghunathan, T. E. and Rubin, D. B. (1991). Large Sample Significance Levels from Multiply Imputed Data Using Moment-Based Statistics and an F Reference Distribution. Journal of the American Statistical Association, 86, 1065-1073.
Little, R. J. A. (1979). Maximum Likelihood for Multiple Regression With Missing Values: A Simulation Study. Journal of the Royal Statistical Society, B41, 76-87.
Little, R. J. A. (1988). Missing Data in Large Surveys (also with discussion). Journal of Business and Economic Statistics, 6, 287-301.
Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: John Wiley.
Little, R. J. A. and Rubin, D. B. (1993). Assessment of Trial Imputations for NHANES III. Project report, Datametrics Research, Inc.
Liu, C. and Rubin, D. B. (1996). M Multiple Imputation System. Report, Datametrics Research, Inc.
Liu, J. S. and Chen, R. (1995). Blind De-convolution via Sequential Imputations. Journal of the American Statistical Association, 90, 567-576.
Meng, X. (1994). Multiple Imputation with Uncongenial Sources of Input (with discussions). Statistical Science, 9, 538-574.
Meng, X. L. and Rubin, D. B. (1992). Performing Likelihood Ratio Tests with Multiply Imputed Data sets. Bio
6. $(c+1)/2$ observed values of $y$ after the missing data entry are the observed values of $y$ with an assigned propensity score closest to and higher than the propensity score assigned to the missing data entry. If fewer than $c/2$ observed values have an assigned propensity score smaller than the propensity score assigned to the missing data entry, then only these values are used as the observed values of $y$ in the imputation. Similarly, if fewer than $(c+1)/2$ observed values of $y$ have an assigned propensity score larger than the assigned propensity score, then only these values are used as the observed values of $y$ in the imputation.
Use d% Closest Matching Cases
The same as for c Closest Matching Cases, where $c$ is equal to $(d/100)\, n_{obs}$ and $n_{obs}$ is the number of observed values of $y$. There must be at least two values in each sub-group.
Appendix F: References
Multiple Imputation References
SOLAS References
1. Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley.
2. Gelman, A., Carlin, J., Stern, H. and Rubin, D. B. (1995). Bayesian Data Analysis. New York: Chapman and Hall.
3. Rubin, D. B. and Schenker, N. (1991). Multiple Imputation in Health Care Data Bases: An Overview and Some Applications. Statistics in Medicine, 10, 585-598.
4. Lavori, P., Dawson, R. and Shera, D. (1995). A Multiple Imputation Strategy for Clinical Tria
7. [Screenshot: Missing Data Pattern Options dialog, with report options Counts, Percentages and Proportions.]
The third view of your data set is displayed from the View menu of the Output pages after you have performed the imputation (see Multiple Imputation Output later in this manual). You can also use the View menu Legend option to display a colored legend that identifies the method of imputation used for the missing data.
Monotone Missing Data Pattern
A monotone missing data pattern occurs when the variables can be ordered from left to right such that a variable to the left is at least as observed as all variables to the right. For example, if variable A is fully observed and variable B is sometimes missing, A and B form a monotone pattern. Likewise, if A is missing only when B is also missing, A and B form a monotone pattern. If A is sometimes missing when B is observed, and B is sometimes missing when A is observed, then the pattern is not monotone (see, e.g., Little and Rubin 1987, Section 6.4, and References 6 and 7 in Appendix F).
We also distinguish between a missing data pattern and a local missing data pattern. A missing data pattern refers to the entire data set, such as a Monotone missing data pattern. A local missing data pattern for a case refers to the missingness for a particular case of a data set. A local missing data pattern for a variable refers to the miss
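The monotone definition in the excerpt above can be checked mechanically for a given left-to-right variable ordering. The following is a minimal sketch, assuming each case is a list of values in that ordering with `None` marking a missing entry; the function name `is_monotone` is hypothetical and is not part of SOLAS.

```python
def is_monotone(cases):
    """Return True if the missing data pattern is monotone in the given
    column order: once a value is missing in a case, every variable to
    its right must also be missing in that case.

    cases: list of rows; each row lists the variables left to right,
    with None marking a missing entry (an assumed encoding).
    """
    for row in cases:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                # An observed value to the right of a missing one
                # breaks the monotone pattern.
                return False
    return True
```

This reproduces the two-variable examples in the text: A fully observed with B sometimes missing is monotone, while A missing where B is observed together with B missing where A is observed is not.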
8. adding and removing 34 37 38 completely observed 53 expand contract list 33 fixed 31 34 37 46 forced 46 incompletely observed 54 list of 38 missing values in 21 possible 47 with missing values 21 Data Pages 41 Datasheet 10 Define Case Selection 11 Deselect a variable 11 Discriminant Analysis 20 Model 14 Multiple Imputation 20 28 56 Donor Pool 28 29 39 defining sub classes 29 Estimated Parameters 48 Frequency table 10 F to Enter 36 40 F to Remove 36 40 Generation of Imputations 27 Glossary 46 Group Means 13 15 Example 15 Grouping variable 15 missing values in 22 Hardware Requirements 1 Imputation 7 46 Hot deck 13 16 46 Example 17 in SOLAS Overviews 13 Last Value Carried Forward 13 19 Mean 13 predicted mean 20 propensity score 59 random overall 16 Report 20 subset 16 variable 47 Imputed values 16 Imputing monotone missing data 31 non monotone missing data 31 Initialize From Variable Name 19 Installation Instructions 1 administrative 5 network 5 Intent to treat 46 analyses and missing data 8 Last value carried forward 13 18 19 46 Example 18 Least Squares Regression 14 20 27 36 40 License Agreement 2 Linear interpolation 13 30 Logistic regression model 28 40 Longitudinal variable 13 18 19 47 Last Value Carried Forward 13 Main Seed Value 36 40 Imputation User Manual 67 Index Maximum Likelihood Criteria 41 Mean imputat
9. t 2 The regression model allows us to model the missingness using the observed data. Using the regression coefficients, we calculate the propensity that a subject would have a missing value in the variable in question. In other words, the propensity score is the conditional probability of missingness given the vector of observed covariates. Each missing data entry of the imputation variable y is imputed by values randomly drawn from a subset of observed values of y (i.e., its donor pool) with an assigned probability close to that of the missing data entry that is to be imputed. The Donor Pool defines a set of cases with observed values for that imputation variable.
Defining Donor Pools Based on Propensity Scores
Using the options in the Donor Pool window, the cases of the data sets can be partitioned into c donor pools of respondents according to the assigned propensity scores, where c = 5 is the default value. This is done by sorting the cases of the data sets according to their assigned propensity scores in ascending order.
The Donor Pool page gives the user more control over the random draw step in the analysis. You can set the subset ranges and refine these ranges further using another variable, known as the Refinement Variable, which is described below. Three ways of defining the Donor Pool sub-classes are provided: 1. You can divide the sample into c equal-sized subsets
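The first donor pool option, sorting cases by propensity score and dividing them into c subsets, might be sketched as follows. This is an illustration only: the function name `quantile_pools` is hypothetical, and the way leftover cases are distributed when the number of cases is not a multiple of c is an assumption, since the excerpt does not say how SOLAS handles it.

```python
def quantile_pools(scores, c=5):
    """Partition case indices into c donor pools of near-equal size.

    scores: list of propensity scores, one per case.
    c: number of pools (the excerpt gives 5 as the default).
    Returns a list of c lists of case indices, ordered from the lowest
    to the highest propensity scores.
    """
    # Sort case indices by ascending propensity score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    pools = []
    for k in range(c):
        # Integer arithmetic keeps pool sizes within one case of each other.
        start = k * n // c
        stop = (k + 1) * n // c
        pools.append(order[start:stop])
    return pools
```

A missing entry would then be imputed from the observed values of y among the cases that share its pool.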
10. 1 42 55 56 68 73 and 108 with matching values for SPECIES and SEPALLEN, and a randomly selected respondent is used to impute the missing value.
NOTE: If your sort variables are continuous variables with significant decimal places, exact matches may not occur. You could use the Transform feature to take the integer value of variables that you want to use for sorting.
This imputed data set can be saved for later analysis or exported to any other statistics package. See Chapter 1, Data Management, in the Systems Manual.
Last Value Carried Forward
The Last Value Carried Forward (LVCF) technique can be used when the data are longitudinal, i.e., repeated measures have been taken per subject. The last observed value is used to fill in missing values at a later point in the study. One therefore assumes that the response remains constant at the last observed value. This assumption can introduce bias if the timing and rate of withdrawal are related to the treatment. For example, in the case of degenerative diseases, using the last observed value to impute missing data at a later point in the study means that a high observation will be carried forward, resulting in an overestimation of the true end-of-study measurement.
LVCF Example
This example uses the data set MI_TRIAL.MDD, located in the SAMPLES subdirectory.
Define Longitudinal Variables
Since LVCF can only be performed o
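The LVCF rule described above is simple enough to state in a few lines of Python. This is a minimal sketch for one subject's repeated measures, assuming `None` marks a missing visit; the function name `lvcf` is hypothetical, and leaving leading missing values unfilled is an assumption, since the excerpt does not say how SOLAS treats a subject with no earlier observation.

```python
def lvcf(series):
    """Fill missing values with the last observed value.

    series: one subject's measurements in time order, None if missing.
    Values missing before the first observation stay None, because
    there is nothing to carry forward (an assumed convention).
    """
    filled = []
    last_observed = None
    for value in series:
        if value is not None:
            last_observed = value
        filled.append(last_observed)
    return filled
```

For a subject who drops out after the third visit, every later visit receives the third-visit value, which is exactly the constancy assumption the text warns about.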
11. $\phi(x; \mu, \Sigma) = (2\pi)^{-k/2}\, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}\,(x - \mu)\, \Sigma^{-1} (x - \mu)^{T}\right)$
The index $i$ refers to the $i$-th missing value of $y$, $k$ is the number of covariates used for imputation variable $y$, $|\Sigma|$ is the determinant of $\Sigma$, and $x_i$ is the row vector of observed values for the covariates of $y$ corresponding to the $i$-th missing value of $y$.
vi. Let $y^*_i$ be equal to $j$ with probability $p_{ij}$, for $i = 1, \ldots, n_{mis}$ and $j = 1, \ldots, s$. This is realized by drawing $u$ from the standard uniform distribution and setting $y^*_i$ equal to $j$ when $\sum_{v=1}^{j-1} p_{iv} \le u < \sum_{v=1}^{j} p_{iv}$.
vii. Impute $y^*_i$ for the $i$-th missing data entry of $y$, for $i = 1, \ldots, n_{mis}$.
In steps i to iii, the probabilities $\pi_j$ are drawn from a Dirichlet distribution, which is the posterior distribution of these probabilities with a non-informative prior, as described in chapter 4 of Development, Implementation and Evaluation of Multiple Imputation Strategies for the Statistical Analysis of Incomplete Data Sets (Brand, J. P. L.). In step iv, the means $\mu_j$ are randomly drawn from their normal posterior distribution. The estimated covariance matrices are used in step iv instead of covariance matrices drawn from a posterior distribution, because drawing the covariance matrices from their inverted Wishart posterior distribution is relatively expensive computationally.
In predicted mean single imputation, for each missing data entry the category with the largest conditional probability given
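Step vi above, drawing a category from a set of probabilities using a single uniform random number, is the usual inverse-CDF draw and can be sketched as follows. The function name `draw_category` and the `rng` parameter are hypothetical; the fallback to the last category only guards against floating-point rounding and is not something the manual describes.

```python
import random


def draw_category(probabilities, rng=random):
    """Draw a category label in 1..s from its probabilities.

    probabilities: list [p_1, ..., p_s] that sums to 1.
    rng: object with a random() method returning a uniform draw on
    [0, 1); defaults to the standard library module.
    """
    u = rng.random()
    cumulative = 0.0
    for j, p in enumerate(probabilities, start=1):
        cumulative += p
        # Category j is chosen when u falls below the running total
        # for the first time.
        if u < cumulative:
            return j
    # Only reached if rounding leaves the total slightly below 1.
    return len(probabilities)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, in the same spirit as a fixed main seed value.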
12. [Screenshot residue: variable orderings from the preceding sorting steps, listing Variable_6, Variable_2, Variable_4, Variable_3, Variable_1 and Variable_5.]
The same process is continued for variables 4, 3 and 2, as shown in the left-hand image below. All cases with the same local missing data pattern are adjacent. Finally, an additional scan is performed to determine whether any of the variables that lie outside the Monotone pattern can be moved in order to include more missing values in the Monotone pattern. In this example, swapping the first two variables results in extra missing values being included in the Monotone pattern. The result of this process is shown in the right-hand image below.

Variable No.   Left-hand image   Right-hand image
1              Variable_6        Variable_2
2              Variable_2        Variable_6
3              Variable_4        Variable_4
4              Variable_3        Variable_3
5              Variable_1        Variable_1
6              Variable_5        Variable_5

The right-hand image above displays the final result of constructing an approximate Monotone pattern for the example datasheet shown earlier. The missing values in the lower right corner are labeled as Monotone missing and the others as Non-monotone missing.
Predictive Model Based Method
If Predictive Model Based Multiple Imputation is selected, then an ordinary least
13. The Imputation Report and the Output Log, shown in part above, summarize the results of the logistic regression, the ordinary regression and the settings used for the multiple imputation.
Imputation Report
The imputation report contains a summary of the parameters that were chosen for the Multiple Imputation. For example, the seed value that was used for the random selection and the number of imputations that were performed are reported. The report shows:
• An overview of the Multiple Imputation parameters. In the specification section there are tables of the variables and selected covariates for non-monotone and monotone patterns, the number of imputed pages, the random seed, etc.
• Diagnostic information that can be used to judge the quality and validity of the generated imputations.
• The options chosen for the least squares and logistic regression, as well as the sub-classing of propensity scores.
The diagnostic section also gives a detailed breakdown of the number of cases available initially and the numbers excluded for various reasons. Further conclusions about the statistical analysis can be drawn from the combined results (see Analyzing Multiply Imputed Data Sets later in this manual).
Output Log
The Output Log provides details of the regressions carried out for all the imputed values on the imputed data s
14. Combined Statistics for Imputed Data Sets
Pressing the Combined tab in a data page displays the statistics computed using the results of the M analyses. Each statistic is first combined across the M results. Each displayed statistic is then followed by a series of diagnostics that are useful in assessing the effect of the missing data on the statistical result. For example, if the mean is computed in a Descriptive Statistics output, the associated combined statistics for the mean include the average of the M computed means, its total variance $T$ and its total standard error $\sqrt{T}$. The diagnostics include the between-imputation variance $B_m$, the between-imputation standard error $\sqrt{B_m}$, the relative increase in variance due to missing data $r_m$, $\sqrt{r_m}$, and the fraction of information missing due to missing data $\gamma$.
The statistics that are combined for each analysis are listed below.
Descriptive Statistics: mean, C.I. for mean, standard deviation, standard error of mean, variance, coefficient of variation, skewness, kurtosis, median, quartiles, interquartile range, proportion, serial correlation.
t- and Non-parametric Tests: descriptive statistics (means, standard deviations, standard errors of the means, confidence intervals for the means); two-group pooled-variance t-test, including t value, df and p values; paired (matched) t-test, including t value, df and p va
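The combined statistics named above can be illustrated with a short sketch. The manual's own formulas are in its Appendix A, which this excerpt does not reproduce, so the code below assumes the standard combining rules from Rubin (1987): total variance $T = \bar{U} + (1 + 1/M) B$ and relative increase in variance $r = (1 + 1/M) B / \bar{U}$. The function name `combine_estimates` and the dictionary keys are hypothetical.

```python
from statistics import mean, variance


def combine_estimates(estimates, variances):
    """Combine M complete-data results (assumed Rubin 1987 rules).

    estimates: the M point estimates, one per imputed data set.
    variances: the M associated complete-data variances.
    """
    m = len(estimates)
    q_bar = mean(estimates)        # combined point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance, M-1 divisor
    t = u_bar + (1 + 1 / m) * b    # total variance
    r = (1 + 1 / m) * b / u_bar    # relative increase in variance
    return {
        "estimate": q_bar,
        "within": u_bar,
        "between": b,
        "total": t,
        "total_se": t ** 0.5,
        "relative_increase": r,
    }
```

The fraction of missing information is left out of the sketch because its exact form in SOLAS is not shown in the excerpt.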
15. Data Pages you can select either Imputation Report, Output Log, Imputed Data Pattern or Missing Data Pattern. When other analyses are performed from the Analyze menu of a data page (see the example Analyzing Multiply Imputed Data Sets later in this manual), a Combined tab is added to the data page tabs. Selecting this tab displays the combined statistics for these data pages. The combined statistics that are displayed are given in Appendix B, Combined Statistics.
Data Pages
The Multiple Imputation output displays five data pages, with the imputed values shown in a color that contrasts with the observed values. These five pages of completed data allow the user to examine how the combined results are calculated. The first data page (Page 1) for the above example is shown below.
[Screenshot: Multiple Imputation Data Pages window for MMI_TRIAL, with tabs Dataset 1 to Dataset 5.]
From the View menu you can select Imputation Report and Output Log (examples of both are shown below) or Imputed Data Pattern and Missing Data Pattern.
[Screenshot: Imputation Report window for MMI_TRIAL.]
16. Main window View menu, System Preferences menu. Using the data fields in this window, you can create a datasheet or a frequency table with the required number of variables and cases, or rows and columns. Start in one of the following ways:
• Enter the criteria for your new datasheet, then press the OK button; or
• Select an existing datasheet from your file system using the Open window, then press the Open button.
Whether you create a new datasheet or open an existing datasheet, you will see a window similar to the one shown below, with its menu bar.
[Screenshot: Datasheet MI_TRIAL window with menu bar File, Edit, Variables, Use, Analyze, Plot, Format, View, Window, Help.]
Selecting Analyze, then Single Imputation or Multiple Imputation, displays one of the menus shown below.
[Screenshot: Single Imputation and Multiple Imputation menus.]
From one of the menus shown above, you can select the method of imputation that you want to use, and a specification window will be displayed where the selected method can be set up.
General
The following subsections provide general information about variables, grouping variable selection, de-selection and defining case selection. Grouping variables can be selected for all of the imputation methods. If a grouping variable is specified, then the sorting of missing data patterns and the generation of multiple imputations is carried ou
17. complete data sets, let $\hat{Q}_m$, $m = 1, \ldots, M$, be the $M$ complete-data estimates for a parameter $Q$, and let $U_m$, $m = 1, \ldots, M$, be their associated variances.
Combined Estimate of Parameter
The combined estimate of any multi-dimensional parameter of interest for a particular variable is simply the mean of the estimates from each of the M imputed data sets. For example, the combined estimate of the mean for a specific group, or of a particular regression coefficient in a model, is simply the mean of the estimates for that parameter across the M imputed data sets. The general formula for combining point estimates is
$\bar{Q} = \frac{1}{M} \sum_{i=1}^{M} \hat{Q}_i$
where $M$ is the number of imputations and $\hat{Q}_i$ corresponds to the point estimate calculated from the $i$-th datasheet.
In some cases (standard deviation, serial correlation and Pearson r), point estimates are combined in a slightly different way.
Pooled correlation: $\frac{\exp(2\bar{Z}) - 1}{\exp(2\bar{Z}) + 1}$, where $\bar{Z} = \frac{1}{M} \sum_{i=1}^{M} z_i$, $z_i = 0.5 \ln \frac{1 + Q_i}{1 - Q_i}$, and $Q_i$ is the correlation for the $i$-th imputed data set.
Pooled standard deviation: $\sqrt{\frac{1}{M} \sum_{i=1}^{M} Q_i}$, where $M$ is the number of imputations and $Q_i$ corresponds to the variance calculated from the $i$-th datasheet.
Standard Errors and Confidence Intervals
To estimate the variance of the combined parameter estimate, we combine the corresponding variance that is estimated from the combined parameter estimates f
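The special-case pooling of correlations described above amounts to averaging on Fisher's z scale and transforming back. A minimal sketch follows, under the assumption that the garbled formulas in the excerpt are the usual Fisher z transform and that the pooled standard deviation is the square root of the mean variance; the function names are hypothetical.

```python
import math


def pooled_correlation(correlations):
    """Pool correlations from M imputed data sets on Fisher's z scale.

    Each correlation must lie strictly between -1 and 1.
    """
    # z_i = 0.5 * ln((1 + Q_i) / (1 - Q_i))
    z_values = [0.5 * math.log((1 + q) / (1 - q)) for q in correlations]
    z_bar = sum(z_values) / len(z_values)
    # Back-transform the mean z to the correlation scale.
    return (math.exp(2 * z_bar) - 1) / (math.exp(2 * z_bar) + 1)


def pooled_sd(variances):
    """Pooled standard deviation: square root of the mean variance
    (an assumed reading of the garbled formula)."""
    return math.sqrt(sum(variances) / len(variances))
```

Averaging on the z scale rather than averaging the correlations directly is the conventional choice because z values are closer to normally distributed.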
18. imputation is given by the following equation:
$P(y = j \mid x) = \frac{\pi_j\, \phi(x; \mu_j, \Sigma_j)}{\sum_{v=1}^{s} \pi_v\, \phi(x; \mu_v, \Sigma_v)}$
In this equation, $P(y = j \mid x)$ is the probability that the imputation variable $y$ is equal to its $j$-th category given the vector $x$ of the observed values of the covariates of $y$; $\phi(x; \mu, \Sigma)$ is the density of the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$; $\mu_j$ and $\Sigma_j$ are the conditional mean and covariance matrix of the covariates of $y$ given that $y$ is equal to its $j$-th category; and $\pi_j$ is the a priori probability that $y$ is equal to its $j$-th category.
The imputation scheme for discriminant multiple imputation is given by:
i. Let $n_j$ be the number of observed values of $y$ equal to the $j$-th category of $y$, and let $\alpha_j = 1/2 + n_j$ for $j = 1, \ldots, s$.
ii. Draw $\theta_1, \ldots, \theta_s$ from the standard Gamma distribution with parameters given by $\alpha_1, \ldots, \alpha_s$.
iii. Let $\pi^*_j = \theta_j / \sum_{v=1}^{s} \theta_v$ for $j = 1, \ldots, s$.
iv. For $j = 1, \ldots, s$, draw $\mu^*_j$ from the multivariate normal distribution with mean and covariance matrix given by $\hat{\mu}_j$ and $\hat{\Sigma}_j / n_j$, where $\hat{\mu}_j$ and $\hat{\Sigma}_j$ are the sample mean and covariance matrix of the covariates of $y$ calculated from the cases where $y$ is observed and equal to its $j$-th category.
v. Let $p_{ij} = \frac{\pi^*_j\, \phi(x_i; \mu^*_j, \hat{\Sigma}_j)}{\sum_{v=1}^{s} \pi^*_v\, \phi(x_i; \mu^*_v, \hat{\Sigma}_v)}$ for $i = 1, \ldots, n_{mis}$ and $j = 1, \ldots, s$. The function $\phi$ is the probability density function of the multivariate normal distribution given by
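Steps i to iii of the scheme above draw the category probabilities from a Dirichlet posterior by normalizing independent Gamma draws. The sketch below illustrates that technique; the function name is hypothetical, and the prior weight of 1/2 per category is an assumed reading of a garbled formula in the excerpt, so it is exposed as a parameter rather than fixed.

```python
import random


def draw_category_probabilities(counts, prior=0.5, rng=random):
    """Draw category probabilities from a Dirichlet posterior.

    counts: observed frequency n_j of each category of y.
    prior: weight added to every count (assumed to be 1/2).
    rng: object providing gammavariate(alpha, beta); defaults to the
    standard library module.
    """
    # Independent Gamma(alpha_j, 1) draws with alpha_j = prior + n_j ...
    thetas = [rng.gammavariate(prior + n, 1.0) for n in counts]
    total = sum(thetas)
    # ... normalized to sum to one, which yields a Dirichlet draw.
    return [theta / total for theta in thetas]
```

Each call returns a fresh set of probabilities, so repeating it once per imputation carries the uncertainty about the category frequencies into the imputed values.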
19. set, the second set of the M imputed values is used to form the second imputed data set, and so on. In this way, M imputed data sets are obtained. Each of the M imputed data sets is statistically analyzed by the complete-data method of choice. This yields M intermediate results. These M intermediate results are then combined into a final result, from which the conclusions are drawn, according to explicit formulae (see Appendix A). The extra inferential uncertainty due to missing data can be assessed by examining the between-imputation variance and the following related measures: the relative increase in variance due to non-response $r_m$, and the fraction of information missing due to missing data $\gamma$.
General
Before the imputations are actually generated, the missing data pattern is sorted as close as possible to a Monotone missing data pattern, and each missing data entry is labeled as either Monotone missing or Non-monotone missing, according to where it fits in the sorted missing data pattern.
Missing Data Pattern
The Missing Data Pattern window displays missing data patterns from your data set before and after imputation. You can display the Specify Missing Data Pattern window, shown below, from the View menu of a datasheet. Using this window, you specify which variables should be used to determine a missing data pattern. You can also specify a grouping variable, in which case separate patterns will be generated for each group. Imp
20. squares regression method of imputation is applied to the continuous, integer and ordinal imputation variables, and discriminant multiple imputation is applied to the nominal imputation variables.
Ordinary Least Squares Regression
The predictive information in a user-specified set of covariates is used to impute the missing values in the variables to be imputed. First, the Predictive Model is estimated from the observed data. Using this estimated model, new linear regression parameters are randomly drawn from their Bayesian posterior distribution. The randomly drawn values are used to generate the imputations, which include random deviations from the model's predictions. Drawing the exact model from its posterior distribution ensures that the extra uncertainty about the unknown true model is reflected.
In the system, multiple regression estimates of parameters are obtained using the method of least squares. If you have declared a variable to be nominal, then you need design variables (or dummy variables) to use this variable as a predictor variable in a multiple linear regression. The system's multiple regression allows for this possibility and will create design variables for you.
Generation of Imputations
Let $Y$ be the variable to be imputed and let $X$ be the set of covariates. Let $Y_{obs}$ be the observed values in $Y$ and $Y_{mis}$ the missing values in $Y$. Let $X_{obs}$ be the units corresponding to $Y_{obs}$. The analysis is performed in two steps: 1. T
21. the Refinement Variable column. When you use a refinement variable, the program reduces the subset of cases included in the donor pool to include only cases that are close with respect to their values of the refinement variable. You can also specify the number of refinement variable cases to be used in the donor pool. For this example, we will use all of the default settings in this tab.
Advanced Options
Selecting the Advanced Options tab displays the Advanced Options window, which allows the user to control the settings for the imputation and the logistic regression.
[Screenshot: Advanced Options tab (alongside the Base Setup, Non-Monotone, Monotone and Donor Pool tabs), with sections Randomization (Main Seed Value), Output (Output Log), Least Squares Regression Options (stepping criteria, tolerance) and Logistic Regression Options (Maximum Likelihood Criteria, model tolerance, maximum iterations to convergence, likelihood function convergence criterion, parameter estimates convergence criterion, and tail-area probabilities to control entry or removal of terms from the model).]
Randomization
Main Seed Value
The Main Seed Value is used to perform the random selection within the propensity subsets. The default seed is 12345. If you set this field to blank or set it to ze
22. the default will be 5. If the value of c results in not more than 1 case being available to the selection algorithm, c will decrement by 1 until there is sufficient data. The final value of c used is included in the Imputation Report output described later in this manual.
2. You can use the subset of c cases that are closest with respect to propensity score. This option allows you to specify the number of cases, before and after the case being imputed, that are to be included in the sub-class. About 50% of the cases will be used before and 50% after. The default c will be 10, and c cannot be set to a value less than 2. If fewer than 2 cases are available, a value of 5 will be used for c.
3. You can use the subset of d% of the cases that are closest with respect to propensity score. This option allows you to specify the percentage of closest cases in the data set, before and after the case being imputed, to be included in the sub-class. The default for d will be 10.00, and d cannot be set to a value that will result in fewer than 2 cases being available. If fewer than 2 cases are available, a d value of 5 will be used.
Refer to Appendix E, Propensity Score Multiple Imputation, for more detailed information.
Refinement variable
Using the Donor Pool window, a refinement variable w can be chosen and can be applied to each of the three Donor Pool options described above. For each missing val
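The second option above, taking the c observed cases closest in propensity score with roughly half below and half above the case being imputed, might look like the following sketch. It is an illustration only: the function name `closest_cases` is hypothetical, and the exact split (c // 2 below, (c + 1) // 2 at or above) and the treatment of ties are assumptions, since the manual's formula is garbled in this excerpt.

```python
def closest_cases(observed_scores, target_score, c=10):
    """Select up to c observed propensity scores nearest the target.

    observed_scores: propensity scores of cases where y is observed.
    target_score: propensity score of the case with y missing.
    c: pool size (the excerpt gives 10 as the default).
    If fewer cases exist on one side, only those are used.
    """
    # Scores below the target, nearest first.
    below = sorted((s for s in observed_scores if s < target_score), reverse=True)
    # Scores at or above the target, nearest first.
    above = sorted(s for s in observed_scores if s >= target_score)
    pool = below[: c // 2] + above[: (c + 1) // 2]
    return sorted(pool)
```

The d% option in the text reduces to the same routine once d is converted to a case count, c = (d / 100) times the number of observed values.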
23. the middle two
Group Means Example
This example uses the Fisher (1936) Iris data (FISHER.MDD), containing measurements in centimeters of sepal length and width, as well as petal length and width, on 50 samples from each of three species of Iris: 1 = Setosa, 2 = Versicolor, 3 = Virginica. The file FISHMISS.MDD is a copy of the original file created after deleting six values. In this example, we will use Group Means imputation to replace the missing values in the data set.
1. Open the file FISHMISS.MDD, located in the SAMPLES subdirectory.
2. To perform Group Means Imputation, from the datasheet menu bar select Analyze, Single Imputation, then choose Group Means. Multiple selection of variables using drag and drop is supported and is described earlier in this manual.
3. Select the variable(s) you want to impute (SEPALWID and PETALLEN) by dragging and dropping the variable(s) from the Variables list to the Variables to Impute field.
4. Drag and drop your grouping variable from the Variables list to the Grouping Variable field. If you have chosen a grouping variable that has not been previously categorized, the system warns you that you must group the variable. If you do not specify a grouping variable, the overall mean for the variable will be imputed.
For this example, the variables we want to impute are SEPALWID and PETALLEN, so drag and drop them from the Variables list to the Variables to Impute field. Our grouping variable is the v
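What Group Means imputation computes can be shown in a few lines. This is a minimal sketch rather than the SOLAS routine: the function name `group_mean_impute` is hypothetical, `None` marks a missing value, and leaving a value missing when its whole group is unobserved is an assumption.

```python
from statistics import mean


def group_mean_impute(values, groups):
    """Replace each missing value with the mean of its group.

    values: measurements, with None marking a missing entry.
    groups: group label for each case, same length as values.
    """
    observed_by_group = {}
    for value, group in zip(values, groups):
        if value is not None:
            observed_by_group.setdefault(group, []).append(value)
    group_means = {g: mean(vs) for g, vs in observed_by_group.items()}
    # A group with no observed values has no mean; its entries stay None.
    return [
        group_means.get(group) if value is None else value
        for value, group in zip(values, groups)
    ]
```

Passing the same label for every case gives the overall-mean behaviour the text describes for when no grouping variable is specified.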
24. was not done because the test tube was dropped or lost. The assay was not done because the patient died or was lost to follow-up, or other possible causes.
Getting Started
After performing the Setup described earlier in this manual, clicking on the SOLAS 3.0 icon displays the Main window shown below.
[Screenshot: SOLAS Main window with menu bar File, View, Window, Help.]
Selecting File and then Open from the Main window menu bar displays an Open window. In this window you can browse the directories/folders on your system for a list of the stored data sets that you want to analyze.
[Screenshot: Open window listing sample datasheets, including AIRPOLL, AIRPOLL2, CARS, FATNESS, FIDELL, FISHER, FISHMISS, LONGLEY and MI_TRIAL; Files of type: Solas Datasheet (mdd).]
The datasheets in the Samples folder, shown in the Open window above, can be used as data to perform some example analyses that will familiarize you with the system. Several of these examples are discussed later in this manual. Alternatively, you may want to create a new datasheet; in this case, you would select New from the File menu in the Main window.
[Screenshot: New file dialog with fields Type of file (Solas Datasheet (mdd) or Solas Frequency Table (mdf)), Number Vars, Number Cases and Name for file.]
You can also set preferences for your output options from the
25. with i the index number of this case and x_i a row vector with its first element equal to 1 and the other elements containing the observed values of the selected covariates of the i-th case, is assigned.

(iii) The cases in the data set are sorted according to their propensity score, in ascending order.

(iv) For each missing data entry of y, a subset of observed values of y (its donor pool) is found such that their assigned propensity scores are close to the assigned propensity score of the missing value to be imputed. This subset of observed values can be defined in different ways, depending on the selected option. Possible options are:

• Divide propensity score into c quantile subsets
• Use c closest matching cases
• Use d% closest matching cases
• Use a refinement variable

These options are described later in this Appendix.

(v) For each missing value of y, the imputations are generated from its donor pool according to the Approximate Bayesian Bootstrap Method.

The estimated probability that a value of y is missing from the logistic regression model is a monotone non-increasing function of the propensity score, given by

\[ 1 - \frac{\exp(\text{propensity score})}{1 + \exp(\text{propensity score})} \]

This implies that if, instead of assigning the propensity scores to the cases, the estimated probabilities that y is missing are assigned to the cases, the resulting imputation method is equivalent to the one de
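Step (v) above refers to the Approximate Bayesian Bootstrap. As a rough sketch of the standard two-stage procedure (assumed here to be what the manual means; the function name `abb_impute` and the example donor pool are hypothetical), the donor pool is first resampled with replacement, and the imputations are then drawn with replacement from that resampled pool:

```python
import random

def abb_impute(donor_values, n_missing, rng):
    """Approximate Bayesian Bootstrap draw for one donor pool.

    Stage 1: resample the donor pool with replacement (same size as the pool).
    Stage 2: draw n_missing imputations with replacement from the resampled pool.
    Raises IndexError if the donor pool is empty.
    """
    resampled = [rng.choice(donor_values) for _ in donor_values]
    return [rng.choice(resampled) for _ in range(n_missing)]

# Hypothetical donor pool of observed y values and two missing entries.
rng = random.Random(2024)
print(abb_impute([210.0, 245.5, 198.0, 260.0], n_missing=2, rng=rng))
```

Repeating the call once per imputed data set yields different resampled pools, which is how the extra stage propagates uncertainty about the donor distribution into the imputations.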
26. you work and the product serial number.

5. The User Information form is displayed. Enter your name, company and the serial number, which is on the Solas 3.0 box for your copy of the system. After entering the information, click on the Next button to continue. If you enter an incorrect serial number, you will be prompted to enter the correct number.

NOTE: It is recommended that you record the serial number for your copy of Solas 3.0 in a safe place.

[Screenshot: Choose Destination Location dialog with Browse, Back, Next and Cancel buttons]

6. The Destination Location screen provides a default directory, C:\Program Files\Statistical Solutions\Solas 3.0. Click on the Browse button to choose a different folder. After specifying a folder, click on the Next button to continue.

[Screenshot: Select Program Folder dialog with a Program Folders field and an Existing Folders list]
27. Covariates field.

3. As there is no Grouping variable in this data set, we can leave this field blank.

Non-Monotone

Selecting the Non-monotone tab allows you to add or remove covariates from the logistic model used for imputing the non-monotone missing values in the data set. These can be identified in the Missing Data Pattern mentioned earlier in the Predictive Model example. You select the + or - signs to expand or contract the list of covariates for each imputation variable.

[Screenshot: Specify Propensity Method, Multiple Imputation window, Non-Monotone tab, with columns Vars_to_Impute, Covariate(s) and Forced. On-screen instructions: Click on the sign in front of a variable name to expand or contract it. To add additional covariates to a variable's regression pool, drag the covariate into the list of covariates column beside the variable. To add a covariate to all of the regression pools, drag a variable name onto the title of the Covariate(s) column. To toggle all of the selections in the Forced column, click on the column title.]

The list of covariates for each imputation variable will be made up of the variables specified as Fixed Covariates in the Base Setup tab and all of the other imputation variables. Variables can be added and removed from
28. Installing SOLAS 3.0

HARDWARE REQUIREMENTS
SOFTWARE REQUIREMENTS
INSTALLATION INSTRUCTIONS
NETWORK INSTALLATION
UNINSTALLING
KNOWN ISSUES

The following paragraphs give the configuration recommended for optimal performance of SOLAS 3.0.

Hardware Requirements

• PC with a Pentium processor
• CD-ROM drive
• 14 MB of free disk space
• 32 MB of RAM
• VGA graphics card with compatible monitor
• A mouse

Software Requirements

• Microsoft Windows 9x, or Microsoft Windows NT 4.0 (Intel) with Service Pack 3 or later

NOTE: If you are installing to Windows NT 4.0, then you must have administrative rights to that machine.

Installation Instructions

1. Insert the SOLAS 3.0 CD into the CD drive of your computer. Click on Run in the Start menu and enter x:\setup, where x is the drive for the CD.
2. You will see a message saying that the SOLAS 3.0 Setup program is preparing to install.

[Screenshot: Welcome dialog of the Solas Setup program. It strongly recommends that you exit all Windows programs before running Setup: click Cancel to quit Setup and close any running programs, or click Next to continue.]
29. LAS 3.0 CD. Setup.exe must be run from a workstation with write access to the server, not directly on the server itself.
2. When you get to the Destination Location screen, choose a network drive letter. The server volume to which you choose to install SOLAS 3.0 must be mapped to a drive letter (e.g. X:); you cannot specify a UNC path (e.g. \\server\share).
3. Setup will copy all the SOLAS 3.0 installation and program files to the administrative installation point.
4. After the installation is complete, share the folder to which you installed SOLAS 3.0 on the network.

Client Installation

To install SOLAS 3.0 on the client computers, users will:

1. Connect to the main SOLAS 3.0 folder on the administrative installation point.
2. Run Setup.exe. This setup will install the SOLAS 3.0 sample files and certain shared DLLs necessary to run SOLAS 3.0. The user will then access the SOLAS 3.0 program files from the network.

It is recommended that users have a read-only connection to the server.

Uninstalling

Click on the uninstall icon in the Solas 3.0 program group.
OR
Go to Add/Remove Programs in the Control Panel, click on Solas 3.0, and then click on the Add/Remove button.

Known Issues

If SOLAS 3.0 fails to install, or fails to run after the installation, the following is a list of known issues that may be the source of the problem.

1.
30. To do a single-user installation of SOLAS 3.0, you must have a CD-ROM drive on your machine. You cannot perform a shared CD installation of SOLAS 3.0, i.e. you cannot access another CD-ROM drive over the network to install SOLAS 3.0. Also, you cannot copy the contents of the SOLAS 3.0 CD from another computer to your hard disk or to a network volume and then attempt to run the installation from either of these locations. You must run the SOLAS 3.0 installation directly from the target machine's CD-ROM drive to its hard disk.

If you are installing to Windows NT 4.0, then you must have administrative rights to that machine. If you do not have administrative rights to the machine that you are installing to, the installation may not be able to install all of the files necessary to run SOLAS 3.0.

Anti-virus programs that have been configured to prevent executable files from being copied to the hard disk will interfere with an installation. If the installation appeared to run normally but did not transfer the application to the hard drive, virus protection may be the problem. The solution is to disable your anti-virus software and then run the installation again.

SOLAS Version 3.0

MISSING DATA
SINGLE IMPUTATION
MULTIPLE IMPUTATION
ANALYZING MULTIPLY IMPUTED DATA SETS
APPENDICES

About this Manual

This manual deals with the problem of analyzing data sets in wh
31. alue in the imputation variable, the program works out which variables from the total list of covariates can be used for prediction. By default, all of the covariates are forced into the model. If you uncheck a covariate, it will not be forced into the model but will be retained as a possible covariate in the stepwise selection.

Details of the models that were actually used to impute the missing values are included in the Output Log, which can be selected from the View menu of the Multiply Imputed Data Pages. These data pages will be displayed after you have specified the imputation and pressed the OK button in the Specify Predictive Model window.

Monotone

Selecting the Monotone tab allows you to add or remove covariates from the predictive model used for imputing the monotone missing values in the data set. These can be identified in the Missing Data Pattern mentioned earlier.

[Screenshot: Specify Predictive Model Based Method, Multiple Imputation window, Monotone tab, with columns Vars_to_Impute, Covariate(s) and Forced, and on-screen instructions for expanding a variable and dragging covariates into its regression pool]
32. and Grouping Variable datafield.

3. For this example we have chosen the variables to be imputed as MeasA_1 and MeasA_2, and the variable MeasA_0 as the Covariate.

NOTE: You cannot drag variables that do not contain missing values into the Variable(s) to Impute listbox.

4. When the required variables have been selected, press the OK button to display the Specify Predicted Mean window shown below.

[Screenshot: Specify Predicted Mean, Single Imputation window, with Variable(s) to Impute MeasA_1 and MeasA_2, Covariate(s) MeasA_0, and an empty Grouping Variable field]

5. After pressing the OK button, a new datasheet window is displayed where the imputed values are displayed as green text, and an Imputation Report, shown in part below, can be selected from the View menu.

[Screenshot: Single Imputation, Predicted Mean Imputation Report for the dataset MI_TRIAL (modified), dated Wednesday, November 24, 1999, at 16:25:52, showing the Specification section and the start of the Diagnostics section]
33. andomly selected from within the imputation class. If no matching respondent is found, the respondent is selected at random from all the used cases.

2. If the Include a missingness indicator option is chosen for a covariate x, then the independent variable x is changed into R x and the intercept is adjusted by adding the independent variable 1 - R to the regression model, where R is the response indicator vector for the incomplete covariate x. See Appendix C, Multiple Imputation: Predictive Model Based Method.

3. If the Exclude option is chosen, all of those cases that are missing in the Covariate are excluded, and no missing values will be imputed for these cases.

NOTE: Unless another Covariate is chosen, the Covariate with missing values discussed above will be used in all subsequent steps of the imputation.

And

4. If a nominal variable(s) is chosen as the Covariate(s), you will be prompted to create design variables, and these will be used in the regression analysis.

5. If there are no groups in the variable chosen as a grouping variable, you will be prompted to group the variable.

NOTE: There are no missing values in the variable chosen as a grouping variable for this example, but if there were, the following window would be displayed.

Incompletely Observed Grouping Variable: The grouping variable you have chosen has missing values. You may: Use hot deck imputation to impute the groupin
34. ariable SPECIES, so this should be dragged to the Grouping Variable field.

[Screenshot: Specify Group Means, Single Imputation window, with Variables to Impute SEPALWID and PETALLEN and Grouping Variable SPECIES]

5. When you are satisfied with your choice, click OK. The imputed data set is displayed with the imputed values appearing in pink. This imputed data set can be saved for later analysis or exported to various other statistics packages (see Chapter 1, Data Management, in the Systems Manual).

Hot Decking

This method sorts respondents and non-respondents into a number of imputation subsets according to a user-specified set of covariates. An imputation subset comprises cases with the same values as those of the user-specified covariates. Missing values are then replaced with values taken from matching respondents, i.e. respondents that are similar with respect to the covariates.

If there is more than one matching respondent for any particular non-respondent, the user has two choices:

1. The first respondent's value, as counted from the missing entry downwards within the imputation subset, is used to impute. The reason for this is that the first respondent's value may be closer in time to the case that has the missing value. For example, if cases are entered according to the order in which they occur, there may possibly be some type of time effect in some studies.
2. A respondent's valu
35. ata in the same subset of cases. The user can specify or add covariates for use in the Predictive Model for any variables that will be imputed. More information about using covariates is given in the example below.

Imputing the Monotone Missing Data

The Monotone missing data are sequentially imputed for each set of imputation variables with the same local pattern of missing data. First, the leftmost set is imputed using the observed values of this set and its selected fixed covariates only. Then the next set is imputed using the observed values of this set, the observed and previously imputed values of the first set, and the selected fixed covariates. This continues until the Monotone missing data of the last set is imputed. For each set, the observed values of this set, the observed and imputed values of the previously imputed sets, and the fixed covariates are used.

If multivariate propensity score multiple imputation is selected for the imputation of the Monotone missing data, then this method is applied for each subset of sets having the same local missing data pattern.

Short Examples

These short examples use the data set MI_TRIAL.MDD, located in the SAMPLES subdirectory. This data set contains the following 11 variables measured for 50 patients in a clinical trial:

OBS: Observation number
SYMPDUR: Duration of symptoms
AGE: The patient's age
MeasA_0, MeasA_1, MeasA_2 and Mea
36. ble as covariates. Details of the models that were actually used to impute the missing values are included in the Output Log.

Donor Pool

Selecting the Donor Pool tab displays the Donor Pool page, which allows more control over the random draw step in the analysis by allowing the user to define Propensity Score sub-classes.

[Screenshot: Donor Pool tab, with a Propensity Score section (Divide propensity score into subsets, Use closest cases, Use % of the dataset closest cases) and a Refinement Variable section listing Variable(s)_to_Impute and Refinement_Variable, where you specify the number of refinement variable cases to be used in the selection pool]

The following options for defining the Propensity Score sub-classes are provided:

• Divide propensity score into c subsets. The default is 5.
• Use c closest cases. This option allows you to specify the number of cases before and after the case being imputed that are to be included in the subset.
• Use d% of the data set closest cases. This option allows you to specify the number of cases as a percentage.

NOTE: See Defining Donor Pools Based on Propensity Scores earlier in this manual.

You can use one Refinement Variable for each of the variables being imputed. Variables can be dragged from the Variables listbox to
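Two of the donor pool options above can be illustrated with a short Python sketch. It is a simplified reading of the options, not the SOLAS implementation: the function names and data are hypothetical, and "closest" is taken here as smallest absolute difference in propensity score rather than position before and after the case in the sorted order.

```python
def quantile_subsets(scores, c=5):
    """Assign each case to one of c subsets of (nearly) equal size by score rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    subset_of = [0] * len(scores)
    for rank, i in enumerate(order):
        subset_of[i] = rank * c // len(scores)
    return subset_of

def closest_cases_pool(scores, observed, target, c):
    """Indices of the c observed cases whose scores are closest to the target case."""
    candidates = [i for i in range(len(scores)) if observed[i] and i != target]
    candidates.sort(key=lambda i: abs(scores[i] - scores[target]))
    return candidates[:c]

# Hypothetical propensity scores; case 2 is missing the imputation variable.
scores = [0.1, 0.2, 0.35, 0.5, 0.9]
observed = [True, True, False, True, True]
print(quantile_subsets(scores, c=5))
print(closest_cases_pool(scores, observed, target=2, c=2))
```

With the quantile option, the donor pool for a missing entry would be the observed cases that share its subset number.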
37. by any complete statistical analysis. The extra uncertainty due to missing data is taken into account by imputing two or more different values per missing data entry.

Predictive Model Based Method

The models that are available at present are an Ordinary Least Squares (OLS) Regression and a Discriminant Model. When the data are continuous or ordinal, the OLS method is applied. When the data are categorical, the discriminant method is applied.

Multiple imputations are generated using a regression model of the imputation variable on a set of user-specified covariates. The imputations are generated via randomly drawn regression model parameters from the Bayesian posterior distribution, based on the cases for which the imputation variable is observed. Each imputed value is the predicted value from these randomly drawn model parameters plus a randomly drawn error term. The randomly drawn error term is added to the imputations to prevent over-smoothing of the imputed data. The regression model parameters are drawn from a Bayesian posterior distribution in order to reflect the extra uncertainty due to the fact that the regression parameters can be estimated, but not determined, from the observed data.

Propensity Score Method

The system applies an implicit model approach based on Propensity Scores and an Approximate Bayesian Bootstrap to generate the imputations. The propensity score is the estimated probability that a particular element of data is
38. d

Bounded missing: A missing value in a longitudinal variable which has at least one observed value before, and at least one observed value after, the period for which it is missing.

Covariate: A variable which is selected as covariate for all selected variables to be imputed. Except for discriminant imputation, this variable is an independent variable in the corresponding regression model.

Fixed Covariate: A variable which is selected as covariate for all selected variables to be imputed.

Forced Covariate: A covariate that has been forced into a regression model, i.e. it will not be removed from the model during stepping.

Hot deck imputation: A method of imputation in which missing values are replaced with values taken from matching respondents, i.e. respondents that are similar with respect to variables observed for both.

Imputation: A procedure whereby missing values in a data set are filled in with plausible estimates to produce a complete data set, which can then be analyzed using complete-data inferential methods.

Intent to treat: Intent-to-treat (ITT) analysis dictates that all cases, both complete and incomplete, be included in any analyses, and that treatment effects should be measured with subjects assigned to the treatment to which they were randomized, rather than to the treatment actually received.

Last value carried forward: A method of imputation for replacing missing values in longitudinal studies using the last observed value.

Longitudinal variable, Mean imputation, Multiple imputation, Possible Covariate, Pro
39. dow and the unsorted Variable List window are shown below.

[Screenshot: datasheet MonotonePattern with 6 variables, together with the unsorted Variable List: Variable_1, Variable_2, Variable_3, Variable_4, Variable_5, Variable_6]

The new pattern after sorting can be viewed in the Missing Pattern window and in the Variable List window. Missing cases are represented by the darkened squares. All variables with the same local missing data pattern are adjacent. After sorting the variables from most observed to least observed in the first process, we have the following result.

[Screenshot: Missing Pattern window and Variable List after sorting: Variable_6, Variable_2, Variable_4, Variable_3, Variable_1, Variable_5]

The second process rearranges the cases. Starting with the least missing variable (6), cases with the most missing values are moved towards the bottom of the sort order, and this process is repeated for the next least missing variable (5), as shown in the left and right hand images below.

[Screenshot: two Missing Pattern windows with Variable Lists, showing the case order after each step of the second process]
40. e is randomly selected from within the imputation subset.

If a matching respondent does not exist in the initial imputation class, the subset will be collapsed by one level, starting with the last variable that was selected as a sort variable, or until a match can be found. Note that if no matching respondent is found even after all of the sort variables have been collapsed, three options are available:

• Re-specify new sort variables. The user can specify up to five sort variables.
• Perform random overall imputation. The missing value will be replaced with a value randomly selected from the observed values in that variable.
• Do not impute the missing value. SOLAS will not impute any missing values for which no matching respondent is found.

Hot Decking Example

This example also uses the data set FISHMISS.MDD.

1. Open the file FISHMISS.MDD.
2. To perform Hot Deck Imputation, from the datasheet menu bar select Analyze, Single Imputation and Hot Deck.
3. Again, the variables we want to impute are SEPALWID and PETALLEN, so drag them into the Variables to Impute field. For this example we will use SPECIES and SEPALLEN as our sort variables.

The order in which the Variables for Sort are specified is important, because if no matching respondent is found in the initial imputation class, the class will be collapsed by one level according to the last variable specified in the Variables for Sor
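The matching-and-collapsing logic described above can be sketched as follows. This is an illustrative reading only: the function name `hot_deck_impute` and the example rows are hypothetical, a donor is picked at random among the matches (the second of the two choices), "collapsing" is modeled as dropping the last sort variable from the match, and entries with no donor after full collapsing are left missing (the "Do not impute" option).

```python
import random

def hot_deck_impute(rows, target, sort_vars, rng):
    """Hot deck imputation of rows[...][target] using exact matches on sort_vars."""
    donors = [r for r in rows if r[target] is not None]
    imputed = []
    for row in rows:
        new_row = dict(row)
        if row[target] is None:
            keys = list(sort_vars)
            while keys:
                matches = [d[target] for d in donors
                           if all(d[k] == row[k] for k in keys)]
                if matches:
                    new_row[target] = rng.choice(matches)
                    break
                keys.pop()  # collapse one level: drop the last sort variable
        imputed.append(new_row)
    return imputed

# Hypothetical rows; None marks a missing SEPALWID value.
rows = [
    {"SPECIES": 1, "SEPALLEN": 5.1, "SEPALWID": 3.5},
    {"SPECIES": 1, "SEPALLEN": 5.1, "SEPALWID": None},
    {"SPECIES": 2, "SEPALLEN": 6.0, "SEPALWID": 2.9},
    {"SPECIES": 2, "SEPALLEN": 6.4, "SEPALWID": None},
    {"SPECIES": 3, "SEPALLEN": 7.0, "SEPALWID": None},
]
result = hot_deck_impute(rows, "SEPALWID", ["SPECIES", "SEPALLEN"], random.Random(1))
print([r["SEPALWID"] for r in result])
```

In this example the second row matches on both sort variables, the fourth row finds a donor only after SEPALLEN is dropped, and the fifth row has no donor in its species and stays missing.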
41. [Screenshot: Existing Folders list in the Select Program Folder dialog]

7. In the Select Program Folder screen, the default Program Folder name is Solas 3.0. You may type a new Program Folder name or select one from the existing Program Folder list. Click on the Next button to continue.

[Screenshot: Check Solas Information dialog, summarizing the user details, source directory, target directory, required disk space and program folder; click Back to review or change any of the settings, or Next to begin copying files]

8. Use the Check SOLAS 3.0 Information screen to confirm that the information is correct. If you are satisfied, click Next to continue.

9. When the installation is complete, you are given the choice of reading the README file or not. Click Yes to read the file; click No to continue without reading it.

[Screenshot: Setup Complete dialog, stating that you must restart your computer before you can use the program; remove any disks from their drives and then click Finish to exit Setup]

10. The final screen tells y
42. ession pools, drag a variable name onto the title of the Covariate(s) column. To toggle all of the selections in the Forced column, click on the column title.

Again, you select the + or - signs to expand or contract the list of covariates for each imputation variable. For each imputation variable, the list of covariates will be made up of the variables specified as Fixed Covariates in the Base Setup tab and all of the other imputation variables. Variables can be added and removed from this list by simply dragging and dropping.

Even though a variable appears in the list of covariates for a particular imputation variable, it might not be used in the final model. The program first sorts the variables so that the missing data pattern is as close as possible to monotone, and then uses only the variables that are to the left of the imputation variable as covariates. Details of the models that were actually used to impute the missing values are included in the Output Log.

Advanced Options

Selecting the Advanced Options tab displays a window that allows you to choose control settings for the regression/discriminant model.

[Screenshot: Specify Predictive Model Based Method, Multiple Imputation window, Advanced Options tab, showing Randomization (Main Seed Value), Output (Output Log) and Least Squares Regression settings]
43. ets. Information is given for the variables and cases involved in local patterns and the variables and cases involved in the regressions used. For the propensity method, propensity scores are given. For the predictive model, the equations used to estimate and generate the imputed values, along with their error terms, are given.

Imputed Data Pattern and Missing Data Pattern windows

The Missing Data Pattern window can be selected from the View menu of your datasheet before the imputation is performed. You can also display a colored legend from the View menu that identifies missing data and data that is present in the data set.

The Imputed Data Pattern window can be selected from the View menu of your datasheet after the imputation is performed. You can also display a colored legend from the View menu that identifies Monotone and Non-monotone patterns. Examples of this window are given below.

[Screenshot: Missing Data Pattern window for Page1_MMI_TRIAL]

Analyzing Multiply Imputed Data Sets Example

This section presents a simple example of analyzing multiply imputed data sets. It will show how the results of the repeated imputations can be combined to create one repeated-imputation inference. For reference, see Appendix A, Analyzing Multiply Imputed Data Sets, and Appendix F (references 1 and 2).

After you have performed a Multiple Imputation on your data set, you will ha
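Combining results across imputed data sets is usually done with the rules from Rubin (1987): average the M point estimates, and add the between-imputation variance, inflated by (1 + 1/M), to the average within-imputation variance. The sketch below assumes these standard rules apply; the function name `combine_estimates` and the input numbers are hypothetical and are not taken from the MI_TRIAL example.

```python
from statistics import mean, variance

def combine_estimates(estimates, variances):
    """Combine M completed-data estimates and their squared standard errors.

    Returns the combined estimate and its combined standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)            # combined point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5

# Hypothetical means and squared standard errors from M = 5 imputed data sets.
est, se = combine_estimates([250.1, 249.8, 251.0, 250.5, 250.0],
                            [73.9, 74.2, 73.5, 74.0, 73.8])
print(round(est, 3), round(se, 3))
```

The combined standard error is never smaller than the square root of the average within-imputation variance, which reflects the extra uncertainty due to the missing data.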
44. g

[Screenshot: paired comparison specification window with Variable 1 and Variable 2 datafields and a Null hypothesis field (mean diff = 0.00)]

2. Drag and drop the variables MeasA_1 and MeasA_2 to the Variable 1 and Variable 2 datafields respectively.
3. Press the OK button to display the data pages, then press the Combine tab to display the combined statistics from the five imputed data pages, as shown below.

SPECIFICATIONS
Date:              Wednesday, November 24, 1999, at 16:43:23
Data Set:          MMI_TRIALmodified
Imputed Datasets:  5
Analysis:          Combined paired comparison of mean
Variables:         MeasA_1 vs MeasA_2
Null Hypothesis:   Difference of Mean = 0.0000

COMBINED DESCRIPTIVE STATS
                     Mean        Standard     Standard
                                 Deviation    Error
MeasA_1              250.2840    60.7864      8.6204
MeasA_2              244.0847    71.5055      10.1547
MeasA_1 - MeasA_2    6.1993      26.1225      3.7851

The statistics that are calculated for each analysis selected from the Analyze menu, and displayed in the Combined page, are given in Appendix B, Combined Statistics.

Glossary

DEFINITIONS

Bounded missing, Covariate, Fixed Covariate, Forced Covariate, Hot deck imputation, Imputation, Intent to treat, Last value carried forwar
45. g variable (the variable to be used for hot deck imputation is chosen from a dropdown list), or Exclude the cases that have missing values in this grouping variable from the analysis.

Then

6. If the Use the hot deck option is chosen, you must select a variable in the dropdown listbox that will be used to impute the missing values in the grouping variable. The dropdown list will contain a list of all of the variables in the data set, in the same order as they appear in the datasheet. If more than one matching respondent is found, a value is randomly selected from within the imputation class. If no matching respondent is found, the respondent is selected at random from all of the used cases.

7. If the Exclude option is chosen, all of those cases that are missing in the grouping variable are excluded, and no missing values will be imputed in these cases.

Multiple Imputation in SOLAS 3.0

Multiple Imputation replaces each missing value in the data set with several imputed values instead of just one. First proposed by Rubin in the early 1970s as a possible solution to the problem of survey non-response, the method corrects the major problems associated with single imputation (see Appendix F, references 1 to 5). Multiple Imputation creates M imputations for each missing value, thereby reflecting the uncertainty about which value to impute.

The first set of the M imputed values is used to form the first imputed data
46. he Linear Regression Based Method regresses $Y_{obs}$ on $X_{obs}$ to obtain a prediction equation of the form $\hat{Y}_{mis} = a + bX_{mis}$.

2. A random element is then incorporated in the estimate of the missing values for each imputed data set. The computation of the random element is based on a posterior drawing of the regression coefficients and their residual variances.

Refer to Appendix C for more detailed information about the analysis that is performed for Multiple Imputation using the Predictive Model Based Method.

Posterior Drawing of Regression Coefficients and Residual Variance

Parameter values for the regression model are drawn from their posterior distribution given the observed data, using non-informative priors. In this way, the extra uncertainty due to the fact that the regression parameters can be estimated, but not determined, from $Y_{obs}$ and $X_{obs}$ is reflected.

Using estimated regression parameters, rather than those drawn from their posterior distribution, can produce improper results, in the sense that the between-imputation variance is under-estimated. For more detailed information, see Appendix C, Multiple Imputation: Predictive Model Based Method.

Discriminant Multiple Imputation

Discriminant multiple imputation is a model-based method for imputing binary or categorical variables. Let 1, ..., s be the categories of the categorical imputation variable y. Bayes' Theorem is used to calculate the p
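The posterior drawing described above can be sketched for the simplest case of one covariate. This is a minimal illustration under a non-informative prior, following the textbook procedure rather than the exact computation in Appendix C: the function name `regression_mi_draw` and the data are hypothetical. The covariate is centered so that the intercept and slope can be drawn independently.

```python
import random
from statistics import mean

def regression_mi_draw(x_obs, y_obs, x_mis, rng):
    """One set of imputations from a simple linear regression of y on x.

    Draws the residual variance, then the coefficients, from their posterior
    under a non-informative prior, and adds a random error to each prediction.
    Requires at least 3 observed cases.
    """
    n = len(x_obs)
    x_bar, y_bar = mean(x_obs), mean(y_obs)
    sxx = sum((x - x_bar) ** 2 for x in x_obs)
    b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_obs, y_obs)) / sxx
    sse = sum((y - y_bar - b_hat * (x - x_bar)) ** 2 for x, y in zip(x_obs, y_obs))
    # Residual variance draw: SSE divided by a chi-square(n - 2) variate.
    chi2 = rng.gammavariate((n - 2) / 2, 2.0)
    sigma = (sse / chi2) ** 0.5
    # Coefficient draws around the least squares estimates (centered form).
    a_star = rng.gauss(y_bar, sigma / n ** 0.5)
    b_star = rng.gauss(b_hat, sigma / sxx ** 0.5)
    # Predicted value plus a randomly drawn error term.
    return [a_star + b_star * (x - x_bar) + rng.gauss(0.0, sigma) for x in x_mis]

# Hypothetical observed pairs and two covariate values with missing y.
rng = random.Random(7)
print(regression_mi_draw([1.0, 2.0, 3.0, 4.0, 5.0],
                         [2.1, 3.9, 6.2, 7.8, 10.1],
                         [2.5, 6.0], rng))
```

Calling the function M times gives M different sets of imputed values; using the fixed least squares estimates instead of `a_star` and `b_star` would understate the between-imputation variance, as the text notes.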
47. ich data are missing. We first explain how to run SOLAS so you can begin your analyses. We then provide some general information about handling variables and cases in SOLAS, and this is followed by overviews of the Single and Multiple Imputation techniques that are available in the new SOLAS 3.0, followed by examples using Single Imputation.

Multiple Imputation is then discussed in more detail, with a description of the method SOLAS uses to sort Monotone and Non-monotone missing data and display the data patterns. Then the Predictive Model Based Method and Propensity Score Method are described, and a description of the way in which SOLAS imputes Monotone and Non-monotone missing data is given. This is followed by two short examples, one using the Predictive Model Based Method and one using the Propensity Score Method.

Several worked examples are provided next. These examples demonstrate how each of the available methods for Single and Multiple Imputation is used. Finally, several appendices are given that detail formulae and methods and give references to literature.

NOTE: This manual is intended as a user reference for SOLAS and as a guide to using the various distinct methods of imputation that SOLAS 3.0 provides. It is not meant as a textbook for missing data, nor is it intended as a comprehensive description of multiple imputation. For this, the user should consult the references given in Appendix F.
48. ingness for that variable.

NOTE: If two cases have the same sets of observed variables and the same sets of missing variables, then these two cases have the same local missing data pattern.

A Monotone pattern of missingness, or a close approximation to it, can be quite common. For example, in longitudinal studies subjects often drop out as the study progresses, so that all subjects have time 1 measurements, a subset of subjects have time 2 measurements, only a subset of those subjects have time 3 measurements, and so on.

SOLAS sorts variables and cases into a pattern that is as close as possible to a Monotone pattern. Monotone patterns are attractive because the resulting analysis is flexible. The resulting imputation is completely principled, since only observed (real) data are used in the model to generate the imputed values. See Rubin (1987), Chapter 5.

Example of Sorting into a Monotone Missing Data Pattern

In SOLAS 3.0, finding a Monotone missing data pattern consists of three main processes.

The first process sorts the variables in a datasheet from the most observed to the least observed. This is demonstrated using a simple datasheet and the Variable List window. By selecting the View menu in a datasheet window you can display the Missing Data Pattern, and from the View menu in the Missing Data Pattern window you can display the Variable List window, as shown in this example.

The datasheet, the unsorted data in the Missing Data Pattern win
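The first sorting process and the definition of a Monotone pattern can be illustrated with a small Python sketch. It is a simplification, not the three-process SOLAS procedure: variables are ordered from most to least observed, cases are ordered by their number of missing values, and a separate check tests whether the result is monotone. The function names and the dropout data are hypothetical.

```python
def sort_toward_monotone(data):
    """data maps variable name -> list of values, with None for missing.

    Returns (variable order, case order): variables from most observed to
    least observed, cases from fewest to most missing values.
    """
    var_order = sorted(data, key=lambda name: sum(v is None for v in data[name]))
    n_cases = len(data[var_order[0]])
    case_order = sorted(range(n_cases),
                        key=lambda i: sum(data[name][i] is None for name in var_order))
    return var_order, case_order

def is_monotone(data, var_order):
    """True if, for every case, a missing value is never followed by an observed one."""
    for i in range(len(data[var_order[0]])):
        seen_missing = False
        for name in var_order:
            missing = data[name][i] is None
            if seen_missing and not missing:
                return False
            seen_missing = seen_missing or missing
    return True

# Hypothetical longitudinal data with dropout, stored in an arbitrary column order.
data = {"time3": [1, None, None, 4], "time1": [1, 2, 3, 4], "time2": [1, 2, None, 4]}
var_order, case_order = sort_toward_monotone(data)
print(var_order, case_order, is_monotone(data, var_order))
```

For dropout data like this, sorting the variables alone is enough to reach a monotone pattern; real data sets generally leave some non-monotone entries after sorting.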
49. ion 47 Missing Data 8 General 8 local 25 pattern 11 23 25 34 43 case numbers 24 display 25 window 24 43 reasons for 8 Missingness 28 Model Tolerance 36 longitudinal variables 40 Monotone missing data 23 25 30 31 38 Pattern 24 25 Multiple Imputation 10 25 47 53 56 Analyzing Multiply Imputed Data sets 43 48 generating 30 in SOLAS 23 Output 41 Overview 14 Report 42 Multiple Regression Estimates of Parameters 27 Multiply Imputed Data Sets example 43 Network Installation 5 Nominal variable 27 as covariate 21 Non monotone missing data 23 30 31 33 37 Omit Highlighted Variable 11 Ordinal variable 14 15 Output Log 36 38 40 43 Parameter combined estimate of 48 Posterior Drawing of regression coefficients 27 Predicted Mean 14 Example 20 Imputation 20 using discriminant multiple imputation 20 using ordinary least squares 20 Predictive Model 27 53 estimating 27 Example 33 Method 14 Propensity Score 14 28 47 59 Example 36 Method 14 68 Imputation User Manual Multiple Imputation 59 Random imputation 47 Randomization 36 40 README 4 References multiple imputation 62 SOLAS 62 Refinement Variable 29 39 Regression Output 43 Reports multiple imputation 42 pairwise missingness 24 Re specify new sort variables 17 Respondents matching 16 Rule to apply 17 Single Imputation 10 15 Group Means 13 15 Hot Deck Imputation 13 16 Last Va
50. ld, then click on the variable name in the Elements listbox to enable the Initialize From Variable Name button, then press this button to include all the MeasB variables in the Elements in Variable field.
9. When you are satisfied that you have defined your longitudinal variable correctly, click OK to finish.

[Screenshot: Longitudinal Variables dialog, with the Name, Elements in Variable (Element, Period) and Elements lists, and the Initialize from Variable Name, OK, Cancel and Help buttons]

LVCF Imputation

1. To perform LVCF Imputation, choose Single Imputation, Last Value Carried Forward from the datasheet Analyze menu.
2. The two longitudinal variables that we created appear in the Longitudinal Variables list. Drag and drop the variables MeasA and MeasB from the Longitudinal Variables list into the Variables to Impute field.

[Screenshot: Specify Last Value Carried Forward (Single Imputation) dialog, with the Longitudinal Variables and Variables to Impute lists]

3. When you are satisfied with your choice, click OK. The imputed data set is displayed with the imputed values appearing in Blue/Grey. The value from the last observed period is carried forward to fill in for missing values in later periods. For example, case 7 has a baseline value of 147 for MeasA but is missing for all subsequent periods. This value of 147 is carried forward
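The carry-forward rule can be illustrated with a minimal Python sketch. It is not SOLAS code: the function name `lvcf` and the use of `None` for a missing value are assumptions made for the example, and values that are missing before any observation are simply left missing here.

```python
# Hypothetical sketch of Last Value Carried Forward for one case.

def lvcf(values):
    """values are ordered by period; None marks a missing value.

    Each missing value is replaced by the most recent observed value.
    Values missing before the first observation stay missing (assumption).
    """
    filled = []
    last_observed = None
    for value in values:
        if value is not None:
            last_observed = value
        filled.append(last_observed)
    return filled


# Case 7 from the text: baseline MeasA of 147, all later periods missing.
case_7_meas_a = [147, None, None, None]
print(lvcf(case_7_meas_a))  # [147, 147, 147, 147]
```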
51. lect the + or - signs to expand or contract the list of covariates for each imputation variable.

[Screenshot: Specify Predictive Model Based Method (Multiple Imputation) window, showing the Vars_to_Impute, Covariates and Forced columns. On-screen help text: Click on the sign in front of a variable name to expand or contract it. To add additional covariates to a variable's regression pool, drag the covariate into the list of covariates column beside the variable. To add a covariate to all of the regression pools, drag a variable name onto the title of the Covariate(s) column. To toggle all of the selections in the Forced column, click on the column title.]

For each imputation variable, the list of covariates will be made up of the variables specified as Fixed Covariates in the Base Setup tab and all of the other imputation variables. Variables can be added to and removed from this list of covariates by simply dragging and dropping the variable from the covariate list to the variables field, or vice versa. Even though a variable appears in the list of covariates for a particular imputation variable, it might not be used in the final model. The program first sorts the variables so that the missing data pattern is as close as possible to monotone, and then for each missing v
52. led in with plausible estimates. The goal of any imputation technique is to produce a complete data set that can be analyzed using complete-data inferential methods. The following describes the Single and Multiple Imputation methods available in SOLAS 3.0, which are designed to accommodate a range of missing data scenarios in both longitudinal and single-observation study designs.

Single Imputation Overview

SOLAS 3.0 provides four distinct methods by which you can perform Single Imputation: Group Means, Hot-deck Imputation, Last Value Carried Forward, and Predicted Mean imputation. The Single Imputation option provides a standard range of traditional imputation techniques useful for sensitivity analysis.

Group Means: Imputed values are set to the variable's group mean (or mode in the case of categorical data).

Hot-deck Imputation: Imputed values are selected from responders that are similar with respect to a set of auxiliary variables.

Last Value Carried Forward (LVCF): The last observed value of a longitudinal variable is imputed. Longitudinal variables are those variables intended to be measured at several points in time, such as pre- and post-test measurements of an outcome variable made at monthly intervals, or laboratory tests made at each visit from baseline, through the treatment period, and through the follow-up period. For example, if the blood pressures of patients were recorded every month over a period of six months
53. lerance [Screenshot: Tolerance and Stepping Criteria (F to Enter, F to Remove) fields]

Randomization

Main Seed Value: The Main Seed Value is used to perform the random selection within the propensity subsets. The default seed is 12345. If you set this field to blank, or set it to zero, then the clock time is used.

Output Log

The Output Log is a comprehensive list of the regression equations, etc., that have been calculated for the imputed variable(s).

Least Squares Regression

Tolerance: The value set in the Tolerance datafield controls numerical accuracy. The tolerance limit is used for matrix inversion to guard against singularity. No independent variable is used whose R² with the other independent variables exceeds 1 - Tolerance. You can adjust the tolerance using the scrolled datafield.

Stepping Criteria: Here you can select F to Enter and F to Remove values from the scrolled datafields, or enter your chosen value. If you wish to see more variables entered in the model, set the F to Enter value to a smaller value. The numerical value of F to Remove should be chosen to be less than the F to Enter value.

When you are satisfied that you have specified your analysis correctly, click the OK button. The multiply imputed datapages will be displayed with the imputed values appearing in Red or Blue. Refer to Analyzing Multiply Imputed Data Sets for further details of analyzing these data
54. lihood Estimates Under Non standard Conditions in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability Berkeley University of California Press pp 221 233 Imputation User Manual 63 Appendices James I R 1995 A Note on the Analysis of Censored Regression Data by Multiple Imputation Biometrics 51 358 362 Johnson C L Curtin L R Ezzati Rice T M Khare M and Murphy R S 1993 Single and Multiple Imputation The NHANES Perspective paper presented at the Annual Meeting of the American Statistical Association San Francisco Kalton G 1983 Compensating for Missing Survey Data Ann Arbor MI Institute of Social Research University of Michigan Kennickell A B 1991 Imputation of the 1989 Survey of Consumer Finances Stochastic Relaxation and Multiple Imputation in Proceedings of the Survey Research Methods Section of the American Statistical Association pp 1 10 Kong A Liu J and Wong W H 1994 Sequential Imputation and Bayesian Missing Data Problems Journal of the American Statistical Association 89 278 288 Kott P S 1992 A Note on a Counter Example to Variance Estimation Using Multiple Imputation technical report U S National Agriculture Service Krewski D and Rao J N K 1981 Inference from Stratified Samples Properties of the Linearisation Jackknife and Balanced Repeated Replication Methods The Annals of Statistics 9 1010 1019 Lehmann E L
55. ls With Truncation of Patient Data Statistics in Medicine 14 1913 1925 5 Rubin D B 1996 Multiple Imputation After 18 Years Journal of the American Statistical Association 91 473 489 6 Anderson T W 1957 Maximum likelihood estimates for the mulitvariate normal distribution when some observations are missing JASA 52 200 203 7 Rubin D B 1974 Characterizing the estimation of parameters in incomplete data problems Journal of the American Statistical Association 69 467 474 Multiple Imputation and Related Literature References Box G E P Tiao G C 1973 Bayesian Inference in Statistical Analysis Reading Mass Adisson Wesley Chand N and Alexander C H 1994 Imputing Income for An N Person Consumer Unit Bureau of the Census paper presented at the American Statistical Association Annual Meeting in Toronto Clogg C Rubin D B Schenker N Schultz B and Weidman L 1991 Multiple Imputation of Industry and Occupation Codes in Census Public Use Samples Using Bayesian Logistic Regression Journal of the American Statistical Association 86 68 78 Efron B 1994 Missing Data Imputation and the Bootstrap with discussion Journal of the American Statistical Association 89 463 478 Efron B and Tibsharani R 1993 Assessment of Reported Differences Between Expenditures and Low Incomes in the U S Consumer Expenditure Survey Paper presented at the American Statistical Associati
56. lue; One group; Pooled Variance t test, including t value, df and p value; Frequency Table; Tables; Row percentages; Column percentages; Total percentages; Associated Measures; Odds ratio, including ln(Odds ratio); Kappa statistic; Cramér's V; Phi; Test Statistic; Likelihood ratio chi-square; Multiple Regression; Regression Statistics; Square root of Residual Mean Square; Multiple Correlation; Multiple Correlation Squared; Analysis of Variance; F value; p value; Regression coefficients; Partial Correlation; Estimate of coefficient; Standard error of coefficient; Standardized coefficient; t value of coefficient; Confidence interval of coefficient; Pooled Multiple Linear regression Equation.

Appendix C: Multiple Imputation, Predictive Model Based Method

COMPLETELY OBSERVED COVARIATES
INCOMPLETELY OBSERVED COVARIATES

Definition of Methods

The following gives a detailed explanation of the methods used to analyze situations with completely and incompletely observed covariates for Linear Regression Based Multiple Imputation.

Completely observed covariates

Let $y$ be one imputation variable and let $x_1, \dots, x_p$ be the fully observed covariates for $y$. Let $Y_{obs}$ and $Y_{mis}$ be the observed and missing data for $y$, respectively. Let $X$ be the data matrix for $x_1, \dots, x_p$. The first column of $X$ consists of 1's to adjust for the intercept term, and the second until the la
57. lue Carried Forward 13 Overview 13 Software Requirements Sort variables 16 Sorting a Monotone Pattern Example 25 Standard Deviation pooled 49 Standard error of a point estimate 49 Statistics descriptive 51 Frequency Table 52 regression 52 t and non parametric 51 Stepping Criteria 36 Tail area probabilities 41 Tolerance 36 40 Uninstalling Solas 6 Variable Selection De selection 11 Variables adding and removing 38 binary 56 categorical 56 using 11 Variables for Sort order 17 View 11
58. metrika 79 103 111 Miller R G 1974 The Jackknife A Review Biometrika 61 1 17 64 Imputation User Manual Appendices Mislevy R J Johnson E G and Muraki E 1992 Scaling Procedures in NAEP Journal of Educational Statistics 17 131 154 Neyman J 1934 On the Two Different Aspects of the Representative Method The Method of Stratified Sampling and the Method of Purposive Selection Journal of the Royal Statistical Society A97 558 606 Paulin G D and Ferraro D L 1994 Do Expenditures Explain Income A Study of Variables for Income Imputation paper presented at the Annual Meeting of the American Statistical Association Toronto Rao J N K and Shao J 1992 Jackknife Variance Estimation With Survey Data Under Hot Deck Imputation Biometrika 79 811 822 Rubin D B 1977a Formalizing Subjective Notions about the Effect of Non respondents in Sample Surveys Journal of the American Statistical Association 72 538 543 Rubin D B 1977b The Design of a General and Flexible System for Handling Non Response in Sample Surveys working document prepared for the U S Social Security Administration Rubin D B 1978 Multiple Imputations in Sample Surveys A Phenomenological Bayesian Approach to Non response in Proceedings of the Survey Research Methods Section American Statistical Association pp 20 34 See also Imputation and Editing of Faulty or Missing Survey Data US Department of Commerce pp 1 23
59. missing. The missing data are filled in by sampling from the cases that have a similar propensity to be missing. The multiple imputations are independent repetitions from a Posterior Predictive Distribution for the missing data, given the observed data.

Single Imputation in SOLAS

Single Imputation is the method in which each missing value in a data set is filled in with one value to yield one complete data set. This allows standard complete-data methods of analysis to be used on the filled-in data set.

Group Means

Missing values in a continuous variable will be replaced with the group mean derived from a grouping variable. The grouping variable must be a categorical variable that has no missing data. If no grouping variable is specified, missing values in the variable to be imputed will be replaced with its overall mean. When the variable to be imputed is categorical, with different frequencies in two or more categories (providing a unique mode), then the modal value will be used to replace missing values in that variable. Note that if there is no unique mode, i.e. if there are equal frequencies in two or more categories, then, if the variable is nominal, a value will be randomly selected from the categories with the highest frequency. If the variable is ordinal, then the middle category will be imputed. If the variable has an even number of categories, a value is randomly chosen from
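For a continuous variable, the Group Means rule above amounts to the following minimal Python sketch. The function name, the `None` convention for missing values and the example data are assumptions made for illustration; the handling of categorical variables (modes and tie-breaking) is not covered here.

```python
# Hypothetical sketch of Group Means imputation for a continuous variable.
from statistics import mean


def impute_group_means(values, groups=None):
    """Replace each None in `values` with the mean of its group.

    groups: one fully observed category label per case; if omitted, every
    case falls into a single group, so the overall mean is used.
    A group with no observed values is left missing (assumption).
    """
    if groups is None:
        groups = ["all"] * len(values)
    observed_by_group = {}
    for value, group in zip(values, groups):
        if value is not None:
            observed_by_group.setdefault(group, []).append(value)
    group_means = {g: mean(vs) for g, vs in observed_by_group.items()}
    return [group_means.get(g) if v is None else v
            for v, g in zip(values, groups)]


values = [10.0, None, 14.0, 20.0, None, 30.0]
groups = ["a", "a", "a", "b", "b", "b"]
print(impute_group_means(values, groups))  # [10.0, 12.0, 14.0, 20.0, 25.0, 30.0]
print(impute_group_means(values))          # [10.0, 18.5, 14.0, 20.0, 18.5, 30.0]
```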
60. n longitudinal variables in SOLAS, our first step will be to define the Longitudinal Variables in the data set.
1. Open the file MI_TRIAL.MDD.
2. From the Variables menu, select Define Variables > Longitudinal.
3. The system assigns the default name LVar1 to the first longitudinal variable. Just type MeasA into the Name field to replace the default name.
4. Click on the variable name in the Elements listbox to enable the Initialize From Variable Name button, then press this button to include all the MeasA variables in the Elements in Variable field.
5. The system automatically assigns a period value of zero to the first element, and the remaining elements will be assigned period values of 1, 2, etc. You can change these values by typing in new values. For example, you might want to change the default period values if your repeated measurements were taken at baseline, month 1, month 6 and month 8, i.e. at unequal time intervals. By setting the period values to 1, 6 and 8, you will ensure that linear interpolation of bounded missings will be correct. Here, the measurements were taken at month 1, month 2 and month 3, so the default values do not need to be changed.
6. Click on New Variable to define the elements of our second longitudinal variable.
7. A dialog box appears asking if you want to save your changes to the longitudinal variable MeasA. Click Yes.
8. Type the name MeasB in the name fie
61. ntered first. If no term in the model has a p-value less than this limit, then the term with the largest p-value greater than the removal value is removed.

Removal: During backward stepping, the term with the largest p-value greater than the removal value is removed first. Then any terms with entry p-values less than the entry limit are entered.

Again, for the purposes of this example, we will run the analysis with the default settings.

Maximum Likelihood Criteria

Maximum iterations to convergence: Specifies the maximum number of iterations to maximize the likelihood function. The default is 10.

Likelihood function convergence criterion: Specifies the convergence criterion for the likelihood function. A relative improvement less than this value is considered no improvement. The default is 0.00001.

Parameter estimates convergence criterion: Specifies the convergence criterion for the parameter estimates. The default is 0.0001.

When you are satisfied that you have specified your analysis correctly, click the OK button. The multiply imputed datapages will be displayed with the imputed values appearing in Red or Blue. Refer to Analyzing Multiply Imputed Data Sets for further details of analyzing these data sets and combining the results.

Multiple Imputation Output

The Multiple Imputation output (from either the Propensity Score or the Predictive Model Based Method) comprises five (the default value) Multiple Imputation Data Pages. From the View menu of the
62. Enhancements to SOLAS Version 3.0

The following is a list of enhancements in the new SOLAS Version 3.0. A new Predictive Model Based Multiple Imputation technique has been added. A new Predicted Mean Single Imputation technique has been added to the system. The Propensity Score Based Multiple Imputation functionality has been extended. The Missing Data Pattern window has been extended to allow viewing of Monotone and Non-monotone missing data patterns before imputation. An Imputed Data Pattern window has been added. A new Combine function is available for the Multiple Imputation output; combined results are computed automatically for each data page. A script language supports non-interactive access to the file handling, data management and Multiple Imputation functionality; this will be available in version 2.1. All text output windows are upgraded to produce MS Word 7.0 compatible output. Multiple selection of variables using drag and drop is supported. SOLAS 3.0 is a 32-bit application. Descriptions of all of these enhancements are included in this manual.

Missing Data

Missing data are a pervasive problem in data analysis. Missing values lead to less efficient estimates because of the reduced size of the database; also, standard complete-data methods of analysis no longer apply. For example, analyses such as multiple regression use only cases that have complete data, so including a variable
63. o sets of covariates are selected. One set of covariates is used for imputing the Non-monotone missing data entries, and the other set of covariates is used for imputing the Monotone missing data entries in that variable. After the missing data pattern is sorted, the missing data entries are labeled as Non-monotone or Monotone. For both sets of selected covariates for an imputation variable, a special subset is the fixed covariates. Fixed covariates are all selected covariates other than imputation variables, and they are used for the imputation of missing data entries for both Monotone and Non-monotone missing patterns. This is only the case for fixed covariates.

Imputing the Non-monotone Missing Data

The Non-monotone missing data are imputed for each subset of missing data by a series of individual linear regression multiple imputations (or discriminant multiple imputations), using as much as possible of the observed and previously imputed data. Information about Linear Regression and Discriminant Multiple Imputation in SOLAS 3.0 can be found in Appendix C, Multiple Imputation: Predictive Model Based Method. First, the leftmost Non-monotone missing data are imputed. Then the second leftmost Non-monotone missing data are imputed, using the previously imputed values. This process continues until the rightmost Non-monotone missing data are imputed, using the previously imputed values for the other Non monotone missing d
64. od by executing the following steps:
1. From the Analyze menu, select Multiple Imputation and Predictive Model Based Method.
2. The Specify Predictive Model window is displayed. The window opens with two pages or tabs: Base Setup and Advanced Options. As soon as you select a variable to be imputed, a Non-Monotone tab and a Monotone tab are also displayed.

Base Setup

Selecting the Base Setup tab allows you to specify which variables you want to impute and which variables you want to use as covariates for the predictive model.

[Screenshot: Specify Predictive Model Based Method (Multiple Imputation) window, Base Setup tab, with the Variables list and the Variable(s) to Impute, Number of Imputed Datasets, Grouping Variable and Bounded Missing fields]

1. Using the datasheet MI_TRIAL, drag and drop the variables MeasA_1, MeasA_2, MeasA_3, MeasB_1, MeasB_2 and MeasB_3 into the Variables to Impute field.
2. Drag and drop the variables SYMPDUR, AGE, MeasA_0 and MeasB_0 into the Fixed Covariates field.
3. As there is no Grouping variable in this data set, we can leave this field blank.

Non-Monotone

Selecting the Non-Monotone tab allows you to add or remove covariates from the predictive model used for imputing the non-monotone missing values in the data set. These can be identified in the Missing Data Pattern mentioned earlier. You se
65. on Annual Meeting Toronto Ezzati Rice T M Johnson W Khare M Little R J A Rubin D B and Schafer J L 1995 A Simulation Study to Evaluate The Performance of Multiple Imputations in NCHS Health 62 Imputation User Manual Appendices Examination Survey in Proceedings of the Bureau of the Census Eleventh Annual Research Conference pp 257 266 Ezzati Rice T M Khare M and Schafer J L 1993 Multiple Imputation of Missing Data in NHANES III paper presented at the American Statistical Association Annual Meeting San Francisco Fahimi M and Judkins D 1993 Serial Imputation of NHANES III With Mixed Regression and Hot Deck Technique paper presented at the American Statistical Association Annual Meeting San Francisco Fay R E 1990 VPLX Variance Estimation for Complex Surveys Proceedings of the Survey Research Methods Section American Statistical Association pp 266 271 Fay R E 1991 A Design Based Perspective on Missing Data Variance in Proceedings of the 1991 Annual Research Conference U S Bureau of the Census pp 429 440 Fay R E 1992 When are Inferences from Multiple Imputation Valid in Proceedings of the Survey Research Methods Sections American Statistical Association pp 227 232 Fay R E 1993 Valid Inferences from Imputed Survey Data paper presented at the Annual Meeting of the American Statistical Association San Francisco Fay R E 1996 Alternative Paradigms fo
66. one observed value after the period for which it is missing. The following table shows an example of bounded missing values. The variables Month1 to Month6 are a set of longitudinal measures.

[Table: patient 101 has observed values 10, 20 and 50 across the six months; shaded cells mark missing values, and missing values that lie between observed values are bounded missing.]

Linear interpolation can be used to fill in missing values in longitudinal variables. So, for example, using linear interpolation, patient 101's missing values for months 2, 3 and 5 would be imputed as follows:

[Figure: patient 101's values plotted against MONTH, with the interpolated points lying on the straight lines between observed values]

So the imputed value for month 2 will be 13.33, the imputed value for month 3 will be 16.67, and the imputed value for month 5 will be 35.

Generating the Multiple Imputations

After the missing data pattern is sorted and the missing data entries are labeled as either Non-monotone missing or Monotone missing, the imputations are generated in two steps:
1. The Non-monotone missing data entries are imputed first.
2. Then the Monotone missing data entries are imputed, using the previously imputed data for the Non-monotone missing data entries.
The Non-monotone missing data entries are always imputed using a Predictive Model Based Multiple Imputation. The Monotone missing data entries are imputed by the user-specified method, which can be either the Propensity Score method or the Predictive Model Based method. Covariates that are used for the generation of the imputations are selected for each imputation variable separately. For each imputation variable tw
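The interpolation arithmetic for patient 101 can be checked with a short Python sketch. It is illustrative only: the function name and the `None` convention are assumptions, and the placement of the observed values 10, 20 and 50 in months 1, 4 and 6 is inferred from the imputed values quoted in the text, since the original table is not legible here.

```python
# Hypothetical sketch of linear interpolation of bounded missing values.

def interpolate_bounded(periods, values):
    """Fill each None that lies between two observed values by linear
    interpolation on the period scale. Leading and trailing missing values
    are not bounded, so they are left as None."""
    result = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        span = periods[right] - periods[left]
        for i in range(left + 1, right):
            fraction = (periods[i] - periods[left]) / span
            result[i] = values[left] + fraction * (values[right] - values[left])
    return result


periods = [1, 2, 3, 4, 5, 6]
patient_101 = [10, None, None, 20, None, 50]   # assumed layout, see above
imputed = interpolate_bounded(periods, patient_101)
print([round(v, 2) for v in imputed])  # [10, 13.33, 16.67, 20, 35.0, 50]
```

Because the fraction is computed from the period values rather than from the element positions, unequal spacing such as periods 0, 1, 6 and 8 is handled the same way.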
67. ou that you have finished installing SOLAS 3.0. You must restart your computer before using SOLAS 3.0, so choose Yes if you are ready to use the program. Click on the Finish button to continue.

Network Installation

If you have a license to install SOLAS 3.0 on more than one client computer, then you can install SOLAS 3.0 on a network and have each end user install from there. Installing SOLAS 3.0 over the network is a two-step process:
1. The administrator creates the administrative installation point by installing the SOLAS 3.0 installation and program files to the server.
2. Users install SOLAS 3.0 by running Setup from the administrative installation point.
The same Setup.exe is used for each of these steps, although in two different modes. To avoid confusion in the discussion below, these two modes will be referred to as follows:

Administrative Installation: To create the Administrative Installation Point, you run Setup.exe from the SOLAS 3.0 CD. Before doing the administrative installation of SOLAS 3.0, make sure that the destination folders are empty. If a previous version of SOLAS exists, delete all of it.

Client Installation: To perform a client installation, a user runs Setup.exe from the administrative installation point.

Administrative Installation

To install SOLAS 3.0 on the Administrative Installation Point:
1. Run Setup.exe from the SO
68. ournal of the American Statistical Association 82 528 550 Treiman D J Bielby W and Cheng M 1987 Significance Levels from Public Use Data With Multiply Imputed Industry Codes unpublished doctoral thesis Harvard University Dept of Statistics Treiman D J Bielby W and Cheng M 1989 Evaluation of a Multiple Imputation Method for Recalibrating 1970 U S Census Detailed Industry Codes to the 1980 Standard Sociological Methodology 18 309 345 van Buuren S van Mulligen E M and Brand J P L Treiman D J Bielby W and Cheng M 1995 Omgaan Met Ontbrekende Gevevens in Statistische Databases Multiple Imputatie in HERMES Kwantitatieve Methadone 50 503 504 van Buuren S van Rijckevorsel J L A Rubin D B Treiman D J Bielby W and Cheng M 1993 Multiple Imputation by Splines in Bulletin of the International Statistical Institute Contributed Papers IT 503 504 Weld L Wolter K M 1984 Introduction to Variance Estimation New York Springer Verlag 66 Imputation User Manual Index Administrative installation 5 Advanced Options 35 39 Base Setup 33 36 Bayesian Bootstrap 14 28 59 Binary variable 14 Bounded Missing 30 46 Case selection 11 Cases closest matching 60 61 deleted 8 Categorical variable 14 15 28 Client Installation 5 Combine 41 47 51 Confidence interval 49 for the point estimate 49 Continuous variable 14 15 as sort variable 18 Covariate 46
69. pensity score Random imputation Combine Imputation variable

Glossary

A variable that is made up of a set of repeated measurements over time. The sample mean of a variable is used to replace any missing data for that variable; this mean can be an overall mean of all the cases, or a within-group or class mean. Each missing value is replaced by two or more (M) plausible estimates in order to create M complete data sets. A covariate that has not been forced into a regression model and so can be entered or removed during stepping. The conditional probability of missingness, computed from a vector of observed covariates. A respondent is chosen at random from the total respondent sample for a variable, and the missing value for a non-respondent is replaced by the respondent's value. The procedure for combining the set of M results into one overall set of results. A variable that has values that need to be imputed.

Appendix A: Analyzing Multiply Imputed Data Sets

ESTIMATED PARAMETERS

Definitions of Estimated Parameters

The following shows how M complete-data analyses can be combined to create one repeated-imputation inference. See Rubin and Schenker (1991), Multiple Imputation in Health Care Data Bases: An Overview and Some Applications, Statistics in Medicine 10, 585-598, and Rubin, D. B. (1987), Multiple Imputation for Non-response in Surveys, New York: John Wiley. For each of the M
70. r the Analysis of Imputed Survey Data Journal of the American Statistical Association this issue 490 498 Fisher R A 1925 Theory of Statistical Estimation Proceedings of the Cambridge Philosophical Society 22 700 725 Fisher R A 1934 Discussion of on the Two Different Aspects of the Representative Method of Stratified Sampling and the Method of Purposive Selection by J Neyman Journal of the Royal Statistical Society Ser A 97 614 619 Freedman V 1990 Using SAS to Perform Multiple Imputation Discussion Paper Series UI PSC 6 The Urban Institute Washington DC Gelfand A E and Smith A F M 1990 Sampling Based Approaches to Calculating Marginal Densities Journal of the American Statistical Association 85 398 409 Gelman A and Rubin D B 1992 Inference from Iterative Simulation Using Multiple Sequences with discussion Statistical Science 7 457 472 Hansen M H 1987 A Conversation with Morriss Hansen I Olkin interviewer Statistical Science 2 162 179 Heitjan D F and Rubin D B 1990 Inference from Coarse Data via Multiple Imputation with Application to Age Heaping Journal of the American Statistical Association 85 304 314 Herzog T and Lancaster C 1980 Multiple Imputation Modeling for Individual Social Security Benefit Amounts in Proceedings of the Survey Research Methods Section American Statistical Association pp 398 403 Huber P J 1976 The Behavior Maximum Like
71. ression model are drawn from its posterior distribution given the observed data, using non-informative priors. For reference, see Appendix F, references 1 and 2. In this way, the extra uncertainty due to the fact that the regression parameters can be estimated, but not determined, from Y and X is reflected. Using estimated regression parameters rather than parameters drawn from their posterior distribution results in improper imputation, in the sense that the between-imputation variance is underestimated. In steps 6 and 7, the parameters drawn from the posterior distribution are used together with the covariates $X_{mis}$ to generate the imputations $Y_{mis}$.

Incompletely observed covariates

Let $y$ be an imputation variable and let $x_1, \dots, x_p$ be the incompletely observed covariates for $y$. Let $R_j$ be the response indicator for $x_j$. The variable $R_j$ is defined by

$$R_j = \begin{cases} 1 & \text{if } x_j \text{ is observed} \\ 0 & \text{if } x_j \text{ is not observed} \end{cases}$$

The indicator method is based on the following statistical model for $y$:

$$y = \beta_0 + \beta_{01}(1 - R_1) + \dots + \beta_{0p}(1 - R_p) + \beta_1 R_1 x_1 + \dots + \beta_p R_p x_p + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

In this model, the term $\beta_j R_j x_j$ is zero when $x_j$ is missing and is equal to $\beta_j x_j$ when $x_j$ is observed. When $x_j$ is missing, the intercept term is adjusted by the term $\beta_{0j}(1 - R_j)$. If a covariate $x_j$ is completely observed, then the corresponding term $\beta_{0j}(1 - R_j)$ disappears. By adjusting the data matrix $X$, the algorithm shown in Completely Observed Covariates can be applied.
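One way to read "adjusting the data matrix X" is sketched below in Python: each case gets an intercept column, one $(1 - R_j)$ column per covariate and one $R_j x_j$ column per covariate, and the indicator columns of completely observed covariates are dropped. This is a reading of the model above, not SOLAS's implementation; the function names and the `None` convention for a missing covariate value are assumptions.

```python
# Hypothetical sketch of the design matrix implied by the indicator method.

def indicator_design_row(x):
    """x: covariate values for one case (None = missing).

    Returns [1, (1-R_1), ..., (1-R_p), R_1*x_1, ..., R_p*x_p].
    """
    r = [0 if value is None else 1 for value in x]
    intercept_adjust = [1 - rj for rj in r]
    slopes = [rj * (0.0 if value is None else value) for rj, value in zip(r, x)]
    return [1] + intercept_adjust + slopes


def build_design_matrix(rows):
    """Stack the rows and drop the (1-R_j) columns that are all zero,
    i.e. those belonging to completely observed covariates."""
    p = len(rows[0])
    matrix = [indicator_design_row(x) for x in rows]
    keep = [j for j in range(2 * p + 1)
            if not (1 <= j <= p and all(row[j] == 0 for row in matrix))]
    return [[row[j] for j in keep] for row in matrix]


print(indicator_design_row([2.5, None]))  # [1, 0, 1, 2.5, 0.0]

# x_1 has one missing value; x_2 is completely observed, so its indicator
# column is removed.
cases = [[2.5, 1.0], [None, 3.0], [4.0, 2.0]]
print(build_design_matrix(cases))
# [[1, 0, 2.5, 1.0], [1, 1, 0.0, 3.0], [1, 0, 4.0, 2.0]]
```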
72. ro then the clock time is used.

Output Log

The Output Log is a comprehensive list of the regression equations, etc., that have been calculated for the imputed variable(s).

Least Squares Regression

Tolerance: The value set in the Tolerance datafield controls numerical accuracy. The tolerance limit is used for matrix inversion to guard against singularity. No independent variable is used whose R² with the other independent variables exceeds 1 - Tolerance. You can adjust the tolerance using the scrolled datafield.

Stepping Criteria: Here you can select F to Enter and F to Remove values from the scrolled datafields, or enter your chosen value. If you wish to see more variables entered in the model, set the F to Enter value to a smaller value. The numerical value of F to Remove should be chosen to be less than the F to Enter value.

Logistic Regression Options

The Logistic Regression options are as follows:

Model Tolerance: Controls the numerical accuracy. Computations are performed in double precision. Use a value that is greater than 0.000001 but less than 1.0. The default is 0.0001.

Tail area probabilities to control entry or removal of terms from the model: Specifies the limits for the tail area probabilities (p-values) for the appropriate χ² and F values used to control the entry and removal of terms.

Entry: During forward stepping, the term with the smallest p-value less than the entry value is e
73. robability that a missing value in the imputation variable $y$ is equal to its $j$-th category, given the set of the observed values of the covariates and of $y$. For more details, see Appendix D, Discriminant Multiple Imputation.

Propensity Score
The system applies an implicit model approach based on Propensity Scores and an Approximate Bayesian Bootstrap to generate the imputations. The underlying assumption of Propensity Score Multiple Imputation is that the non-response of an imputation variable can be explained by a set of covariates using a logistic regression model. The multiple imputations are independent repetitions from a Posterior Predictive Distribution for the missing data, given the observed data. Variables are imputed from left to right through the data set, so that values imputed for one variable can be used in the prediction model for missing values occurring in variables to the right of it. The system creates a temporary variable that will be used as the dependent variable in a logistic regression model. This temporary variable is a response indicator: it equals 0 for every case in which the imputation variable is missing and equals 1 otherwise. The independent variables for the model will be a set of baseline (fixed) covariates that we think are related to the variable we are imputing. For example, if the variable being imputed is period $t$ of a longitudinal variable, the covariates might include the previous periods $t - 1$
74. rom within each imputed data set with the variability of the estimate across the $m$ imputed data sets. The standard error of a combined parameter estimate is the square root of the variance of the combined parameter estimate.

The pooled standard error of a point estimate $\bar{Q}$ is

$$SE(\bar{Q}) = \sqrt{\bar{U} + \left(1 + \frac{1}{m}\right) B},$$

where $\bar{U} = \frac{1}{m} \sum_{i=1}^{m} U_i$ is the within-imputation variance, $U_i$ is the squared standard error of the point estimate from the $i$-th data set, and

$$B = \frac{1}{m-1} \sum_{i=1}^{m} (Q_i - \bar{Q})^2$$

is the between-imputation variance, where $Q_i$ is the corresponding point estimate calculated from the $i$-th data set.

The pooled confidence interval for the point estimate is

$$\bar{Q} \pm t_{\nu,\,1-\alpha/2} \, SE(\bar{Q}),$$

where $t_{\nu,\,1-\alpha/2}$ corresponds to a $(1-\alpha) \cdot 100\%$ C.I. and $SE(\bar{Q})$ is the pooled standard error of the point estimate as shown above. The degrees of freedom are

$$\nu = \left( \frac{1}{\nu_m} + \frac{1}{\hat{\nu}_{obs}} \right)^{-1}.$$

See John Barnard and Donald B. Rubin, "Small sample degrees of freedom with multiple imputation", Biometrika, December 1999, Volume 86, No. 4. Here

$$\hat{\nu}_{obs} = \frac{\nu_{com} + 1}{\nu_{com} + 3} \, \nu_{com} \, (1 - \gamma_m), \qquad \gamma_m = \frac{(1 + 1/m) B}{T_m}, \qquad T_m = \bar{U} + \left(1 + \frac{1}{m}\right) B,$$

$$\nu_m = (m - 1) \left(1 + \frac{1}{r_m}\right)^2, \qquad r_m = \frac{(1 + 1/m) B}{\bar{U}},$$

where $\nu_{com}$ is the degrees of freedom used in the case of complete data, and $B$, $\bar{U}$ and $U_i$ are as defined above.

Appendix B Combined Statistics
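The combination rules in excerpt 74 can be traced with a small worked sketch. It assumes the standard Rubin combining rules and the Barnard and Rubin small-sample degrees of freedom cited in the excerpt; the function name `pool_estimates` and its arguments are hypothetical and are not part of SOLAS.

```python
import math


def pool_estimates(estimates, std_errors, nu_com=None):
    """Pool m point estimates and their standard errors across imputed data sets.

    nu_com is the complete-data degrees of freedom; if omitted, only the
    large-sample value nu_m is reported as the degrees of freedom.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                              # pooled point estimate
    u_bar = sum(se ** 2 for se in std_errors) / m           # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                             # total variance
    result = {"estimate": q_bar, "se": math.sqrt(t), "within": u_bar, "between": b}
    if b > 0:
        r = (1 + 1 / m) * b / u_bar
        nu_m = (m - 1) * (1 + 1 / r) ** 2
        if nu_com is None:
            result["df"] = nu_m
        else:
            gamma = (1 + 1 / m) * b / t
            nu_obs = (nu_com + 1) / (nu_com + 3) * nu_com * (1 - gamma)
            result["df"] = 1 / (1 / nu_m + 1 / nu_obs)
    return result


# Three imputed data sets, each giving an estimate with standard error 1.
pooled = pool_estimates([10.0, 12.0, 11.0], [1.0, 1.0, 1.0], nu_com=50)
```

The confidence interval then follows by multiplying the pooled standard error by the appropriate t quantile for the reported degrees of freedom; the quantile lookup is left out here to keep the sketch free of third-party dependencies.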
75. s program or any portion of it may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under law 3 The Welcome message reminds you to exit from all programs It also offers a chance to exit from the SOLAS 3 0 installation You can click on the Cancel button any time you wish to cancel the installation Click on the Next button to continue Solas 2 0 License Agreement x eI Please read the following license agreement STATISTICAL SOLUTIONS LICENSE AGREEMENT THIS DOCUMENT SETS FORTH THE TERMS AND CONDITIONS OF THE LICENSE SOFTWARE RECEIVED BY YOU IF YOU DO NOT ACCEPT THIS AGREEMENT YOU MAY RETURN THIS SOFTWARE UNDAMAGED WITHIN 10 DAYS OF RECEIPT AND YOUR MONEY WILL BE REFUNDED We GRANT OF LICENSE In consideration of payment of the LICENSE fee which is part of the price you paid for this product Statistical Solutions as LICENSOR grants to you the LICENSEE a non exclusive right to use this copy of SOLAS z Do you accept these terms 4 The Statistical Solutions License Agreement is displayed Scroll and read through the agreement If you do not agree with the terms of the license click on the No button and you will go to an exit dialog If you do agree with the terms of the license click on the Yes button to continue 2 Imputation User Reference Manual Installation User Information x Please enter your name the name of the company for whom
76. sA_3 The baseline measurement for the response variable MeasA and three post baseline measurements taken at month 1 month 2 and month 3 MeasB_0 MeasB_1 MeasB_2 and MeasB_3 The baseline measurement for the response variable MeasB and three post baseline measurements taken at month 1 month 2 and month 3 The variables OBS SYMPDUR AGE MeasA_0 and MeasB_90 are all fully observed and the remaining 6 variables contain missing values To view the missing pattern for this data set do the following 1 From the datasheet window select View and Missing Data Pattern In the Specify Missing Data Pattern window press the Use All button 2 From the View menu of the Missing Data Pattern window select View Monotone Pattern to display the window shown below FA Missing Data Pattern MI_TRIAL File Use View Rerun Window Help Variables 21 of fn Jo Ja l5 J212 FJ J 118 fe fo S I Present 8 Non Monotone Variable List Variable No Variable Name 1 OBS SYMPDUR AGE Note that after sorting the data into a Monotone pattern the time structure of the longitudinal measures is preserved so the missing data pattern in this data set is Monotone over time 3 Toclose the Missing Data Pattern window select File and Close 32 Imputation User Manual Imputation Predictive Model Based Method Example We will now multiply impute all of the missing values in this data set using the Predictive Model Based Meth
77. scribed above. The propensity scores are used, rather than these estimated probabilities, for reasons of numerical stability.

Divide propensity scores into c Quantile subsets
Using the options in the Donor Pool window, the cases of the data set can be subdivided into $c$ subsets according to the quantiles of the assigned propensity scores, where $c = 5$ is the default value. This is done by sorting the cases of the data set according to their assigned propensity scores in ascending order. With $n$ cases in the sorted data set, the $i$-th subset consists of the cases from the $\left( \left[ \frac{n(i-1)}{c} \right] + 1 \right)$-th case until the $\left[ \frac{n i}{c} \right]$-th case, for $i = 1, \ldots, c$, where $[x]$ is the integer part of $x$. For each missing data entry of $y$, the set of observed values of $y$ used to generate the imputations consists of the observed values in the subset of cases to which this missing data entry belongs.

Use c Closest Matching Cases
For each missing data entry $y_{mis,i}$, where the index $i$ refers to the $i$-th missing data entry of $y$, the subset of observed values used for generating the imputations consists of the $c/2$ observed values of $y$ before and the $c/2 + 1/2$ observed values of $y$ after the missing value to be imputed, after sorting on propensity. The initial values of $y$ are the observed values with an assigned propensity score closest to and lower than the propensity score assigned to $y_{mis,i}$. Then the e
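The quantile subdivision described in excerpt 77 can be sketched as follows. The sketch assumes that the integer part is applied to n(i-1)/c and ni/c as in the text; the function name `quantile_subsets` is hypothetical and the code only illustrates the indexing, not the SOLAS implementation.

```python
def quantile_subsets(propensity_scores, c=5):
    """Split case indices into c subsets by quantiles of the propensity score."""
    n = len(propensity_scores)
    # Case indices sorted by ascending propensity score.
    order = sorted(range(n), key=lambda k: propensity_scores[k])
    subsets = []
    for i in range(1, c + 1):
        start = (n * (i - 1)) // c  # 0-based position of the ([n(i-1)/c] + 1)-th case
        stop = (n * i) // c         # one past the [ni/c]-th case
        subsets.append(order[start:stop])
    return subsets


# Seven cases split into three subsets.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8]
groups = quantile_subsets(scores, c=3)
```

Every case lands in exactly one subset, and subset sizes differ by at most one when n is not a multiple of c.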
78. sets and combining the results Propensity Score Method Example We will now multiply impute all of the missing values in the data set using the Propensity Score Based Method 1 From the Analyze menu select Multiple Imputation and Propensity Score Method 2 The Specify Propensity Method window is displayed and is a tabbed paged window The window opens with two pages or tabs Base Setup and Advanced Options As soon as you select a variable to be imputed a Non Monotone tab a Monotone tab and a Donor Pool tab are also displayed Base Setup Selecting the Base Setup tab allows you specify which variables you want to impute and which variables you want to use as covariates for the logistic regression used to model the missingness 36 Imputation User Manual Imputation Specify Propensity Method Multiple Imputation x Base Setup Non Monotone Monotone Donor Pool Advanced Options Variables Variable s to Impute OBS MeasA_1 f Meas_2 Number of Imputed Datasets E Meas_3 MeasB_1 O MeasB_2 MeasB_3 Guanes Variable Longitudinal 3 p Variables Fixed covariate s SYMPDUR pcs Missing AGE MeasB_O Je Linearity Intersotate MeasA_0 Urea Venable Type Missing 1 Drag and drop the variables MeasA_1 MeasA_2 MeasA_3 MeasB_1 MeasB_2 MeasB_3 into the Variables to Impute field 2 Drag and drop the variables SYMPDUR AGE MeasA_0 and MeasB_0 into the Fixed
79. st column contains the observations for $x_1, \ldots, x_p$. Let $X_{obs}$ and $X_{mis}$ be the rows of $X$ corresponding to $Y_{obs}$ and $Y_{mis}$, respectively. The underlying statistical model of linear regression imputation is given by

$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2).$$

Let $q$ be equal to $p + 1$. The parameter $q$ equals the number of regression coefficients, including the intercept. Each imputation $Y^*_{mis}$ for $Y_{mis}$ is independently generated in the following steps:

1. Let $\hat{\beta}$ and $\hat{\sigma}^2$ be the least squares estimators of $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$ and of $\sigma^2$ from $Y_{obs}$ and $X_{obs}$. Let $V$ be the inverse of the matrix $X_{obs}^T X_{obs}$, and let $V^{1/2}$ be a square root of $V$ that can be obtained via the Choleski decomposition of $V$. Let $P$ be the matrix of eigenvectors of $V$ and $\Lambda$ be the diagonal matrix with $\Lambda_{ii}$ equal to the eigenvalue of $V$ corresponding to the eigenvector of $V$ given by the $i$-th column of $P$. The square root $V^{1/2}$ of $V$ is then given by $V^{1/2} = P \Lambda^{1/2}$, with $\Lambda^{1/2}$ the diagonal matrix containing the square roots of $\Lambda_{ii}$ as its diagonal elements.
2. Draw a $\chi^2_{n_{obs} - q}$ random variable $g$.
3. Let $\sigma_*^2 = \hat{\sigma}^2 (n_{obs} - q) / g$.
4. Draw $q$ independent random variables $Z_1, \ldots, Z_q$ from $N(0, 1)$ and let $Z = (Z_1, \ldots, Z_q)$.
5. Let $\beta_* = \hat{\beta} + \sigma_* V^{1/2} Z$.
6. Draw $n_{mis}$ independent variables $z_1, \ldots, z_{n_{mis}}$ from $N(0, 1)$ and let $e = \sigma_* z$, with $z = (z_1, \ldots, z_{n_{mis}})$.
7. Let $Y^*_{mis} = X_{mis} \beta_* + e$.

In steps 1 to 5 the parameter values for the reg
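Steps 1 to 7 of excerpt 79 can be followed in a short NumPy sketch. This is an illustration under stated assumptions rather than the SOLAS implementation: the function name `draw_regression_imputation` is hypothetical, the design matrices are assumed to carry a leading column of ones for the intercept, and the Choleski factor is used as the square root of V.

```python
import numpy as np


def draw_regression_imputation(x_obs, y_obs, x_mis, rng):
    """Generate one imputation draw for the rows of x_mis."""
    n_obs, q = x_obs.shape
    # Step 1: least squares estimates and V = (X_obs' X_obs)^{-1}.
    v = np.linalg.inv(x_obs.T @ x_obs)
    beta_hat = v @ x_obs.T @ y_obs
    resid = y_obs - x_obs @ beta_hat
    sigma2_hat = resid @ resid / (n_obs - q)
    v_half = np.linalg.cholesky(v)  # lower-triangular L with L @ L.T == V
    # Steps 2 and 3: draw the residual variance.
    g = rng.chisquare(n_obs - q)
    sigma_star = np.sqrt(sigma2_hat * (n_obs - q) / g)
    # Steps 4 and 5: draw the regression coefficients.
    z = rng.standard_normal(q)
    beta_star = beta_hat + sigma_star * (v_half @ z)
    # Steps 6 and 7: add residual noise to the predictions for the missing rows.
    e = sigma_star * rng.standard_normal(x_mis.shape[0])
    return x_mis @ beta_star + e


# Synthetic data: 25 observed cases and 5 cases treated as missing.
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(30), np.linspace(0.0, 10.0, 30)])
y = 2.0 + 3.0 * x[:, 1] + rng.standard_normal(30)
imputed = draw_regression_imputation(x[:25], y[:25], x[25:], rng)
```

Calling the function M times with the same generator gives M independent imputations, because the coefficients and the residual variance are redrawn on every call.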
80. t field 4 Since we would expect irises of the same species to be similar with respect to the various measurements we select SPECIES as our primary sort variable and then select SEPALLEN as the secondary sort variable Specify Hot deck Single Imputation Variables Varial Variables for PETALWID SEPALWID Cae PETALLEN SPECIES SEPALLEN Note Sorting wil be performed in the order of appearance of Type variables in this lst Missing Select first respondents value Rule to apply when more then one matching respondent is found C Randomly select from matching respondents m Rule to apply when no matching respondent is found OK Respecily new sort variables Perform random overall imputation Do not impute the missing value 5 Under Rule to apply when more than one matching respondent is found choose Randomly select from matching respondents 6 Under Rule to apply when no matching respondent is found choose Re specify new sort variables 7 When you are satisfied with your choice click OK The imputed values are displayed in the color orange The system sorts the data set in ascending order so SPECIES is sorted first and SEPALLEN is sorted next Then for each missing value the system finds all respondents with matching values for these two variables Thus case 96 which is missing in SEPALWID has SPECIES 1 and SEPALLEN 5 0 There are 7 respondents in this imputation class cases
81. t for each group of cases having the same observed value as the specified grouping variable More detailed information about variables is given in Chapter of the Systems Manual Data Management and the sections Specifying Variable Attributes and Defining Variables Variable Selection De selection There are several options regarding the variables of a data set that is to be analyzed These options can be displayed from the datasheet Use menu as shown below E Datasheet MI_TRIAL View Window Help Use Add Highlighted Variables to Use List Remove Highlighted Variables from Use List Use All Variables Use All Cases Define Case Selection 5 1 87 w ij afl Sh a You can select and de select variables by using the datasheet View menu and selecting Missing Data Pattern described later to display the Missing Data Pattern window shown below To de select a variable right click at the top of any column in the missing pattern to highlight the variable then choose Omit Highlighted Variable from the Use menu fE Missing Value Pattern MI_TRIAL of x File Use View Window Help Omit Highlighted Variable Define case selection Case selection can be applied in two ways Systematic or User defined by choosing Define Case Selection from the Use menu in a datasheet Depending on the selection one of the windows shown below is displayed Imputation User Manual 11 Impu
82. tation Systematic Case Selection Use Every B thCase Starting at Case D Help For Systematic case definition numerical selection can be applied For User defined case specification conditional and logical operators can be applied to selected cases within variables as shown in the right hand window above A table showing the operators their meanings and their keyboard entries is given in Chapter 1 of the System Manual Data Management canei Multiple Drag and Drop 12 Imputation User Manual Specify Case Use Condition on Case MeasA_1 gt 180 AND MeasA_1 lt 240 Variables SE Case Selection gt MISSING Use Only Cases Meeting Condition EJ SYMPDUR C Use All But Cases Meeting Condition aE 0 Add Cases Meeting Condition to Use List Remove Cases Meeting Condition from Use List Multiple variables can be dragged by holding down the lt Ctrl gt key and selecting highlighting variables The Drag Variable controls will not be enabled If some of the variables being dragged into a data field are inappropriate for that data field the system will display a message and those variables will not be placed in the field The remaining variables in the multiple selection will be moved as intended Vasiable s to Impute Imputation Overviews of Imputation in SOLAS Imputation is the name given to any method whereby missing values in a data set are fil
83. the observed values of the covariates is imputed The imputation scheme for discriminant single imputation in case of predicted mean imputation is obtained from the imputation scheme for discriminant multiple imputation as follows In step v LL is replaced by fi LL is replaced by A T is replaced by nj Mobs T is replaced by ny Nobs and P is replaced by Pij where neps is the number of observed values of the imputation variable Step vi is replaced by Let y be equal to the category j which maximizes the probability P for Teed 8 In step vii y is replaced by Po 58 Imputation User Manual Appendix E Propensity Score Multiple Imputation PROPENSITY SCORE MULTIPLE IMPUTATION DIVIDE PROPENSITY SCORE INTO C QUANTILE SUBSETS USE C CLOSEST MATCHING CASES USE D CLOSEST MATCHING CASES Propensity Score Multiple Imputation An implicit model approach based on Propensity Scores and an Approximate Bayesian Bootstrap is used to generate the imputations The multiple imputations are independent repetitions from a Posterior Predictive Distribution for the missing data given the observed data The imputation scheme is described below 1 The regression coefficient b of the logistic regression model of the response indicator R of the imputation variable y on the selected covariates including the intercept term are estimated ii To each case a propensity score is assigned which is equal to xX b
84. this list of covariates by simply dragging and dropping the variable from the covariate list to the variables field, or vice versa. Even though a variable appears in the list of covariates for a particular imputation variable, it might not be used in the final model. The program first sorts the variables so that the missing data pattern is as close as possible to monotone, and then, for each missing value in the imputation variable, the program works out which variables from the total list of covariates can be used for prediction. By default, all of the covariates are forced into the model. If you uncheck a covariate, it will not be forced into the model but will be retained as a possible covariate in the stepwise selection. Details of the models that were actually used to impute the missing values are included in the Output Log, which can be selected from the View menu of the Multiply Imputed Data Pages. These data pages will be displayed after you have specified the imputation and pressed the OK button in the Specify Predictive Model window.

Monotone
Selecting the Monotone tab allows you to add or remove covariates from the logistic model used for imputing the monotone missing values in the data set. These can be identified in the Missing Data Pattern mentioned earlier.

[Screenshot: Specify Propensity Method (Multiple Imputation) window, Monotone tab]
85. tion for Interval Estimation from Simple Random Samples With Ignorable Non-response. Journal of the American Statistical Association, 81, 366-374.
Rubin, D. B. and Schenker, N. (1987). Interval Estimation from Multiply Imputed Data: A Case Study using Agriculture Industry Codes. Journal of Official Statistics, 3, 375-387.
Schafer, J. L. (1996). Analysis of Incomplete Multivariate Data by Simulation. New York: Chapman and Hall.
Schafer, J. L. and Schenker, N. (1991). Variance Estimation with Imputed Means. Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 696-701.
Schafer, J. L., Khare, M., Little, R. J. A. and Rubin, D. B. (1993). Multiple Imputation of NHANES III. Paper presented at the Annual Meeting of the American Statistical Association, San Francisco.
Schenker, N. (1989). The Use of Imputed Probabilities for Missing Binary Data. In Proceedings of the 5th Annual Research Conference, Bureau of the Census, pp. 133-139.
Schenker, N., Treiman, D. J. and Weidman, L. (1993). Analyses of Public Use Decennial Census Data with Multiply Imputed Industry and Occupation Codes. Applied Statistics, 42, 545-556.
Smith, A. F. M. and Gelfand, A. E. (1992). Bayesian Statistics Without Tears: A Sampling-Resampling Perspective. The American Statistician, 46, 84-88.
Tanner, M. A. and Wong, W. H. (1987). The Calculation of Posterior Distributions by Data Augmentation (with discussion). J
86. to fill in for these missing periods. This imputed data set can be saved for analysis later or exported to any other statistics package (see Chapter 1, Data Management, in the Systems Manual).

Predicted Mean Imputation
Predicted Mean Imputation is performed using an Ordinary Least Squares regression or a Discriminant analysis. A general description of these methods is given below.

Ordinary Least Squares
With the least squares method, missing values are imputed with the values predicted from the corresponding covariates by the estimated linear regression models. This method is used to impute all the continuous variables in a data set.

Discriminant
Discriminant Multiple Imputation is a model-based method for binary or categorical variables. For each missing data entry, the category with the largest conditional probability, given the values of the selected covariates, is imputed. More detailed information can be found in Appendix D, Discriminant Multiple Imputation.

Predicted Mean Imputation Example
This example uses the data set MI_TRIAL.MDD, located in the SAMPLES subdirectory.
1. Open the datasheet MI_TRIAL.MDD, select Analyze, Single Imputation and the Predicted Mean option to display the Specify Predicted Mean window.
2. Drag the variables to be imputed, the chosen Covariates and the Grouping Variable between the Variable(s), the Variable(s) to Impute and the Covariate(s) listboxes
87. ue of $y$ that is to be imputed, a smaller subset is selected on the basis of the association between $y$ and $w$. This smaller subset is then used to generate the imputations. For each missing value of $y$, the imputations are randomly drawn from the chosen subset of observed values of $y$ according to the Approximate Bayesian Bootstrap method. Using this method, also described in Rubin (1987), Multiple Imputation for Nonresponse in Surveys (referenced in Appendix F 1), a random sample is drawn with replacement from the chosen subset of observed values, equal in size to the number of observed values in this subset. The imputations are then randomly drawn from this sample. The Approximate Bayesian Bootstrap method is applied in order to reflect the extra uncertainty about the predictive distribution of the missing value of $y$, given the chosen subset of observed values of $y$. This predictive distribution can be estimated from the chosen subset of observed values of $y$, but not determined. Drawing the imputations randomly from the chosen subset of observed values, rather than applying the Approximate Bayesian Bootstrap, would result in improper imputation, in the sense that the between-imputation variance is underestimated.

Bounded Missing
This type of missing value can only occur when a variable is longitudinal. It is a missing value that has at least one observed value before and at least
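The two-stage draw of the Approximate Bayesian Bootstrap described in excerpt 87 can be sketched in a few lines. The function name `abb_impute` and the example donor values are hypothetical; the sketch only illustrates the resampling logic and is not taken from SOLAS.

```python
import random


def abb_impute(donor_values, n_missing, rng):
    """Draw n_missing imputations from a donor pool of observed values."""
    # Stage 1: resample the donor pool with replacement, keeping its size.
    bootstrap_sample = [rng.choice(donor_values) for _ in donor_values]
    # Stage 2: draw each imputation at random from the resampled pool.
    return [rng.choice(bootstrap_sample) for _ in range(n_missing)]


rng = random.Random(42)
donors = [4.1, 5.0, 5.6, 6.2, 7.3]
imputations = abb_impute(donors, n_missing=3, rng=rng)
```

Skipping stage 1 and drawing directly from `donors` would correspond to the improper variant that the excerpt warns against, since the donor pool would then be treated as if it determined the predictive distribution exactly.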
88. utation User Manual 23 Imputation Specify Missing Data Pattern Variables Variable s to Display m OBS EERE SYMPDUR Variable AGE Meas _0 Meas _1 MeasA_2 Bounded Missing Meas4_3 MeasB_O x Fe linear interpolate Longitudinal Variables uf Use All Cancel Drag Variable Type Help Missing After specifying the variables to use and pressing the OK button a Missing Data Pattern window is displayed below left with the missing pattern before imputation From the View menu of a Missing Data Pattern window you can display the Monotone pattern below right You can also display a legend from which you can easily identify the missing data type s FJ Missing Data Pattern Health Examplel ifa E3 E Missing Data Pattern Health Example lal Ea File Use View Remun Window Help File Use View Rerun Window Help Variables 4 Variables 4 a Present i Non Monotone i Monotone From the View menu of a Missing Data Pattern window you can display Pairwise Missingness Presence These display a matrix that contsins the number of cases that are missing present in each pair of variables If you right click on any of the cells in the missing pattern a new panel will display the case number the variable name and its status Also from the View menu you display an Options window which allows you to choose between various options to use in the display
89. variates variable variable missing observed Variables to Covariate Covariate Information Meash 1 s 50 49 1 Impute MeasA_1 MeasA 0 Complete Imputation Variable MeasA 2 MeasA 0 Complete MeasA_2 Cases included in imputation model Total included Included with Included with imputation imputation variable variable missing PROCEDURE observed Imputation Variable MeasA 2 z5 a 7 7 asA 1 Equation for imputing missing values Measa_1 25 0222 1 0665 MeasA_O Imputation Variable g a f Nm f cok 37 line 6 Page 1 I Num CAPS Cok O Lines 101 Page 1 NOTE There are no missing values in the variable chosen as the Covariate in this example but if there were the following window would be displayed The covariate you have chosen has missing values Use hot deck imputation to impute the covariate The variable to be used for hot deck imputation is ogs v regression pool Exclude the cases that have missing values in this covariate from the analysis Include a missingness indicator variable for this covariate in Cancel Help Then 1 Ifthe Use hot deck imputation option is chosen you must select a variable in the dropdown listbox that will be used to impute the missing values in the Covariate The dropdown list contains a list of all of the variables in the data set in the same order as they appear in the datasheet If more than one matching respondent is found a value is r
90. ve M complete data sets each of which can be analyzed using standard complete data statistical methods If you select Descriptive Statistics Regression t test Frequency Table from the Analyze menu of any data page the analysis will be performed on all 5 data sets The analysis generates 5 pages of output one corresponding to each of the imputed data sets and a Combined page Imputation User Manual 43 Imputation that gives the overall set of results The tabs at the bottom of the page allow you to display each data set This example uses the imputation results from the data set MI_TRIAL MDD that was used in the Propensity Score example earlier Part of data page 1 for that example is shown below Multiple Imputation Data Pages MMI_TRIAL oix Fle Edt Vatiables Analyze Plot Format View Window Help Hea Descriptive Statistics TUE Eeen qq 2521 00 Erequency Table 6084 08 Oe 65025 90 276 297 88209 90 294 297 88209 00 228 162 26244 00 228 162 321 336 112896 00 213 201 40401 00 216 252 63504 00 288 297 88209 00 303 279 7841 00 285 237 56169 00 276 237 709 740 naan an co Fage Zh Page Page 4 Page 87 1 From the data page Analyze menu select t and Nonparametric Tests to display the Specify t test Analysis window Two group paired and one group comparisons of means r Paired BS fi
91. we would refer to this as one longitudinal variable consisting of six repeated measures or periods. Linear interpolation is another method for filling in missing values in a longitudinal variable. If a missing value has at least one observed value before and at least one observed value after the period for which it is missing, then linear interpolation can be used to fill in the missing value. Although this method logically belongs in the LVCF option, for historical reasons it is only available as an imputation method from within either the Propensity Score Based Method or the Predictive Model Based Method. For further details, see the Bounded Missing section.

Predicted Mean
Imputed values are predicted using an ordinary least squares multiple regression algorithm to impute the most likely value when the variable to be imputed is continuous or ordinal. When the variable to be imputed is a binary or categorical variable, a discriminant method is applied to impute the most likely value.

Multiple Imputation Overview
SOLAS 3.0 provides two distinct methods for performing Multiple Imputation:
The Predictive Model Based Method
The Propensity Score Based Method
Using either method, each missing value is replaced by $M$ ($M \geq 2$) imputed values to create $M$ complete data sets. Multiple Imputation has the following properties: Once the multiple imputations are generated, the resulting data sets can be used
92. with numerous missing values would severely reduce the sample size. When cases are deleted if one or more variables has missing values, the number of remaining cases can be small even if the missing data rate is small for each variable. For example, suppose your data set has 5 variables measured at the start of the study and monthly for six months, giving $5 \times 7 = 35$ measurements per case. You have been told, with great pride, that each variable is 95% complete. If each of these measurements has a random 5% of the values missing, then the proportion of cases expected to be complete is $0.95^{35} = 0.166$, so the proportion expected to be incomplete is $1 - 0.95^{35} = 0.834$. That is, only 17% of the cases would be complete and you would lose 83% of your data.

Missing data also cause difficulties in performing Intent-to-Treat analyses in randomized experiments. Intent-to-Treat (ITT) analysis dictates that all cases, complete and incomplete, be included in any analyses. Biases may exist from the analysis of only complete cases if there are systematic differences between completers and dropouts. To select a valid approach for imputing missing data values for any particular variable, it is necessary to consider the underlying mechanism accounting for the missing data. Variables in a data set may have values that are missing for different reasons. A laboratory value might be missing because:
It was below the level of detection
The assay was not done because the patient did not come in for a scheduled visit
The assay
