.-
help for ^hotdeck^                                              [STB-51: sg116]
.-
.
Impute missing values using the hotdeck method
----------------------------------------------
.
    ^hotdeck^ varlist [^if^ exp] [^in^ range] [^using^ filename] [^,^ ^by(^varlist^)^
            ^store^ ^imp^ute^(^#^)^ ^noise^ ^keep(^varlist^)^ ^com^mand^(^command^)^
            ^parms(^varlist^)^ ]
.
.
Description
-----------
.
^hotdeck^ will tabulate the missing data patterns within the ^varlist^. A row of
data with missing values in any of the variables in the ^varlist^ is defined as
a `missing line' of data; similarly, a `complete line' is one where all the
variables in the ^varlist^ contain data. The ^hotdeck^ procedure replaces the
^varlist^ variables in the `missing lines' with the corresponding values in the
`complete lines'. ^hotdeck^ should be used several times within a multiple
imputation sequence since missing data are imputed stochastically rather than
deterministically. The ^nmiss^ missing lines in each stratum of the data
described by the `by' option are replaced by lines sampled from the ^nobs^
complete lines in the same stratum. The approximate Bayesian bootstrap method
of Rubin and Schenker (1986) is used: first, a bootstrap sample of ^nobs^ lines
is drawn with replacement from the complete lines; then replacements for the
^nmiss^ missing lines are sampled at random (again with replacement) from this
bootstrap sample.
.
A major assumption of the hotdeck procedure is that the data are either missing
completely at random (MCAR) or missing at random (MAR), with the probability
that a line is missing varying only with respect to the categorical variables
specified in the `by' option.
.
If a dataset contains many variables with missing values, then many of the rows
of data may contain at least one missing value. The ^hotdeck^ procedure will not
work very well in such circumstances. There are more elaborate methods that
replace ^only^ the missing values, rather than the whole row, with imputed values.
These multivariate multiple imputation methods are discussed by Schafer (1997).
.
.
Options
-------
^by(^varlist^)^ specifies categorical variables defining strata within which the
    imputation is to be carried out.
.
^store^ specifies that the imputed datasets be saved to disk.
.
^using^ specifies the root of the filenames of the imputed datasets. The default
    is "imp", so the datasets are saved as imp1.dta, imp2.dta, ....
.
^keep(^varlist^)^ specifies the variables saved in the imputed datasets in
    addition to the imputed variables and the by list. By default the imputed
    variables and the by list are always saved.
.
^impute(^#^)^ specifies the number of imputed datasets to generate. The number
    needed varies according to the percentage missing and the type of data,
    but generally 5 is sufficient.
.
^command(^command^)^ specifies the analysis performed on every imputed dataset.
.
^noise^ specifies that the individual analyses, from the ^command^ option, be
    displayed.
.
^parms(^varlist^)^ specifies the parameters of interest from the analysis. If the
    ^command^ is a regression command, then the parameter list can include a
    subset of the variables specified in the regression command. The final
    output consists of the combined estimates of these parameters.
.
.
Examples
--------
.
    ^hotdeck y, by(sex age)^
        - Impute values for y in sex/age groups.
.
    ^hotdeck y using imp, store by(sex age) impute(2)^
        - Store imputed datasets as imp1.dta and imp2.dta.
.
    ^hotdeck y x, by(sex age) command(logit y x) parms(x _cons) impute(5)^
        - Do not save imputed datasets, but carry out a logistic regression on
          each imputed dataset and display the combined coefficients for x and
          the constant term of the model.
.
.
Authors
-------
.
    Adrian Mander
    MRC Biostatistics Unit
    Cambridge, UK
    adrian.mander@@mrc-bsu.cam.ac.uk

    David Clayton
    MRC Biostatistics Unit
    Cambridge, UK
    david.clayton@@mrc-bsu.cam.ac.uk
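
Note on the sampling scheme
---------------------------

The two-stage draw described under Description (a bootstrap sample of the
complete lines, followed by a second draw of donor lines for the missing lines)
can be sketched in Python as below. This is only an illustrative sketch for a
single stratum, not the code used by hotdeck: the function name abb_donors, its
arguments, and the representation of each line as a tuple are all assumptions
made for the example.

```python
import random


def abb_donors(complete_lines, n_missing, rng=None):
    """Pick donor lines for one stratum (hypothetical helper, not hotdeck itself).

    complete_lines: list of rows with no missing values in the stratum.
    n_missing:      number of rows in the stratum that need a donor.
    """
    rng = rng or random.Random()
    if n_missing > 0 and not complete_lines:
        raise ValueError("stratum has missing lines but no complete lines")

    n_obs = len(complete_lines)
    # Stage 1: bootstrap sample of n_obs complete lines, with replacement.
    bootstrap = [rng.choice(complete_lines) for _ in range(n_obs)]
    # Stage 2: draw n_missing donors, again with replacement, from that sample.
    return [rng.choice(bootstrap) for _ in range(n_missing)]


# Example data (made up): four complete lines, three lines to impute.
complete = [(1.2, 0), (3.4, 1), (5.6, 0), (7.8, 1)]
donors = abb_donors(complete, n_missing=3, rng=random.Random(1))
```

Because both stages are random, calling the function again gives a different
set of donors, which is why the imputation is repeated to produce several
imputed datasets.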