Hot deck imputation

Speakers: Adrian P. Mander, University Forvie Site, and David G. Clayton, Cambridge University

Missing data can be a serious problem in many statistical analyses, manifesting as a loss of efficiency or as bias. If the data are Missing Completely At Random (MCAR), analysis by case deletion (ignoring any record with a missing value) gives unbiased estimates but wider confidence intervals. If the data are only Missing At Random (MAR), case deletion can lead to bias. Likelihood-based approaches offer solutions; however, these may rely on assumptions about the missing data process. Imputation is an alternative that can address MCAR and MAR problems with relatively few assumptions. The function discussed here imputes values for only one variable in the dataset. The imputation method involves bootstrapping the observed data values and hence is called hot deck imputation, a term first used by survey practitioners. The method is illustrated with an example taken from a two-stage sample case-control study design, and the biased case deletion analysis is compared with the imputed analysis.
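To make the idea concrete, the following is a minimal sketch in Python of hot deck imputation for a single variable. It is not the function presented in the talk: the names `hotdeck_impute` and `multiple_imputations` are hypothetical, missing values are assumed to be coded as `None`, and donors are drawn with simple random sampling with replacement from the observed values, which is only one simplified reading of "bootstrapping the observed data values".

```python
import random


def hotdeck_impute(values, seed=None):
    """Return a copy of `values` with each None replaced by a random donor.

    Donors are drawn with replacement from the observed (non-None) entries.
    Hypothetical helper for illustration; assumes missing values are None.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw donors from")
    return [v if v is not None else rng.choice(observed) for v in values]


def multiple_imputations(values, m=5, seed=0):
    """Return m independently imputed copies of `values`."""
    rng = random.Random(seed)
    # Each imputed dataset gets its own integer seed so the draws differ.
    return [hotdeck_impute(values, seed=rng.randrange(2**32)) for _ in range(m)]


if __name__ == "__main__":
    data = [1.0, None, 3.0, None, 5.0]
    for imputed in multiple_imputations(data, m=3):
        print(imputed)
```

Observed entries are left untouched and only the missing positions are filled. In practice one would typically analyse each imputed dataset separately and then combine the results, and under MAR the donors would be drawn within strata of the fully observed variables instead of from the whole sample; both steps are omitted from this sketch.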