The Stata listserver

Re: st: Hotdeck command

Subject   Re: st: Hotdeck command
Date   Tue, 23 Mar 2004 10:31:45 +0000

There is a good reason why -hotdeck- does line imputation and not item
imputation. If you impute single items, you are in fact destroying the
correlations between the variable you are imputing and the other
variables. So if you come to analyse this variable in a regression
against the y variable, you have just induced a measurement error in
your x variable and hence
have all the problems this entails. This is the bias for which some paper
criticised the approximate Bayesian bootstrap.
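
To make the attenuation argument concrete, here is a minimal simulation sketch in Python (not Stata, and not the -hotdeck- algorithm itself). It assumes a toy model y = x + noise, about 40% of x missing completely at random, and an item-level fill that draws each missing x from the observed x values without looking at y. All names, sample sizes and rates are illustrative assumptions.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2004)  # fixed seed so the sketch is repeatable
n = 5000

# Toy data (assumption): true slope of y on x is 1.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]

# Mark roughly 40% of x as missing, completely at random.
missing = [rng.random() < 0.4 for _ in range(n)]
donors = [xi for xi, m in zip(x, missing) if not m]

# Item-level fill: each missing x gets a random observed x, ignoring y.
x_item = [rng.choice(donors) if m else xi for xi, m in zip(x, missing)]

slope_full = ols_slope(x, y)
slope_item = ols_slope(x_item, y)
print("slope on complete x:    ", round(slope_full, 3))
print("slope on item-imputed x:", round(slope_item, 3))
```

Under these assumptions the slope fitted on the item-imputed x should be pulled toward zero relative to the complete-data slope, because the filled-in x values carry no information about y.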

Now if you want to go against this advice then you will either have to
use a multivariate imputation method that assumes some multivariate
distribution of your four variables (see Schafer's book) OR you have to
impute each variable separately.

With, say, 5 imputations at each round:
First impute the first variable and create 5 imputed datasets.
Then, to impute the second variable, you have to impute on each of the
5 imputed datasets to create 25 datasets.

Carry on like this for the third and fourth variables until you get
5 x 5 x 5 x 5 = 625 datasets.
Then you could use hotdeck to accept the 625 datasets and do the analyses.
Or in fact you could take a random sample of the 625 datasets and it should
still be ok.
Or use fewer imputed datasets at each round.
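
The bookkeeping for this sequential scheme can be sketched in a few lines of Python. This only counts and labels the datasets; it does not impute anything. The variable names are taken from the question, and the sample size of 25 is a hypothetical choice, not a recommendation.

```python
import itertools
import random

m = 5  # imputations per round (the "say 5" in the text)
variables = ["type", "size", "number", "setting"]  # the four variables in the question

# Number of datasets after each round of imputation.
counts = [m ** k for k in range(1, len(variables) + 1)]

# Each final dataset corresponds to one imputation index chosen per variable.
paths = list(itertools.product(range(m), repeat=len(variables)))

# Assumed shortcut: analyse a random sample of the final datasets
# instead of all of them (25 is an arbitrary illustrative size).
rng = random.Random(123456789)
sampled_paths = rng.sample(paths, 25)

print(counts)
print(len(paths), len(sampled_paths))
```

Using fewer imputations per round shrinks the total quickly: with m = 3 the same four rounds would give 81 datasets.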

If you have any problems replicating your results, you can email me.


Adrian Mander MSc PhD, Principal Statistician, GlaxoSmithKline, Mail Code
HW8133, New Frontiers Science Park (South), Harlow, Essex, CM19 5AW. Tel:
01279 63 1203 Fax: 01279 64 4003

                      "Jennifer Wheeler"                                                                                                       
22-Mar-2004 18:51
Subject: st: Hotdeck command

Esteemed Statalist users:

I am a new Stata user and have been experimenting with the hotdeck command.
I am attempting to impute values for line non-response as well as item
non-response and have come across some difficulties in setting the seed so
that I can reproduce my results.  The sample is stratified by reg and spec.

My first question goes as follows:  is there a way to use hotdeck to impute
for both item and line non-response?

I have started out by creating a variable ("impute") that identifies all
cases for which to impute a line of data in a series of four variables
(i.e., the cases have missing values for each of the four variables.) I
run a hotdeck command to impute:

hotdeck type size number setting, store by (reg spec) keep (id_item) imp(1)
seed (123456789)

This procedure seems to be imputing new values for each variable if any one
variable is missing (so it is imputing a new line of data for a case
requiring only item imputation).  I want to be able to conserve the
information I have for those cases that are missing some (and not all)
of the variables, and only impute for the variables that are missing.  To
reconcile this I attempted a second step where I ran hotdeck again on the
original variables individually for only those cases that are not complete
lines of missing data:

hotdeck type if impute==0, store by (reg spec) keep (id_item) imp(1) seed

hotdeck size if impute==0, store by (reg spec) keep (id_item) imp(1) seed


I used the values generated from the first step for the line non-response
and the values from the second step for the item non-response.  Upon
examination of these variables it appears that the procedure yields the
same imputed variables each time the program is run.  However, when
attempting to calculate a weighted estimate of the "number" variable for
the population
I get different results each time.  Is there a way to correct this?
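
For illustration only, here is a small Python sketch of the item-level idea described above: fill only the missing values of one variable from donors in the same stratum, with the random draws tied to an explicit seed. It is not the -hotdeck- implementation and does not diagnose the behaviour reported here; the function name, the toy rows and the use of None for missing are all assumptions. In this sketch the draws depend on both the seed and the order of the rows.

```python
import random
from collections import defaultdict


def hotdeck_items(rows, var, by, seed):
    """Fill missing `var` (None) from observed values in the same stratum.

    Hypothetical helper: returns new rows and leaves the input untouched.
    """
    rng = random.Random(seed)  # all draws come from this seeded generator
    donors = defaultdict(list)
    for row in rows:
        if row[var] is not None:
            donors[tuple(row[k] for k in by)].append(row[var])

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[var] is None:
            pool = donors[tuple(row[k] for k in by)]
            if pool:  # a stratum with no donors stays missing
                new_row[var] = rng.choice(pool)
        filled.append(new_row)
    return filled


# Toy data (made up for the sketch), stratified by reg and spec.
rows = [
    {"reg": 1, "spec": 1, "size": 10},
    {"reg": 1, "spec": 1, "size": None},
    {"reg": 1, "spec": 1, "size": 30},
    {"reg": 2, "spec": 1, "size": 7},
    {"reg": 2, "spec": 1, "size": None},
    {"reg": 2, "spec": 2, "size": None},
]

run_a = hotdeck_items(rows, "size", ("reg", "spec"), seed=123456789)
run_b = hotdeck_items(rows, "size", ("reg", "spec"), seed=123456789)
print(run_a == run_b)
```

With the same seed and the same row order the two runs agree, so any estimate computed from them would agree as well; changing either one can change the draws.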

Is there a more straightforward way to use hotdeck imputation for item and
line non-response?
Any suggestions would be greatly appreciated.

J Wheeler

*   For searches and help try:
