
From: Adrian.P.Mander@gsk.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Hotdeck command
Date: Tue, 23 Mar 2004 10:31:45 +0000

There is a good reason why -hotdeck- does line imputation and not item imputation. If you impute single items then you are in fact destroying correlations between the variable you are imputing and the other variables. Hence if you come to analyse this variable in a regression against the y variable, you have just induced a measurement error in your x variable and have all the problems this entails. This is the bias that some paper criticised the approximate Bayesian bootstrap for introducing!

Now if you want to go against this advice then you will either have to use a multivariate imputation method that assumes some multivariate distribution for your four variables (see Schafer's book) OR you have to impute each variable separately.

Say you use 5 imputations per variable. First impute the first variable and create 5 imputed datasets. Then, to impute the second variable, you have to impute on each of the 5 imputed datasets to create 25 datasets, etc., until you get 5^4 = 625 datasets. Then you could use hotdeck to accept the 625 datasets and do the analyses. Or in fact you could take a random sample of the 625 datasets and it should still be ok. Or use fewer imputed datasets at each round.

If you have any problems replicating your results, you can email me directly.

cheers
Ade

Adrian Mander MSc PhD,
Principal Statistician,
GlaxoSmithKline,
Mail Code HW8133,
New Frontiers Science Park (South),
Harlow, Essex, CM19 5AW.
Tel: 01279 63 1203
Fax: 01279 64 4003

"Jennifer Wheeler" <jwheele1@tulane.edu>
Sent by: owner-statalist@hsphsun2.harvard.edu
22-Mar-2004 18:51
Please respond to statalist@hsphsun2.harvard.edu
To: statalist@hsphsun2.harvard.edu
cc:
Subject: st: Hotdeck command

Esteemed Statalist users:

I am a new Stata user and have been experimenting with the hotdeck command. I am attempting to impute values for line non-response as well as item non-response and have come across some difficulties in setting the seed so I can reproduce my results.
The sample is stratified by reg and spec.

My first question goes as follows: is there a way to use hotdeck to impute for both item and line non-response?

I have started out by creating a variable ("impute") that identifies all cases for which to impute a line of data in a series of four variables (i.e., the cases have missing values for each of the four variables). I then run a hotdeck command to impute:

    hotdeck type size number setting, store by (reg spec) keep (id_item) imp(1) seed (123456789)

This procedure seems to be imputing new values for each variable if any one variable is missing (so it is imputing a new line of data for a case requiring only item imputation). I want to be able to conserve the information I have for those cases that are missing some (and not all) items and only impute for the variables that are missing.

To reconcile this I attempted a second step where I ran hotdeck again on the original variables individually, for only those cases that are not complete lines of missing data:

    hotdeck type if impute==0, store by (reg spec) keep (id_item) imp(1) seed (123456789)
    hotdeck size if impute==0, store by (reg spec) keep (id_item) imp(1) seed (123456789)
    etc.

I used the values generated from the first step for the line non-response and the values from the second step for the item non-response.

Upon examination of these variables it appears that the procedure yields the same imputed variables each time the program is run. However, when attempting to calculate a weighted estimate of the "number" variable for the population, I get different results each time. Is there a way to correct this? Is there a more straightforward way to use hotdeck imputation for item and line non-response?

Any suggestions would be greatly appreciated.
J Wheeler

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
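
The branching scheme described in the reply (impute the first variable several times, then impute the second variable within each of those datasets, and so on) can be sketched roughly as follows. This is a minimal Python illustration and not the -hotdeck- implementation: the function names hotdeck_one and sequential_impute, the toy data, and the simple random-donor rule are all assumptions made for the example, and stratification by reg and spec is ignored. Because each variable is filled from donors independently, the sketch is open to the same loss of correlation between variables that the reply warns about.

```python
import random

VARIABLES = ["type", "size", "number", "setting"]  # the four variables from the thread

# Hypothetical toy data; None marks a missing item.
ROWS = [
    {"type": 1, "size": 10, "number": 3, "setting": 2},
    {"type": None, "size": 20, "number": None, "setting": 1},
    {"type": 2, "size": None, "number": 5, "setting": None},
    {"type": 1, "size": 30, "number": 4, "setting": 2},
]


def hotdeck_one(rows, var, rng):
    """Copy rows, filling each missing value of `var` from a randomly chosen donor row."""
    donors = [r[var] for r in rows if r[var] is not None]
    if not donors:
        raise ValueError(f"no observed donors for {var!r}")
    return [
        {**r, var: rng.choice(donors)} if r[var] is None else dict(r)
        for r in rows
    ]


def sequential_impute(rows, variables, m, seed):
    """Impute variables one at a time; every dataset branches into m copies per variable."""
    rng = random.Random(seed)  # one seeded generator makes the whole run reproducible
    datasets = [rows]
    for var in variables:
        datasets = [hotdeck_one(d, var, rng) for d in datasets for _ in range(m)]
    return datasets


datasets = sequential_impute(ROWS, VARIABLES, m=5, seed=123456789)
print(len(datasets))  # 5 ** 4 = 625

# Analyse a random subset instead of all 625, which the reply suggests should still be ok.
subset = random.Random(1).sample(datasets, 25)
print(len(subset))  # 25
```

Using fewer imputations per round shrinks the tree quickly: with m=2 the same call produces 2 ** 4 = 16 datasets.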


© Copyright 1996–2014 StataCorp LP