# st: Hotdeck problem

From: Joelle M Anderson
To: statalist@hsphsun2.harvard.edu
Subject: st: Hotdeck problem
Date: Sun, 30 Mar 2008 12:37:52 -0500

For my thesis, I am using the hotdeck program to impute missing values in my income variable. Income has 176 missing cases, and I am hotdecking it within strata defined by three variables: age (27 missing), education (13 missing), and gender (0 missing). Because 9 of those missing values overlap, only 31 cases are missing on at least one of the three variables. Yet the hotdecked income variable that Stata creates still has 50 missing cases, 19 more than I can account for. Does anyone know why this might be?

Another strange thing: when I rename my hotdecked income variable before merging it with my full dataset, all 50 missing cases remain missing after the merge; when I rename it only after merging, just 42 cases remain missing. I have pasted my Stata output below. Any help would be greatly appreciated!

Joelle Anderson
University of Wisconsin-Milwaukee
anders35@uwm.edu
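One way to look for the 19 unexplained cases is to count how many observations cannot be placed in a stratum, and how many sit in a stratum with no observed income to borrow from. The sketch below assumes the original data are in memory and uses the variable names from the hotdeck call shown below (education, sex, ageR, incomeR); nmiss_by and n_donors are hypothetical helper names. Whether hotdeck leaves such cases unimputed is an assumption to check against its help file, not something the output confirms.

```stata
* Sketch only: run on the dataset that was in memory when hotdeck was called.
* nmiss_by and n_donors are hypothetical helper variables.

* Number of by() variables that are missing for each observation
egen byte nmiss_by = rowmiss(education sex ageR)

* Observations missing at least one by() variable
count if nmiss_by > 0

* Of those, the ones that also lack income
count if missing(incomeR) & nmiss_by > 0

* Observed incomes (potential donors) within each stratum
bysort education sex ageR: egen n_donors = count(incomeR)

* Income missing, stratum fully defined, but no donor in that stratum
count if missing(incomeR) & nmiss_by == 0 & n_donors == 0
```

If the last two counts add up to 50, the extra cases would be ones with a complete stratum but no donor. Note also that the by() list uses ageR rather than age, so its missing count may differ from the 27 quoted for age.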

//First hotdeck imputation, renaming the income variable BEFORE merging with full dataset

. hotdeck incomeR using IncomeHD, store by(education sex ageR) keep(resp incomeR)
DELETING all matrices....

Table of the Missing data patterns
* signifies missing and - is not missing

Varlist order: incomeR

    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
          * |        176       11.72       11.72
          - |      1,326       88.28      100.00
------------+-----------------------------------
      Total |      1,502      100.00
WARNING: When the <command> option is not selected
then no analysis is performed on the imputed datasets

. use "C:\data\IncomeHD1.dta", clear

. tab incomeR

  RECODE of |
     income |
   (income. |
 last year, |
 that is in |
 2004, what |
total famil |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        103        7.09        7.09
          2 |        164       11.29       18.39
          3 |        222       15.29       33.68
          4 |        148       10.19       43.87
          5 |        162       11.16       55.03
          6 |        265       18.25       73.28
          7 |        178       12.26       85.54
          8 |        124        8.54       94.08
          9 |         86        5.92      100.00
------------+-----------------------------------
      Total |      1,452      100.00

. rename incomeR incomez

. merge resp using "C:\Documents and Settings\anders35\My Documents\Thesis_3_29.dta", unique sort

. tab incomez

  RECODE of |
     income |
   (income. |
 last year, |
 that is in |
 2004, what |
total famil |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        103        7.09        7.09
          2 |        164       11.29       18.39
          3 |        222       15.29       33.68
          4 |        148       10.19       43.87
          5 |        162       11.16       55.03
          6 |        265       18.25       73.28
          7 |        178       12.26       85.54
          8 |        124        8.54       94.08
          9 |         86        5.92      100.00
------------+-----------------------------------
      Total |      1,452      100.00

//Second hotdeck imputation, renaming the hotdecked income variable AFTER merging with full dataset

. hotdeck incomeR using IncomeHotD, store by(education sex ageR) keep(resp incomeR)
DELETING all matrices....

Table of the Missing data patterns
* signifies missing and - is not missing

Varlist order: incomeR

    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
          * |        176       11.72       11.72
          - |      1,326       88.28      100.00
------------+-----------------------------------
      Total |      1,502      100.00
WARNING: When the <command> option is not selected
then no analysis is performed on the imputed datasets

. clear

. use "C:\data\IncomeHotD1.dta", clear

. tab incomeR

  RECODE of |
     income |
   (income. |
 last year, |
 that is in |
 2004, what |
total famil |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         98        6.75        6.75
          2 |        162       11.16       17.91
          3 |        220       15.15       33.06
          4 |        153       10.54       43.60
          5 |        159       10.95       54.55
          6 |        267       18.39       72.93
          7 |        178       12.26       85.19
          8 |        126        8.68       93.87
          9 |         89        6.13      100.00
------------+-----------------------------------
      Total |      1,452      100.00

. merge resp using "C:\Documents and Settings\anders35\My Documents\Thesis_3_29.dta", unique sort

. rename incomeR incomey

. tab incomey

  RECODE of |
     income |
   (income. |
 last year, |
 that is in |
 2004, what |
total famil |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         98        6.71        6.71
          2 |        164       11.23       17.95
          3 |        222       15.21       33.15
          4 |        153       10.48       43.63
          5 |        159       10.89       54.52
          6 |        270       18.49       73.01
          7 |        178       12.19       85.21
          8 |        126        8.63       93.84
          9 |         90        6.16      100.00
------------+-----------------------------------
      Total |      1,460      100.00
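
The gap between 50 and 42 missing cases may come from how merge handles a variable that exists in both files. With the syntax used above and no update option, matched observations keep the master file's values, while observations found only in the using file bring their own values with them. The sketch below is one way to check this. It assumes Thesis_3_29.dta also contains incomeR, and it copies the paths and variable names from the output above.

```stata
* Sketch only: paths and variable names are copied from the output above.
use "C:\data\IncomeHotD1.dta", clear
merge resp using "C:\Documents and Settings\anders35\My Documents\Thesis_3_29.dta", unique sort

* _merge: 1 = only in the hotdecked file, 2 = only in the full dataset, 3 = in both
tabulate _merge

* Which source supplies the non-missing incomeR values?
tabulate _merge if !missing(incomeR)

* incomeR values supplied by observations that exist only in the full dataset
count if _merge == 2 & !missing(incomeR)
```

If the final count is 8, that would account for the total rising from 1,452 to 1,460. Renaming before the merge avoids the name clash, so the renamed variable would hold only the hotdecked values.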
