The Stata listserver

Re: st: imputation- multiple imputation

From   Fred Wolfe
Subject   Re: st: imputation- multiple imputation
Date   Wed, 17 Dec 2003 06:12:28 -0600

At 11:38 AM 12/17/2003 +0000, you wrote:

I have a data set with missing data on both continuous and categorical
variables, for example smoking status (non-smoker, ex-smoker, and smoker).
I would like to impute values for the missing data. My problem relates to
the categorical data: what techniques are available for imputing missing
categorical data, and what programs would I need for the imputation? I have
access at the moment to SPSS and Stata. If possible, any references on
missing data imputation would also be useful.

I have found that Adrian Mander's hotdeck program, used appropriately, can work very well. However, it requires a -by(varlist)- to define strata for the imputation. For smoking, my bylist would include age, sex, and PERHAPS some other characteristic that identifies propensity (social class or function). So I would categorize age by decades and similarly categorize the last variable. The problem one runs into is that you soon have a great many groups. If this turns out to be too many for your sample, then you have to reduce the number of categories.
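One way to check whether the strata are too fine is to count, for each cell, how many observations could serve as donors and how many need imputing. The lines below are only a sketch of that check using built-in commands. The variable names age, sex, and smoke, the new names agecat, cell, n_donors, n_missing, and onepercell, and the cutoff of 5 donors are all assumptions for illustration, not part of the hotdeck program.

```stata
* Sketch only: hypothetical variables age (in years), sex, and smoke
* Collapse age into decades: 0, 10, 20, ...
generate agecat = 10 * floor(age / 10)

* One identifier per age-decade by sex cell
egen cell = group(agecat sex)

* Donors are observations with smoke observed; count them per cell
bysort cell: egen n_donors = count(smoke)

* Count the observations in each cell that need a value
bysort cell: egen n_missing = total(missing(smoke))

* Show one line per cell that has missing values but few donors
* (the cutoff of 5 is arbitrary and chosen only for illustration)
egen onepercell = tag(cell)
list cell agecat sex n_donors n_missing if onepercell & n_donors < 5 & n_missing > 0
```

If many cells appear in this listing, that suggests merging categories (for example, wider age bands) before imputing.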

A typical line in our modification of Adrian's program would be this:

. ndbhotdeck ethorig edlevel smoke marital , by( q10_age sex) newvar impute skippattern

This yields 20 categories from which to sample.
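For readers who want to see what sampling within these categories amounts to, here is a minimal sketch of a single random hot-deck draw using only built-in commands. It is not Adrian Mander's hotdeck nor our modification of it, and it performs one simple imputation rather than multiple imputation. The variable names q10_age, sex, and smoke are taken from the command above; stratum, miss, u, donors, pick, and smoke_imp are hypothetical names introduced here, and the seed is arbitrary.

```stata
* Sketch only: one random hot-deck draw of smoke within q10_age by sex strata
set seed 20031217

* Stratum identifier; missing if q10_age or sex is missing
egen stratum = group(q10_age sex)

* Flag recipients (smoke missing) and attach a random sort key
generate byte miss = missing(smoke)
generate double u = runiform()

* Within each stratum, donors (miss == 0) sort first, in random order
bysort stratum (miss u): egen donors = total(!miss)

* For each recipient, pick a donor position 1..donors at random
generate long pick = 1 + floor(runiform() * donors) ///
    if miss & donors > 0 & !missing(stratum)

* Copy the picked donor's value; subscripts are relative to the by-group
generate smoke_imp = smoke
bysort stratum (miss u): replace smoke_imp = smoke[pick] if !missing(pick)
```

Donors are drawn with replacement, so the same donor may be used more than once. Recipients in a stratum with no donors, or with a missing stratum, keep a missing value in smoke_imp.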


Fred Wolfe
National Data Bank for Rheumatic Diseases
Wichita, Kansas
Tel (316) 263-2125 Fax (316) 263-0761
