Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

Re: st: Is a simple dummy imputation a valid procedure?

From   Andrea Bennett <>
Subject   Re: st: Is a simple dummy imputation a valid procedure?
Date   Tue, 12 Jul 2011 14:55:48 +0200

Thank you both for your great suggestions! To clarify: does this mean that I also cannot use a dummy for people who answer "don't know"? For example, a question about the parents' educational background is difficult for a student to answer, so we added a "don't know" field. I thought that in this case it would be quite ok to add each of the categories individually to the regression (e.g. no_education high_school university dont_know).
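
To make the coding concrete, here is a minimal sketch of turning such answers into one 0/1 indicator per category, with "dont_know" kept as a category of its own. The function name, the sample answers, and the choice of no_education as the reference category are assumptions for illustration only; a reference category is left out because a regression with a constant cannot identify all four dummies at once.

```python
# Hypothetical survey answers on parental education; "dont_know" is kept
# as a category of its own rather than being recoded to missing.
CATEGORIES = ["no_education", "high_school", "university", "dont_know"]


def to_dummies(answer, reference="no_education"):
    """Return one 0/1 indicator per category except the reference category."""
    if answer not in CATEGORIES:
        raise ValueError(f"unexpected answer: {answer!r}")
    return {c: int(answer == c) for c in CATEGORIES if c != reference}


# Illustrative data, not taken from any real survey.
answers = ["university", "dont_know", "high_school", "no_education"]
rows = [to_dummies(a) for a in answers]

for answer, row in zip(answers, rows):
    print(answer, row)
```

Whether the dont_know coefficient can be interpreted substantively is a separate question that this sketch does not settle.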

It could be argued that this, too, is a case of imputation, because we have no idea why a student checked the "don't know" box. He might not want to tell and so goes for the best outside option, he might really not know, or he might feel ashamed.

But I've just ordered Allison's book and hope it will give me a better feeling for the data.

Best regards!


On Jul 12, 2011, at 13:45 , Maarten Buis wrote:

> On Tue, Jul 12, 2011 at 12:11 PM, Andrea Bennett asked:
>> I know there are more advanced methods to deal with missing data (ICE or mi). But suppose one is not interested in the variable containing missing values per se and only wants to include it as an additional control, because the distribution of missing values on this variable is not quite equal between the control and the treatment group. Is it then ok to impute the median income of students in the same class where a student's income is missing, add a dummy for where there was a missing value, and interact the two?
> On Tue, Jul 12, 2011 at 12:55 PM, daniel klein responded:
>> According to Allison (2002) you cannot do this. Not only will the point
>> estimate for the variable with missing values be biased, but also the
>> point estimates for other variables, even if the data were MCAR
>> (Allison 2002: 9-10), in which case listwise deletion may be an
>> acceptable alternative to imputation.
>> Allison, P.D. (2002) Missing Data. Thousand Oaks, CA: Sage.
> There is a bit more to this than that. If you look at footnote 4 of
> Allison (2002) you can see that there is a special case where this
> dummy method does make sense: This will happen when there was a
> special reason why income is missing, e.g. income is only observed
> when one has a job. If you use the dummy variable method in that case
> the coefficient of income is the effect of income when one has a job, and
> the dummy compares people with average income and a job to people
> without a job. In these cases the missing values you see in Stata
> aren't real missing values in the sense that a value needs to exist
> before it can be missing, and in observations without a job income does
> not exist so it cannot be missing. If you have genuine missing values
> on income, e.g. persons have an income but refuse to tell it to you, then
> Daniel's remark is correct and you cannot use that dummy method.
> Hope this helps,
> Maarten
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
> --------------------------
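
For readers who want to see what the dummy method discussed in the quoted messages actually constructs, here is a minimal sketch for one class: missing incomes are filled with the class median, a 0/1 flag marks the filled cases, and the product of the two gives the interaction term. The function name and the income values are made up for illustration. The sketch only builds the variables; as the replies above point out, using them in a regression is defensible only in the special case where the value does not exist (e.g. no job), not when income exists but was not reported.

```python
from statistics import median


def dummy_adjust(incomes):
    """Fill missing incomes (None) with the observed median and flag them."""
    observed = [x for x in incomes if x is not None]
    if not observed:
        raise ValueError("no observed incomes to take a median of")
    fill = median(observed)
    income_filled = [fill if x is None else x for x in incomes]
    missing_flag = [int(x is None) for x in incomes]
    return income_filled, missing_flag


# Illustrative incomes for one class; None marks a missing answer.
class_incomes = [1200, None, 900, 1500, None]
income_filled, missing_flag = dummy_adjust(class_incomes)

# Interaction of the filled income with the missing-value flag.
interaction = [v * f for v, f in zip(income_filled, missing_flag)]

print(income_filled)
print(missing_flag)
print(interaction)
```

Within a single class the interaction is just the median times the flag, so it only carries separate information when classes with different medians are pooled.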
