Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Maarten Buis <maartenlbuis@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Missingness

Date: Tue, 28 Aug 2012 10:53:03 +0200

On Tue, Aug 28, 2012 at 9:42 AM, Brendan Churchill wrote:

> I am using some ordinal variables, which have some numeric missing values, in a multilevel model. In some previous research, I have seen researchers include a 'Missing' independent variable in their model to account for some of the 'missingness' - or rather to control for the missing values, but I don't quite understand how to do it in Stata or even if that's a good way to do it. I've tried to make a binary variable in which the missing values are coded 1 and the rest of the values are coded 0 but the model rejects this because it's collinear.
>
> Is this how you do it? Or is there a variable for the entire data set that is created to account for all missing variables?

The most common "method" is to ignore all observations with at least one missing value. This is fine as long as the probability of missingness is not related to the dependent/explained/left-hand-side/y variable. In that case, the estimates will still be consistent; you just lose power. Since you have missing values on the dependent variable, this means that the probability of missingness needs to be unrelated to the unobserved values on that dependent variable. There is obviously no way to check that, but often you have a reasonable idea how the missing values came to be, and you can use that to make it plausible that this is so.

The safest method is indeed to just ignore the observations with missing values, as long as you can make a plausible case that the probability of missingness is not related to the unobserved missing values of the dependent variable (possibly after controlling for any other variable in your model).

When you believe that the probability of missingness is (strongly) dependent on the unobserved missing values (even after controlling for all other variables in your model), then you are in a lot of trouble.
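The difference between the two situations can be illustrated with a small simulation. The sketch below is an illustration only, written in Python rather than Stata and using just the standard library; the true slope of 1, the cutoff of 0.5, the sample size, and all function names are assumptions made for this example. It drops observations either because of the value of x or because of the value of y, and then fits a simple regression on the remaining complete cases.

```python
import random


def ols_slope(pairs):
    """Slope of a simple least-squares regression of y on x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


def simulate(n=20000, seed=12345, cutoff=0.5):
    """Compare complete-case slopes under two missingness mechanisms.

    Hypothetical setup: y = 1.0 * x + noise, so the true slope is 1.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 1.0 * x + rng.gauss(0, 1)
        data.append((x, y))

    # Case 1: an observation is lost whenever x exceeds the cutoff,
    # so missingness depends only on the explanatory variable.
    kept_by_x = [(x, y) for x, y in data if x < cutoff]

    # Case 2: an observation is lost whenever y exceeds the cutoff,
    # so missingness depends on the (unobserved) dependent variable.
    kept_by_y = [(x, y) for x, y in data if y < cutoff]

    return {
        "full": ols_slope(data),
        "depends_on_x": ols_slope(kept_by_x),
        "depends_on_y": ols_slope(kept_by_y),
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name:>13}: slope = {slope:.3f}")
```

In this setup the complete-case slope should stay close to 1 when missingness depends only on x, and should fall clearly below 1 when missingness depends on y. The deterministic cutoff is an extreme case chosen to make the contrast easy to see; real missingness mechanisms are rarely this clean.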
In essence your data does not have the information necessary to estimate what you want, and no amount of statistical trickery can create information that is not present in the data. Methods that claim to deal with these situations just replace information from the data with "information" from (often untestable) assumptions, and the results from these methods rest rather heavily on the correctness of these assumptions. In those cases I think it is safer to remember John Tukey (1986, p.74-75): "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."

The method you proposed does not work for dependent variables. For independent/explanatory/right-hand-side/x-variables you need to be very careful: this method only makes sense when a missing value means "the value does not exist" rather than "the value exists but has not been observed". See: <http://www.stata.com/statalist/archive/2007-12/msg00030.html>.

Hope this helps,
Maarten

John Tukey (1986), "Sunset salvo". The American Statistician 40(1):72-76.

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------

**References**: **st: Missingness**, *From:* Brendan Churchill <Brendan.Churchill@utas.edu.au>
