Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Austin Nichols <austinnichols@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Different types of missing data and MI |
Date | Mon, 13 Jun 2011 20:50:09 -0400 |
Clyde Schechter <clyde.schechter@einstein.yu.edu>:

If it's true that your covariate falling below the detection limit is not predictable from other covariates or outcomes, then it is apparently orthogonal and can be omitted with no effect; if that's true, then various imputation schemes should also produce essentially the same estimates for other coefs.

You might estimate a -tobit- of the 1/3 missing covariate on other covariates (and possibly the outcome) and predict based on xb and a random draw from the error distribution; a detection limit is one of the few instances in which -tobit- works really well in simulations. You can also omit the covariate; you can also replace with the detection limit, or zero, and include a dummy for missing (all known to be problematic in some cases, but not yours).

There are more options than these four, but if these four produce similar results, you have a very good footnote for whatever table makes the final cut: the results are invariant to this choice.

On Mon, Jun 13, 2011 at 7:20 PM, Clyde Schechter <clyde.schechter@einstein.yu.edu> wrote:
> My problem is a third kind of missing data. One of the covariates is the
> result of a lab assay that has a lower limit of detectability. So these
> data are not missing in the full sense, rather they are left censored at
> the lower limit of detectability (or, more properly, interval censored
> between zero and the lower limit of detectability). I don't know what to
> do with these. -mi- doesn't seem applicable since these are certainly not
> missing at random. And any way I can think of to try to impute values
> here strikes me as inherently invalid because it appears that the data
> simply do not contain any information whatsoever about the relationship
> between this variable and the outcome (or anything else) in the
> undetectable range. And I don't know of any analytic methods that handle
> interval-censored independent variables.
>
> For now, because the lower limit of detectability is close to zero, and
> because analyses and graphical explorations excluding these cases suggest
> that this variable is not associated with the outcome anyhow, I've done an
> analysis where I simply recode these particular values as zero. But I
> can't escape the feeling that this is not really defensible.
>
> There are two alternatives I would prefer. One is to simply omit these
> cases altogether--but there are a lot of them, about a third of the
> sample, and it would leave us rather underpowered. The other is to just
> drop this variable (especially since it doesn't seem to be associated with
> the outcome anyway, at least outside the censoring range)--but the
> variable was actually identified in our study aims as one of the key
> predictors of interest. (I guess we weren't very prescient!)
>
> Any advice would be appreciated. Thanks in advance
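
A rough do-file sketch of the four options named in the reply (tobit-based imputation, omitting the covariate, substituting the detection limit with a dummy, substituting zero with a dummy) might look like the following. Everything here is hypothetical: the data are simulated, and the variable names (y, x, z, below) and the detection limit of 0.5 are made up for illustration. Drawing the error from a normal distribution truncated so that imputed values stay below the limit is an assumption added here; the reply only says to use xb plus a random draw from the error distribution. How the tobit error scale is stored depends on the Stata release, so the sketch tries both forms.

```stata
* Sketch only: simulated data; all names and the limit 0.5 are hypothetical
clear
set seed 12345
set obs 1000
local lod 0.5                                // assumed detection limit

generate double z     = rnormal()            // another covariate
generate double xtrue = 1 + 0.5*z + rnormal()
generate double y     = 2 + 0.3*z + rnormal()
generate byte   below = xtrue < `lod'        // 1 if under the limit
generate double x     = cond(below, `lod', xtrue)   // what the assay reports

* Tobit of the censored covariate on the other covariate and the outcome
tobit x z y, ll(`lod')
predict double xb, xb

* Error scale: stored as a variance in Stata 15+, as sigma in earlier releases
capture scalar s = sqrt(_b[/var(e.x)])
if _rc scalar s = _b[/sigma]

* Option 1: impute xb plus a normal draw truncated to fall below the limit
generate double x_imp = x
replace x_imp = xb + s*invnormal(runiform()*normal((`lod' - xb)/s)) if below

* Options 3 and 4 substitute the limit or zero and add the dummy
generate double x_lod  = x
generate double x_zero = cond(below, 0, x)

regress y z x_imp            // 1. tobit-based imputation
regress y z                  // 2. omit the covariate
regress y z x_lod  below     // 3. detection limit plus dummy
regress y z x_zero below     // 4. zero plus dummy
```

If the coefficient on z is essentially the same across the four regressions, that supports the footnote suggested in the reply. Note that a single imputed draw, as in this sketch, does not reflect imputation uncertainty in the reported standard errors.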