



Re: st: Different types of missing data and MI


From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Different types of missing data and MI
Date: Mon, 13 Jun 2011 20:50:09 -0400

Clyde Schechter <clyde.schechter@einstein.yu.edu>:
If it's true that your covariate falling below the detection limit is
not predictable from other covariates or outcomes, then it is
apparently orthogonal and can be omitted with no effect; if that's
true, then various imputation schemes should also produce essentially
the same estimates for the other coefficients.  You might estimate a -tobit- of
the 1/3 missing covariate on other covariates (and possibly the
outcome) and predict based on xb and a random draw from the error
distribution; a detection limit is one of the few instances in which
-tobit- works really well in simulations.  You can also omit the
covariate; you can also replace with the detection limit, or zero, and
include a dummy for missing (all known to be problematic in some
cases, but not yours).  There are more options than these four, but if
these four produce similar results, you have a very good footnote for
whatever table makes the final cut: the results are invariant to this
choice.
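
The imputation step of the -tobit- suggestion above can be sketched outside Stata. This is a minimal illustration, not the procedure from the post. It assumes the tobit has already been fit elsewhere, so the linear predictions (`xb_values`), the error standard deviation (`sigma`), and the detection limit (`limit`) are made-up placeholder numbers, and the function name `impute_below_limit` is hypothetical. It also assumes normal errors and, as a refinement the post does not spell out, restricts each draw to fall below the detection limit by sampling from the lower tail with the inverse CDF.

```python
import random
from statistics import NormalDist


def impute_below_limit(xb, sigma, limit, rng):
    """Draw one value from N(xb, sigma^2), conditional on being below `limit`.

    xb    : linear prediction for this case (assumed to come from a fitted tobit)
    sigma : estimated error standard deviation (assumed to come from the same fit)
    limit : lower detection limit of the assay
    rng   : a random.Random instance, so the draws are reproducible
    """
    dist = NormalDist(mu=xb, sigma=sigma)
    p_limit = dist.cdf(limit)          # probability mass below the limit
    u = rng.uniform(0.0, p_limit)      # uniform draw over the lower tail
    # inv_cdf is undefined at exactly 0 and 1, so keep u strictly inside (0, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # cap at the limit in case the clamp above pushed the draw past it
    return min(dist.inv_cdf(u), limit)


# Placeholder inputs (hypothetical, for illustration only)
rng = random.Random(12345)
limit = 0.5
sigma = 0.4
xb_values = [0.2, 0.6, 1.0]            # one prediction per censored case

draws = [impute_below_limit(xb, sigma, limit, rng) for xb in xb_values]
```

Every draw lands at or below the limit. Repeating the draws with different seeds and re-estimating the main model each time gives a rough sense of how much the other coefficients move under this scheme compared with omitting the covariate or recoding it with a missing-value dummy.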

On Mon, Jun 13, 2011 at 7:20 PM, Clyde Schechter
<clyde.schechter@einstein.yu.edu> wrote:
> My problem is a third kind of missing data.  One of the covariates is the
> result of a lab assay that has a lower limit of detectability.  So these
> data are not missing in the full sense, rather they are left censored at
> the lower limit of detectability (or, more properly, interval censored
> between zero and the lower limit of detectability).  I don't know what to
> do with these.  -mi- doesn't seem applicable since these are certainly not
> missing at random.  And any way I can think of to try to impute values
> here strikes me as inherently invalid because it appears that the data
> simply do not contain any information whatsoever about the relationship
> between this variable and the outcome (or anything else) in the
> undetectable range.  And I don't know of any analytic methods that handle
> interval-censored independent variables.
>
> For now, because the lower limit of detectability is close to zero, and
> because analyses and graphical explorations excluding these cases suggest
> that this variable is not associated with the outcome anyhow, I've done an
> analysis where I simply recode these particular values as zero. But I
> can't escape the feeling that this is not really defensible.
>
> There are two alternatives I would prefer.  One is to simply omit these
> cases altogether--but there are a lot of them, about a third of the
> sample, and it would leave us rather underpowered.  The other is to just
> drop this variable (especially since it doesn't seem to be associated with
> the outcome anyway, at least outside the censoring range)--but the
> variable was actually identified in our study aims as one of the key
> predictors of interest.  (I guess we weren't very prescient!)
>
> Any advice would be appreciated.  Thanks in advance.
