



st: Different types of missing data and MI


From   "Clyde Schechter" <clyde.schechter@einstein.yu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Different types of missing data and MI
Date   Mon, 13 Jun 2011 16:20:09 -0700

I am analyzing a data set that has three different types of missing data
in it.  The data come from an observational study, and the primary
analysis involves testing an outcome which is more or less log-normally
distributed, contrasting its distribution in two groups.  The overall
analytic scheme is:

-glm outcome i.group covariates, link(log) family(gaussian)-

Some of the missing data arose because of a freezer failure which left
several of the blood specimens unanalyzable.  These cases are missing a
bunch of covariates (but not all, as some of the covariates are not based
on the lab data).  I am comfortable treating these as missing completely
at random.

Some of the missing data are the kind of haphazard missingness that is
typical in clinical studies.  While I am personally not comfortable with
thinking of this as missing at random, it seems to be the way of the
world to treat this kind of situation as such, and at least for present
purposes I can go along.

So the above fall within the scope of data that I can handle with -mi-
commands.
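Concretely, the -mi- workflow I have in mind for those first two kinds of missingness is roughly the following (x1 and x2 are placeholder covariate names, and the imputation model details are just a sketch):

```stata
* sketch of the planned -mi- workflow; x1, x2 are placeholder covariates
mi set mlong
mi register imputed x1 x2
mi register regular group outcome
mi impute mvn x1 x2 = outcome group, add(20) rseed(12345)
mi estimate: glm outcome i.group x1 x2, link(log) family(gaussian)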

My problem is a third kind of missing data.  One of the covariates is the
result of a lab assay that has a lower limit of detectability.  So these
data are not missing in the full sense, rather they are left censored at
the lower limit of detectability (or, more properly, interval censored
between zero and the lower limit of detectability).  I don't know what to
do with these.  -mi- doesn't seem applicable since these are certainly not
missing at random.  And any way I can think of to try to impute values
here strikes me as inherently invalid because it appears that the data
simply do not contain any information whatsoever about the relationship
between this variable and the outcome (or anything else) in the
undetectable range.  And I don't know of any analytic methods that handle
interval-censored independent variables.

For now, because the lower limit of detectability is close to zero, and
because analyses and graphical explorations excluding these cases suggest
that this variable is not associated with the outcome anyhow, I've done an
analysis in which I simply recoded these particular values as zero.  But I
can't escape the feeling that this is not really defensible.
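For concreteness, the ad hoc recode amounts to this (with -assay- a placeholder name for the censored covariate and -lod- a local holding its lower limit of detectability):

```stata
* recode below-detection-limit values of the censored covariate to zero
* (assay and lod are placeholder names, not variables from the actual data)
local lod 0.05
replace assay = 0 if assay < `lod' & !missing(assay)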

There are two alternatives I would prefer.  One is to simply omit these
cases altogether--but there are a lot of them, about a third of the
sample, and it would leave us rather underpowered.  The other is to just
drop this variable (especially since it doesn't seem to be associated with
the outcome anyway, at least outside the censoring range)--but the
variable was actually identified in our study aims as one of the key
predictors of interest.  (I guess we weren't very prescient!)

Any advice would be appreciated.  Thanks in advance.


Clyde Schechter
Associate Professor of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA

Please note new e-mail address: clyde.schechter@einstein.yu.edu

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

