Point well-taken about asking for partially-informed opinions, thanks.
The cube root works, although the fit (i.e. predicted vs. observed outcomes)
is not quite as good as with the log-transformed variable. Conceptually,
I'm not sure that I understand the problem in throwing away the
zeroes--essentially I would be saying that there is substantially more
uncertainty at the zero point than at any measurable one, and therefore I
want my model to predict outcome relative to the lowest measurable
concentration rather than relative to the zero point. It does affect the
numerical results of goodness-of-fit tests--is that the conceptual problem?
But making a somewhat arbitrary choice for the zero point also has problems.
Would it be cheating to just replace the zero with the best-fit as
determined by solving the logit equation backwards? Probably...
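
To make the "solving the logit equation backwards" idea concrete, here is a minimal sketch. It assumes a fitted model of the form logit(p) = intercept + slope * log(x); the coefficient values and the mortality proportion in the zero group are hypothetical, chosen only for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def concentration_for_risk(p_zero, intercept, slope):
    """Invert logit(p) = intercept + slope * log(x) to get x."""
    return math.exp((logit(p_zero) - intercept) / slope)

# Hypothetical values, not taken from the thread's data.
intercept, slope = -2.0, 0.5   # assumed coefficients from the non-zero data
p_zero = 0.02                  # assumed observed mortality among the zeroes

x_implied = concentration_for_risk(p_zero, intercept, slope)
print(x_implied)
```

By construction the implied value sits exactly on the fitted line, so refitting with it brings in no information beyond the outcome itself, which is presumably why it feels like cheating.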
[please feel free to censor debate if this has gone back & forth too many
[mailto:email@example.com] On Behalf Of Nick Cox
Sent: Monday, June 06, 2005 8:45 AM
Subject: RE: st: how to deal with censoring at zero (a lot of zeroes) for a
I am always queasy when expected to approve,
or invited to disapprove, a proposed analysis.
How can anyone give a really worthwhile opinion
of what is sensible for someone's project on
this amount of information?
Nevertheless the notion of throwing away
half the data on this basis is rather alarming.
I have found cube roots often useful for non-negative
variables. This is partly empirical, partly that
zero goes to zero, but there is also an arm-waving basis
that cube roots work well for gamma distributions (cf. the
Wilson-Hilferty transformation). More generally,
powers falling towards zero in effect have the logarithm
as their limit.
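
The remark about powers having the logarithm as their limit can be checked numerically. The sketch below uses the scaled (Box-Cox) form (x^lambda - 1)/lambda, which is an assumption about what is meant here; the function name and the test value 0.2 are illustrative only.

```python
import math

def scaled_power(x, lam):
    """Box-Cox form (x**lam - 1) / lam, taken as log(x) when lam is 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

x = 0.2
for lam in (1.0, 1.0 / 3.0, 0.1, 0.01, 0.001):
    print(lam, scaled_power(x, lam))
print("log", math.log(x))

# The plain cube root keeps zero at zero, whereas log(0) is undefined.
print(0.0 ** (1.0 / 3.0))
```

As lambda shrinks, the printed values approach log(x); the cube root (lambda = 1/3) sits between the identity and the logarithm while still being defined at zero.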
> Maarten, Kevin,
> Thank you very much for your replies. So for now I am just going to
> give up trying to make distributional assumptions and to drop the half
> of the observations which are zero or non-detectable prior to log
> transforming the predictor and to creating the logistic model. In fact,
> whether I do this or change the zero to half of the lowest detectable
> value (i.e. .005) doesn't have much of an effect on the logistic odds
> ratio.
> If anybody has any objections to this (or sees how a
> statistical reviewer
> for a medical journal might have objections), please let me know.
> -----Original Message-----
> From: firstname.lastname@example.org
> [mailto:email@example.com] On Behalf Of maartenbuis
> Sent: Sunday, June 05, 2005 7:38 PM
> To: firstname.lastname@example.org
> Subject: Re: st: how to deal with censoring at zero (a lot of
> zeroes) for a
> laboratory re
> I am tired: The critical assumption behind Multiple Imputation is that
> the probability of missingness does not depend on the value of the
> missing variable itself (Missing At Random, or MAR). This is obviously
> not the case with censoring. My objection against (conditional) mean
> imputation, and my remark about selecting on the independent variables
> still hold. So, given that you have a large number of observations, I
> would just ignore the zero observations.
> --- In email@example.com, "maartenbuis"
> <maartenbuis@y...> wrote:
> > Hi Daniel,
> > It looks to me like you could use -tobit- for log(tropin) and just a
> > constant. The predicted values should give you the
> extrapolations you
> > want. (This will be the same value for all missing observations: the
> > mean of the log-normal distribution conditional on being
> less than the
> > censoring value)
> > However, these are actually missing values, and apparently you want to
> > create imputations for them. If you just use the values you obtained
> > from -predict- you will be assuming that you are as sure about these
> > values as you are about the values you actually observed, and thus get
> > standard errors that are too small. If you really want to impute, then
> > you could have a look at -mice- (findit mice). Alternatively, you
> > could use the results from -tobit- to generate multiple imputations.
> > Mail me if you want to do that, and I can write, tonight or
> > an example for the infamous auto dataset. However, censoring on the
> > independent variable is generally much less a problem than censoring
> > on the dependent variable, so ignoring (throwing away) the censored
> > observation, should not lead to very different estimates.
> > HTH,
> > Maarten
> > --- "Daniel Waxman" <dan@a...> wrote:
> > > I am modeling a laboratory test (Troponin I) as an independent
> > > (continuous) predictor of in-hospital mortality in a sample of
> > > 10,000 subjects. <snip> The problem is the zero values,
> what they
> > > represent, and what to do with them. The distribution
> of results
> > > ranges from the minimal detectable level of .01 mcg/L to 94 mcg/L,
> > > with results markedly skewed to the right (nearly half the results
> > > are zero; 90% are < .20. Results are given in increments
> > > of .01). Of course, zero is a censored value which represents a
> > > distribution of results between zero and somewhere below .01.
> > <snip>
> > > I found a method attributed to A.C. Cohen of doing
> essentially this
> > > which uses a lookup table to calculate the mean and standard
> > > deviation of an assumed log-normal distribution based upon the
> > > non-censored data and the proportion of data points that are
> > > censored, but there must be a better way to do this in Stata.
> > >
> > > Any thoughts on (1) whether it is reasonable to assume the
> > > log-normal distribution (I've played with qlognorm and
> plognorm, but
> > > it's hard to know what is good enough), and if so (2) how
> to do it?
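
As a rough illustration of the censored log-normal fit discussed in this thread (a constant-only model for the log values, left-censored at the detection limit), here is a minimal sketch in plain Python. It is not Stata's -tobit- and not Cohen's table method; the simulated data, the true parameter values, and the simple compass-search optimiser are all assumptions made for the example.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_loglik(mu, sigma, n_obs, sum_y, sum_yy, n_cens, log_limit):
    """Normal log-likelihood for log values, left-censored at log_limit."""
    ss = sum_yy - 2.0 * mu * sum_y + n_obs * mu * mu
    ll = -n_obs * (math.log(sigma) + 0.5 * math.log(2.0 * math.pi))
    ll -= ss / (2.0 * sigma * sigma)
    # Each censored value contributes P(Y < log_limit).
    p_below = norm_cdf((log_limit - mu) / sigma)
    return ll + n_cens * math.log(max(p_below, 1e-300))

def fit_censored_normal(log_obs, n_cens, log_limit):
    """Maximise the likelihood by a simple compass (pattern) search."""
    args = (len(log_obs), sum(log_obs), sum(y * y for y in log_obs),
            n_cens, log_limit)
    mu, sigma = args[1] / args[0], 1.0   # crude starting values
    best = censored_loglik(mu, sigma, *args)
    step = 1.0
    while step > 1e-5:
        improved = False
        for d_mu, d_sigma in ((step, 0), (-step, 0), (0, step), (0, -step)):
            m, s = mu + d_mu, sigma + d_sigma
            if s <= 1e-6:
                continue
            ll = censored_loglik(m, s, *args)
            if ll > best:
                mu, sigma, best, improved = m, s, ll, True
        if not improved:
            step /= 2.0
    return mu, sigma

def mean_log_below_limit(mu, sigma, log_limit):
    """E[Y | Y < log_limit] for Y ~ Normal(mu, sigma), on the log scale."""
    a = (log_limit - mu) / sigma
    return mu - sigma * norm_pdf(a) / norm_cdf(a)

# Simulated data (hypothetical): about half fall below the limit of .01.
random.seed(1)
true_mu, true_sigma = math.log(0.01), 2.0
log_limit = math.log(0.01)
draws = [random.gauss(true_mu, true_sigma) for _ in range(5000)]
log_obs = [y for y in draws if y >= log_limit]
n_cens = len(draws) - len(log_obs)

mu_hat, sigma_hat = fit_censored_normal(log_obs, n_cens, log_limit)
print(mu_hat, sigma_hat, mean_log_below_limit(mu_hat, sigma_hat, log_limit))
```

The last printed number is the single conditional-mean value (on the log scale) that would be assigned to every censored observation. Using it as if it were observed understates uncertainty, as noted earlier in the thread.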
* For searches and help try: