The Stata listserver

RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re
Date   Mon, 6 Jun 2005 13:45:29 +0100

I am always queasy when expected to approve, 
or invited to disapprove, a proposed analysis. 
How can anyone give a really worthwhile opinion 
of what is sensible for someone's project on 
this amount of information? 

Nevertheless the notion of throwing away 
half the data on this basis is rather alarming. 

I have found cube roots often useful for non-negative 
variables. This is partly empirical, partly that 
zero goes to zero, but there is also an arm-waving basis 
that cube roots work well for gamma distributions (cf. the 
Wilson-Hilferty transformation). More generally, as the 
power falls towards zero, power transformations in effect 
have the logarithm as their limit. 
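
A minimal sketch of the two points above, written in Python rather than Stata purely for illustration. The function names and the sample values are hypothetical; the values simply span the range of results quoted later in the thread. It shows that the cube root sends zero to zero (so no observations are lost, unlike with the logarithm), and that the scaled power (x^p - 1)/p approaches log(x) as p shrinks towards zero.

```python
import math

def cube_root(x):
    """Cube root of a non-negative value; zero maps to zero."""
    return x ** (1.0 / 3.0)

def scaled_power(x, p):
    """Scaled power (x**p - 1) / p for x > 0; tends to log(x) as p -> 0."""
    return (x ** p - 1.0) / p

# Hypothetical illustrative values: zero, the detection limit, and two larger results.
values = [0.0, 0.01, 0.20, 94.0]
roots = [cube_root(v) for v in values]
print(roots)

# The scaled power moves towards the logarithm as the power falls towards zero.
for p in (0.5, 0.1, 0.001):
    print(p, scaled_power(0.20, p), math.log(0.20))
```

Whether the cube root is actually adequate for a given variable remains an empirical question; the sketch only illustrates why zeros are no obstacle to it.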

[email protected] 

Daniel Waxman
> Maarten, Kevin,
> Thank you very much for your replies.  So for now I am just going to
> give up trying to make distributional assumptions and to drop the half
> of the observations which are zero or non-detectable prior to log
> transforming the predictor and to creating the logistic model.  In
> fact, whether I do this or change the zero to half of the lowest
> detectable value (i.e. .005) doesn't have much of an effect on the
> logistic odds ratio.
> If anybody has any objections to this (or sees how a statistical
> reviewer for a medical journal might have objections), please let me know.
> Daniel
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of maartenbuis
> Sent: Sunday, June 05, 2005 7:38 PM
> To: [email protected]
> Subject: Re: st: how to deal with censoring at zero (a lot of zeroes)
> for a laboratory re
> I am tired: The critical assumption behind Multiple Imputation is that
> the probability of missingness does not depend on the value of the
> missing variable itself (Missing At Random, or MAR). This is obviously
> not the case with censoring. My objection against (conditional) mean
> imputation, and my remark about selecting on the independent variables
> still hold. So, given that you have a large number of observations, I
> would just ignore the zero observations.
> Maarten 
> --- In [email protected], "maartenbuis" 
> <maartenbuis@y...> wrote:
> > Hi Daniel,
> > 
> > It looks to me like you could use -tobit- for log(tropin) and just a
> > constant. The predicted values should give you the extrapolations
> > you want. (This will be the same value for all missing observations:
> > the mean of the log-normal distribution conditional on being less
> > than the censoring value)
> > 
> > However, these are actually missing values, and apparently you want
> > to create imputations for them. If you just use the values you
> > obtained from -predict- you will be assuming that you are as sure
> > about these values as you are about the values you actually observed,
> > and thus get standard errors that are too small. If you really want
> > to impute, then you could have a look at -mice- (findit mice).
> > Alternatively, you could use the results from -tobit- to generate
> > multiple imputations. Mail me if you want to do that, and I can
> > write, tonight or tomorrow, an example for the infamous auto dataset.
> > However, censoring on the independent variable is generally much less
> > of a problem than censoring on the dependent variable, so ignoring
> > (throwing away) the censored observations should not lead to very
> > different estimates.
> > 
> > HTH,
> > Maarten
> > 
> > --- "Daniel Waxman" <dan@a...> wrote:
> > > I am modeling a laboratory test (Troponin I) as an independent 
> > > (continuous) predictor of in-hospital mortality in a sample of 
> > > 10,000 subjects.  <snip> The problem is the zero values, what they 
> > > represent, and what to do with them.   The distribution of results 
> > > ranges from the minimal detectable level of .01 mcg/L to 94 mcg/L, 
> > > with results markedly skewed to the right (nearly half the results 
> > > are zero; 90% are < .20; results are given in increments
> > > of .01).  Of course, zero is a censored value which represents a
> > > distribution of results between zero and somewhere below .01.
> > <snip>
> > > I found a method attributed to A.C. Cohen of doing essentially this 
> > > which uses a lookup table to calculate the mean and standard 
> > > deviation of an assumed log-normal distribution based upon the 
> > > non-censored data and the proportion of data points that are 
> > > censored, but there must be a better way to do this in Stata.
> > > 
> > > Any thoughts on (1) whether it is reasonable to assume the 
> > > log-normal distribution (I've played with qlognorm and plognorm,
> > > but it's hard to know what is good enough), and if so (2) how to
> > > do it?
