
From: "Daniel Waxman" <dan@amplecat.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re
Date: Mon, 6 Jun 2005 21:26:30 -0400

Point well-taken about asking for partially-informed opinions, thanks.

The cube root works, although the fit (i.e. predicted vs. observed outcomes) is not quite as good as with the log-transformed variable.

Conceptually, I'm not sure that I understand the problem with throwing away the zeroes--essentially I would be saying that there is substantially more uncertainty at the zero point than at any measurable one, and therefore I want my model to predict outcome relative to the lowest measurable concentration rather than relative to the zero point. It does affect the numerical results of goodness-of-fit tests--is that the conceptual problem?

But making a somewhat arbitrary choice for the zero point also has problems. Would it be cheating to just replace the zero with the best-fit value, as determined by solving the logit equation backwards? Probably...

[please feel free to censor debate if this has gone back & forth too many times!]

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Monday, June 06, 2005 8:45 AM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re

I am always queasy when expected to approve, or invited to disapprove, a proposed analysis. How can anyone give a really worthwhile opinion of what is sensible for someone's project on this amount of information?

Nevertheless the notion of throwing away half the data on this basis is rather alarming.

I have found cube roots often useful for non-negative variables. This is partly empirical, partly that zero goes to zero, but there is also an arm-waving basis that cube roots work well for gamma distributions (cf. the Wilson-Hilferty transformation). More generally, powers falling towards zero in effect have the logarithm as their limit.

Nick
n.j.cox@durham.ac.uk

Daniel Waxman

> Maarten, Kevin,
>
> Thank you very much for your replies. So for now I am just going to
> give up trying to make distributional assumptions and drop the half of
> the observations which are zero or non-detectable before
> log-transforming the predictor and fitting the logistic model. In
> fact, whether I do this or change the zero to half of the lowest
> detectable value (i.e. .005) doesn't have much of an effect on the
> logistic odds ratio.
>
> If anybody has any objections to this (or sees how a statistical
> reviewer for a medical journal might have objections), please let me
> know.
>
> Daniel
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of maartenbuis
> Sent: Sunday, June 05, 2005 7:38 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: how to deal with censoring at zero (a lot of zeroes)
> for a laboratory re
>
> I am tired: The critical assumption behind Multiple Imputation is that
> the probability of missingness does not depend on the value of the
> missing variable itself (Missing At Random, or MAR). This is obviously
> not the case with censoring. My objection against (conditional) mean
> imputation, and my remark about selecting on the independent variables
> still hold. So, given that you have a large number of observations, I
> would just ignore the zero observations.
>
> Maarten
>
> --- In statalist@yahoogroups.com, "maartenbuis" <maartenbuis@y...> wrote:
> > Hi Daniel,
> >
> > It looks to me like you could use -tobit- for log(tropin) and just a
> > constant. The predicted values should give you the extrapolations
> > you want. (This will be the same value for all missing observations:
> > the mean of the log-normal distribution conditional on being less
> > than the censoring value.)
> >
> > However, these are actually missing values, and apparently you want
> > to create imputations for them. If you just use the values you
> > obtained from -predict- you will be assuming that you are as sure
> > about these values as you are about the values you actually
> > observed, and thus get standard errors that are too small. If you
> > really want to impute, then you could have a look at -mice- (findit
> > mice). Alternatively, you could use the results from -tobit- to
> > generate multiple imputations. Mail me if you want to do that, and I
> > can write, tonight or tomorrow, an example for the infamous auto
> > dataset. However, censoring on the independent variable is generally
> > much less of a problem than censoring on the dependent variable, so
> > ignoring (throwing away) the censored observations should not lead
> > to very different estimates.
> >
> > HTH,
> > Maarten
> >
> > --- "Daniel Waxman" <dan@a...> wrote:
> > > I am modeling a laboratory test (Troponin I) as an independent
> > > (continuous) predictor of in-hospital mortality in a sample of
> > > 10,000 subjects. <snip> The problem is the zero values, what they
> > > represent, and what to do with them. The distribution of results
> > > ranges from the minimal detectable level of .01 mcg/L to 94 mcg/L,
> > > with results markedly skewed to the right (nearly half the results
> > > are zero; 90% are < .20. Results are given in increments of .01).
> > > Of course, zero is a censored value which represents a
> > > distribution of results between zero and somewhere below .01.
> > > <snip>
> > > I found a method attributed to A.C. Cohen of doing essentially
> > > this which uses a lookup table to calculate the mean and standard
> > > deviation of an assumed log-normal distribution based upon the
> > > non-censored data and the proportion of data points that are
> > > censored, but there must be a better way to do this in Stata.
> > >
> > > Any thoughts on (1) whether it is reasonable to assume the
> > > log-normal distribution (I've played with qlognorm and plognorm,
> > > but it's hard to know what is good enough), and if so (2) how to
> > > do it?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
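
The suggestion in the thread of fitting -tobit- to the log of the lab value with only a constant amounts to estimating the mean and standard deviation of a normal distribution that is left-censored at the log of the detection limit. The sketch below illustrates that idea in Python rather than Stata, and it is not the -tobit- command itself: it uses a simple EM iteration instead of Stata's maximizer, the data are simulated (the true mean and standard deviation are assumptions chosen so that roughly half the values fall below .01), and the function name `fit_left_censored_normal` is hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def fit_left_censored_normal(observed, n_censored, limit, iterations=500):
    """EM estimates of (mu, sigma) for a normal sample left-censored at `limit`.

    observed   : values at or above the limit (here, logs of detectable results)
    n_censored : number of results known only to lie below the limit
    """
    n = len(observed) + n_censored
    sum_y = sum(observed)
    sum_y2 = sum(y * y for y in observed)
    # Start from the moments of the detectable values alone.
    mu = sum_y / len(observed)
    sigma = math.sqrt(sum_y2 / len(observed) - mu * mu)
    for _ in range(iterations):
        # E-step: first two moments of a normal truncated above at `limit`.
        alpha = (limit - mu) / sigma
        lam = STD_NORMAL.pdf(alpha) / STD_NORMAL.cdf(alpha)
        cens_mean = mu - sigma * lam
        cens_var = sigma * sigma * (1.0 - alpha * lam - lam * lam)
        cens_m2 = cens_var + cens_mean ** 2
        # M-step: ordinary normal MLE on the completed sufficient statistics.
        mu = (sum_y + n_censored * cens_mean) / n
        sigma = math.sqrt((sum_y2 + n_censored * cens_m2) / n - mu * mu)
    return mu, sigma

# Simulated log-concentrations (assumed parameters, not the real troponin data).
random.seed(2005)
mu_true, sigma_true = math.log(0.01), 2.3
limit = math.log(0.01)  # log of the lowest detectable value from the thread
draws = [random.gauss(mu_true, sigma_true) for _ in range(10000)]
observed = [y for y in draws if y >= limit]
n_censored = len(draws) - len(observed)

mu_hat, sigma_hat = fit_left_censored_normal(observed, n_censored, limit)

# Conditional mean of the log value given that it is below the limit: the
# single "predicted" value that every censored observation would receive.
alpha_hat = (limit - mu_hat) / sigma_hat
cens_log_mean = mu_hat - sigma_hat * STD_NORMAL.pdf(alpha_hat) / STD_NORMAL.cdf(alpha_hat)

print(f"mu_hat={mu_hat:.3f} sigma_hat={sigma_hat:.3f} censored log-mean={cens_log_mean:.3f}")
```

As Maarten notes, plugging that single conditional mean in for every censored value treats it as if it had been observed, so standard errors from a later model would be too small; the sketch only shows where the extrapolated value comes from.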

**Follow-Ups**: **st: proposed solution to the censored-at-zero problem, and question regarding out-of-sample predictions for 'mfp'** *From:* "Daniel Waxman" <dan@amplecat.com>

**References**: **RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>


© Copyright 1996–2015 StataCorp LP