
From: Arnold Kester <arnold.kester@stat.unimaas.nl>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re
Date: Wed, 08 Jun 2005 15:08:46 +0200

On 06/06/2005 01:17 PM, Daniel Waxman wrote:

Maarten, Kevin, Thank you very much for your replies. So for now I am just going to give up trying to make distributional assumptions and drop the half of the observations which are zero or non-detectable before log-transforming the predictor and fitting the logistic model. In fact, whether I do this or change the zero to half of the lowest detectable value (i.e. .005) doesn't have much of an effect on the logistic odds ratio. If anybody has any objections to this (or sees how a statistical reviewer for a medical journal might have objections), please let me know.

If you drop observations based on their value of a predictor variable, you are in fact changing the protocol of your study. The inclusion criteria are changed to include "Troponin I is detectable". Results would be valid for people with detectable values only.

If you want to get a prediction for undetectable Troponin without assuming a specific value you could add a dummy variable troponin_zero = (troponin == 0) and substitute (say) zero for log(troponin) when troponin==0. The predicted value from this model is independent of what you choose for "log(0)".
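A minimal sketch of the dummy-variable approach described above, assuming hypothetical variable names `troponin` (in mcg/L, with 0 meaning undetectable) and `died` (0/1 in-hospital mortality); neither name comes from the original posts. The placeholder value of 0 for "log(0)" is arbitrary.

```stata
* Hypothetical variable names: troponin (mcg/L, 0 = undetectable), died (0/1).
* Indicator for an undetectable result.
generate byte troponin_zero = (troponin == 0) if !missing(troponin)

* log(troponin) where detectable; an arbitrary placeholder (here 0) otherwise.
generate double log_trop = ln(troponin) if troponin > 0 & !missing(troponin)
replace log_trop = 0 if troponin == 0

* Logistic model; the coefficient on troponin_zero absorbs the placeholder.
logistic died log_trop troponin_zero

* Predicted probability of death for every subject, including troponin == 0.
predict double p_hat, pr
```

Replacing the placeholder 0 with another constant should change only the coefficient on `troponin_zero`; the coefficient on `log_trop` and the predicted probabilities stay the same, because the dummy shifts to compensate.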

Arnold

Daniel

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of maartenbuis
Sent: Sunday, June 05, 2005 7:38 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re

I am tired: the critical assumption behind Multiple Imputation is that the probability of missingness does not depend on the value of the missing variable itself (Missing At Random, or MAR). This is obviously not the case with censoring. My objection against (conditional) mean imputation, and my remark about selecting on the independent variables, still hold. So, given that you have a large number of observations, I would just ignore the zero observations.

Maarten

--- In statalist@yahoogroups.com, "maartenbuis" <maartenbuis@y...> wrote:

Hi Daniel,

It looks to me like you could use -tobit- for log(troponin) and just a constant. The predicted values should give you the extrapolations you want. (This will be the same value for all missing observations: the mean of the log-normal distribution conditional on being less than the censoring value.)

However, these are actually missing values, and apparently you want to create imputations for them. If you just use the values you obtained from -predict- you will be assuming that you are as sure about these values as you are about the values you actually observed, and thus get standard errors that are too small. If you really want to impute, then you could have a look at -mice- (findit mice). Alternatively, you could use the results from -tobit- to generate multiple imputations. Mail me if you want to do that, and I can write, tonight or tomorrow, an example for the infamous auto dataset.

However, censoring on the independent variable is generally much less of a problem than censoring on the dependent variable, so ignoring (throwing away) the censored observations should not lead to very different estimates.

HTH,
Maarten

--- "Daniel Waxman" <dan@a...> wrote:

I am modeling a laboratory test (Troponin I) as an independent (continuous) predictor of in-hospital mortality in a sample of 10,000 subjects.

<snip>

The problem is the zero values, what they represent, and what to do with them. The distribution of results ranges from the minimal detectable level of .01 mcg/L to 94 mcg/L, with results markedly right-skewed (nearly half the results are zero; 90% are < .20; results are given in increments of .01). Of course, zero is a censored value which represents a distribution of results between zero and somewhere below .01.

<snip>

I found a method attributed to A.C. Cohen of doing essentially this, which uses a lookup table to calculate the mean and standard deviation of an assumed log-normal distribution based upon the non-censored data and the proportion of data points that are censored, but there must be a better way to do this in Stata.

Any thoughts on (1) whether it is reasonable to assume the log-normal distribution (I've played with qlognorm and plognorm, but it's hard to know what is good enough), and if so (2) how to do it?
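Maarten's -tobit- suggestion quoted above (log(troponin) regressed on just a constant) could be sketched roughly as follows. This is only an illustration under stated assumptions: the variable name `troponin` is hypothetical, and the censoring limit of .005 mcg/L (half the lowest detectable value) is an assumption, not something settled in the thread.

```stata
* Hypothetical variable name: troponin (mcg/L, 0 = undetectable).
* Assumption: a reported 0 means the true value lies below .005 mcg/L.
local lim = ln(0.005)

* Log-transform detectable values; set censored values to the limit.
generate double log_trop = ln(troponin) if troponin > 0 & !missing(troponin)
replace log_trop = `lim' if troponin == 0

* Constant-only tobit, left-censored at the assumed limit.
tobit log_trop, ll(`lim')

* Expected log(troponin) conditional on lying below the limit.
predict double log_trop_cens, e(.,`lim')
```

As noted in the thread, plugging `log_trop_cens` into a later model as if it had been observed would understate the standard errors; it is a single conditional-mean value, not a multiple imputation.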

--
Kind regards,
Arnold Kester

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
    RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re
    From: "Daniel Waxman" <dan@amplecat.com>

References:
    RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re
    From: "Daniel Waxman" <dan@amplecat.com>
