The Stata listserver

RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re


From   "Daniel Waxman" <dan@amplecat.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory re
Date   Wed, 8 Jun 2005 11:35:29 -0400

Thank you.
I've discovered that the 'mfp' program (multivariable fractional
polynomials) has a convenient 'catzero' option, which basically automates
the process of converting the zeroes to a separate binary predictor before
fitting the model.  Very useful!

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Arnold Kester
Sent: Wednesday, June 08, 2005 9:09 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to deal with censoring at zero (a lot of zeroes) for a
laboratory re


On 06/06/2005 01:17 PM, Daniel Waxman wrote:
> Maarten, Kevin,
> 
> Thank you very much for your replies.  So for now I am just going to give up
> trying to make distributional assumptions and drop the half of the
> observations which are zero or non-detectable before log-transforming the
> predictor and fitting the logistic model.  In fact, whether I do this or
> change the zeroes to half of the lowest detectable value (i.e. .005) doesn't
> have much of an effect on the logistic odds ratio.
> 
> If anybody has any objections to this (or sees how a statistical reviewer
> for a medical journal might have objections), please let me know.

If you drop observations based on their value of a predictor variable 
you are in fact changing the protocol of your study. The inclusion 
criteria are changed to include "Troponin I is detectable". Results 
would be valid for people with detectable values only.

If you want to get a prediction for undetectable Troponin without 
assuming a specific value you could add a dummy variable troponin_zero = 
(troponin == 0) and substitute (say) zero for log(troponin) when 
troponin==0. The predicted value from this model is independent of what 
you choose for "log(0)".
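
To illustrate that last point, here is a minimal sketch in Python (not Stata) on simulated data. Everything in it is an assumption made for illustration: the data-generating values, the variable names, and the small Newton-Raphson fitter that stands in for a logistic regression command. It fits the same model twice, once substituting 0 and once substituting log(.005) for "log(0)", and compares the predicted risks.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def solve(a, b):
    """Solve the small linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logit(rows, y, iters=25):
    """Logistic regression by Newton-Raphson; each row already holds the constant."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def predicted_risk(troponin, died, log_zero_value):
    """Fit died ~ log(troponin) + troponin_zero, using log_zero_value for "log(0)"."""
    rows = [[1.0,
             math.log(t) if t > 0 else log_zero_value,  # substituted log value
             1.0 if t == 0 else 0.0]                    # troponin_zero dummy
            for t in troponin]
    beta = fit_logit(rows, died)
    return [sigmoid(sum(b * v for b, v in zip(beta, x))) for x in rows]

# Simulated data (made-up parameters): results are reported in steps of .01,
# so true values below .005 are recorded as zero.
random.seed(1)
troponin, died = [], []
for _ in range(2000):
    true_value = random.lognormvariate(-5.0, 2.0)
    troponin.append(round(true_value, 2))
    risk = sigmoid(0.5 + 0.4 * math.log(true_value))
    died.append(1.0 if random.random() < risk else 0.0)

risk_a = predicted_risk(troponin, died, 0.0)              # "log(0)" set to 0
risk_b = predicted_risk(troponin, died, math.log(0.005))  # "log(0)" set to log(.005)
print(max(abs(a - b) for a, b in zip(risk_a, risk_b)))
```

The two fits differ only in how the coefficient on the dummy absorbs the substituted value, so the predicted risks should agree up to rounding error. The coefficient on log(troponin) is estimated from the detectable values only in both fits.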

Arnold

> 
> Daniel
> 
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of maartenbuis
> Sent: Sunday, June 05, 2005 7:38 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: how to deal with censoring at zero (a lot of zeroes) for a
> laboratory re
> 
> I am tired: The critical assumption behind Multiple Imputation is that
> the probability of missingness does not depend on the value of the
> missing variable itself (Missing At Random, or MAR). This is obviously
> not the case with censoring. My objection against (conditional) mean
> imputation, and my remark about selecting on the independent variables
> still hold. So, given that you have a large number of observations, I
> would just ignore the zero observations.
> 
> Maarten 
> 
> --- In statalist@yahoogroups.com, "maartenbuis" <maartenbuis@y...> wrote:
> 
>>Hi Daniel,
>>
>>It looks to me like you could use -tobit- for log(troponin) and just a
>>constant. The predicted values should give you the extrapolations you
>>want. (This will be the same value for all missing observations: the
>>mean of the log-normal distribution conditional on being less than the
>>censoring value)
>>
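The conditional mean described above has a closed form on the log scale. The following is a minimal sketch in Python (not Stata), assuming log(troponin) is normal with mean mu and standard deviation sigma. The function name is hypothetical, and the values of mu and sigma in the usage line are made up for illustration; in practice they would come from the -tobit- fit. The limit of .01 is the minimal detectable level quoted in the original question.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_log_below_limit(mu, sigma, limit):
    """E[log X | X < limit] when log X ~ Normal(mu, sigma^2).

    Uses the truncated-normal result
    E[Z | Z < c] = mu - sigma * pdf(a) / cdf(a), with a = (c - mu) / sigma.
    """
    a = (math.log(limit) - mu) / sigma
    return mu - sigma * normal_pdf(a) / normal_cdf(a)

# Hypothetical mu and sigma; every censored observation gets this same value.
imputed_log = mean_log_below_limit(mu=-5.0, sigma=2.0, limit=0.01)
print(imputed_log, math.exp(imputed_log))
```

Note that exp() of this value is not the conditional mean of troponin itself on the original scale; it is only the back-transformed mean of the logs.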
>>However, these are actually missing values, and apparently you want to
>>create imputations for them. If you just use the values you obtained
>>from -predict- you will be assuming that you are as sure about these
>>values as you are about the values you actually observed, and thus get
>>standard errors that are too small. If you really want to impute, then
>>you could have a look at -mice- (findit mice). Alternatively, you
>>could use the results from -tobit- to generate multiple imputations.
>>Mail me if you want to do that, and I can write, tonight or tomorrow,
>>an example for the infamous auto dataset. However, censoring on the
>>independent variable is generally much less a problem than censoring
>>on the dependent variable, so ignoring (throwing away) the censored
>>observations should not lead to very different estimates.
>>
>>HTH,
>>Maarten
>>
>>--- "Daniel Waxman" <dan@a...> wrote:
>>
>>>I am modeling a laboratory test (Troponin I) as an independent 
>>>(continuous) predictor of in-hospital mortality in a sample of 
>>>10,000 subjects.  <snip> The problem is the zero values, what they 
>>>represent, and what to do with them.   The distribution of results 
>>>ranges from the minimal detectable level of .01 mcg/L to 94 mcg/L, 
>>>with results markedly right-skewed (nearly half the results 
>>>are zero; 90% are < .20; results are given in increments
>>>of .01).  Of course, zero is a censored value which represents a
>>>distribution of results between zero and somewhere below .01.
>>
>><snip>
>>
>>>I found a method attributed to A.C. Cohen of doing essentially this 
>>>which uses a lookup table to calculate the mean and standard 
>>>deviation of an assumed log-normal distribution based upon the 
>>>non-censored data and the proportion of data points that are 
>>>censored, but there must be a better way to do this in Stata.
>>>
>>>Any thoughts on (1) whether it is reasonable to assume the 
>>>log-normal distribution (I've played with qlognorm and plognorm, but
>>>it's hard to know what is good enough), and if so (2) how to do it?  
>>
>>
>>
>>
>>*
>>*   For searches and help try:
>>*   http://www.stata.com/support/faqs/res/findit.html
>>*   http://www.stata.com/support/statalist/faq
>>*   http://www.ats.ucla.edu/stat/stata/

-- 
Kind regards,
Arnold Kester

© Copyright 1996–2014 StataCorp LP