The Stata listserver

RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory result which I would like to log transform


From: "Daniel Waxman" <dan@amplecat.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: how to deal with censoring at zero (a lot of zeroes) for a laboratory result which I would like to log transform
Date: Sun, 5 Jun 2005 10:10:57 -0400

Svend,

Thanks.  I did indeed look extensively at the predictor as a categorical
variable and as a log-transformed predictor with .005 substituted for zero.
My dataset is large enough, and events common enough, that the confidence
intervals are quite narrow at the .01 level.  There is a threshold, but it
is below .01.  In other words, there is no measurable change in outcome
between .01 and .02, but there is one between 'undetectable' and .01.

Zero could be .005, but it could equally be .0005 or .00005 (biologically
speaking as well).  I suppose this becomes irrelevant very soon, though, if
it can't be measured.  However, the logistic equation suggests (given the
measured number of deaths at the zero value) that the zero should be
approximately .001.
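
One way such an implied value could be back-calculated is sketched below.
This is an illustration only, not necessarily how the figure above was
obtained: it fits the model to the detectable values alone, then solves the
fitted equation for the concentration whose predicted risk equals the
observed death rate among the zeros.  The variable names marker (the lab
result, 0 = undetectable) and died (0/1 outcome) are hypothetical.

```stata
* Hypothetical variable names: marker (lab result), died (0/1 outcome)
* Log-transform detectable values only; zeros stay missing and drop out of the fit
generate double lnm_pos = ln(marker) if marker > 0 & !missing(marker)
logit died lnm_pos
scalar b0 = _b[_cons]
scalar b1 = _b[lnm_pos]

* Observed proportion of deaths among the undetectable results
quietly summarize died if marker == 0
scalar p0 = r(mean)

* Solve logit(p0) = b0 + b1*ln(x0) for x0
scalar x0 = exp((ln(p0/(1 - p0)) - b0)/b1)
display "implied concentration for the zeros: " x0
```

The result extrapolates the fitted line below the range of the data, so it
is only as good as the assumption that the log-linear relation continues
below the detection limit.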

It seems that this is a common issue in the environmental literature, where
people care a lot about very small concentrations of things (lead, arsenic,
etc.).  I have found various sources that suggest that the method of Cohen
(mentioned below), which estimates the entire distribution curve from the
available points and the known or assumed shape, can be preferable to
arbitrarily picking half of the lower limit.
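
A rough Stata analogue of that idea is sketched below.  It is a
maximum-likelihood fit of a left-censored normal distribution to the logged
values using -intreg- with no covariates, not Cohen's tabulated procedure
itself.  It assumes that the detectable values are roughly lognormal, that
the detection limit is 0.01, and that the lab result is held in a
hypothetical variable called marker.

```stata
* Hypothetical variable name: marker (0 = below the assumed 0.01 detection limit)
* Lower bound: observed log value; left missing for left-censored results
generate double lnlo = ln(marker) if marker > 0 & !missing(marker)
* Upper bound: observed log value, or the log detection limit when censored
generate double lnhi = ln(marker) if marker > 0 & !missing(marker)
replace lnhi = ln(0.01) if marker == 0

* With no covariates, the constant estimates the mean of ln(marker) and
* sigma its standard deviation, using the censored results as well
intreg lnlo lnhi
```

Observations with marker missing have both bounds missing and are dropped
by -intreg-.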

Daniel

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Svend Juul
Sent: Sunday, June 05, 2005 9:47 AM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: how to deal with censoring at zero (a lot of zeroes) for a
laboratory result which I would like to log transform

Daniel,
 
You wonder how to handle zero values in a predictor you have 
good reasons to log-transform.
 
For a first look I would make a reasonable categorization of the 
predictor, e.g. five categories (0, 0.01-0.09, 0.10-0.99, 1-9.99, 10+) 
and use -xi: logistic- to see the pattern. This analysis might also 
give an idea of whether there is some threshold. 
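
A minimal sketch of this first look, assuming a hypothetical lab result
variable marker (0 = undetectable) and a 0/1 outcome died.  The cut points
follow the five categories suggested above.

```stata
* Hypothetical variable names: marker (lab result), died (0/1 outcome)
generate byte markercat = .
replace markercat = 0 if marker == 0
replace markercat = 1 if marker > 0     & marker < 0.10
replace markercat = 2 if marker >= 0.10 & marker < 1
replace markercat = 3 if marker >= 1    & marker < 10
replace markercat = 4 if marker >= 10   & !missing(marker)

* Crude death rates by category, then odds ratios relative to the zero group
tabulate markercat died, row
xi: logistic died i.markercat
```

Roughly equal steps in the odds ratios across the non-zero categories would
support a log-linear relation; a jump between the zero group and the first
category would point to a threshold below the detection limit.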
 
If this justifies using a log-transform, I think you almost give
the answer yourself: zero means a result somewhere between 0 and
0.01. So why not select 0.005, log-transform, and run -logistic-
using the log-transformed predictor?
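
In Stata this could look like the sketch below, again with the hypothetical
variable names marker and died.  The loop at the end is an added
sensitivity check, not part of the suggestion itself: it refits the model
with a few other substitutes for zero to show how much the odds ratio
depends on that choice.

```stata
* Hypothetical variable names: marker (lab result), died (0/1 outcome)
* Substitute 0.005 (half the 0.01 detection limit) for zero, then log-transform
generate double lnmarker = ln(marker) if marker > 0 & !missing(marker)
replace lnmarker = ln(0.005) if marker == 0
logistic died lnmarker

* Sensitivity check: refit with other plausible values standing in for zero
foreach v in 0.001 0.0025 0.005 0.01 {
    quietly generate double lnm = ln(cond(marker == 0, `v', marker))
    display "zero replaced by `v'"
    logistic died lnm
    drop lnm
}
```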
 
The idea of letting the data determine the "best" value that the zeros
represent has its problems: the confidence interval for the odds
ratio estimate becomes too narrow.
 
Hope this helps
 
Svend


