Statalist



st: RE: how to deal with a censored and skewed regressor?


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   st: RE: how to deal with a censored and skewed regressor?
Date   Thu, 8 Jan 2009 18:48:39 -0500

There is a literature on censored regressors. A quick Google search on "censored regressors" turned up, for example:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1071239
http://web.mit.edu/tstoker/www/research.htm

The original poster has not yet responded to Al Feiveson, so we do not know whether the regressor "x", say, is "censored" in the technical sense (has unobserved values <0 that are coded as zero). If there are many zeros, perhaps x was generated by one of two processes: the first for whether x would be non-zero, the second for the value of x if it were non-zero.

One way to handle a mixture would be to generate two variables, an indicator that x is zero and, for non-zero x, the actual value.

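gen byte x_zero = (x == 0) if x < .
gen x_pos = cond(x > 0, x, 0) if x < .
* or, instead of x_pos (log(x)*(x > 0) would be missing at x == 0, since log(0) is missing):
gen xlog_pos = cond(x > 0, log(x), 0) if x < .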

Insert x_zero and either x_pos or xlog_pos into the predictor list. In fact, it is not necessary to choose between logged and unlogged versions; -fracpoly- could model the best transformation of x_pos.
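
A minimal sketch of the indicator approach, run on simulated data. The data-generating process, the seed, and the coefficient values are hypothetical and chosen only for illustration; only the names x, x_zero, and xlog_pos come from the message above.

```stata
clear
set seed 2009
set obs 1000

* Hypothetical regressor: about 40% exact zeros, the rest right-skewed
generate double x = cond(runiform() < 0.4, 0, exp(rnormal()))

* Hypothetical outcome: a shift of 0.5 at x == 0, slope 0.8 on log(x) otherwise
generate double y = 1 + 0.5*(x == 0) + 0.8*cond(x > 0, ln(x), 0) + rnormal()

* Indicator for zero, and log(x) for positive x (set to 0 where x == 0)
generate byte x_zero = (x == 0) if !missing(x)
generate double xlog_pos = cond(x > 0, ln(x), 0) if !missing(x)

regress y x_zero xlog_pos
```

The coefficient on x_zero estimates the difference between the mean of y at x == 0 and the fitted value at x == 1 (where log(x) is 0), so its interpretation depends on the units of x.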

The references above suggest that the indicator approach is biased if x is truly censored.

-Steve

On Jan 8, 2009, at 11:29 AM, Lachenbruch, Peter wrote:

Since the goal is to look at a logarithmic relationship, I'm wondering
if using glm with a log-link for a normal family wouldn't be helpful.
That way you don't need to worry about 0 values.
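
A sketch of what this suggestion might look like in Stata, again on simulated data. The outcome model, the seed, and the coefficients are assumptions made for illustration.

```stata
clear
set seed 2009
set obs 1000

* Hypothetical regressor with many exact zeros
generate double x = cond(runiform() < 0.4, 0, exp(rnormal()))

* Hypothetical outcome whose mean is exp(2 + 0.1*x)
generate double y = exp(2 + 0.1*x) + rnormal()

* Gaussian family with log link: models log(E[y]) as linear in x
glm y x, family(gaussian) link(log)
```

The log link is applied to the mean of y, so no logarithm of any individual value is taken, and x enters on its original scale, zeros included.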

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Feiveson,
Alan H. (JSC-SK311)
Sent: Wednesday, January 07, 2009 11:54 AM
To: [email protected]
Subject: st: RE: RE: RE: RE: how to deal with a censored and skewed
regressor?

If there were other X-variables, one way (probably not the best) would
be to use multiple imputation. More generally, some sort of structural
model that relates Y to true X and includes the censoring mechanism
could be estimated (ha!). I suspect there are econometric models out
there that do this sort of thing - possibly even already programmed in
Stata.

AL F.



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, January 07, 2009 1:06 PM
To: [email protected]
Subject: st: RE: RE: RE: how to deal with a censored and skewed
regressor?

And how would you do that? Other than knowing that c.i.s and P-values
are not as good as they seem, what difference does this knowledge make
to what you do?

Nick
[email protected]

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Feiveson,
Alan H. (JSC-SK311)
Sent: 07 January 2009 18:27
To: [email protected]
Subject: st: RE: RE: how to deal with a censored and skewed regressor?

Nick wrote: "If you regard such a regressor as error-free, as one
usually does, then I am not clear that procedure need otherwise be
affected."

But if the variable (say X) is censored, then its real value is unknown except for an upper or lower bound, and there is error, hence bias, in the regression parameter estimates if X is used as is. So in mbaier's case,
if X is really censored at zero, that means its true value is some
negative number. This needs to be taken into account in the estimation.
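
The bias can be illustrated with a small simulation. Everything here is hypothetical (a standard normal true regressor, a true slope of 2, censoring from below at zero); it is a sketch of the phenomenon, not a description of mbaier's data.

```stata
clear
set seed 2009
set obs 10000

* Hypothetical true regressor and outcome; the true slope is 2
generate double x_true = rnormal()
generate double y = 1 + 2*x_true + rnormal()

* Observed regressor: negative values are recorded as zero
generate double x_obs = max(x_true, 0)

* Benchmark fit on the uncensored regressor
regress y x_true

* Fit on the censored regressor used as is
regress y x_obs
```

Under these assumptions the slope on x_obs should settle near 2.9 in large samples rather than 2, because censoring shrinks the variance of the regressor by more than it shrinks its covariance with y.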

Al Feiveson



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, January 07, 2009 12:09 PM
To: [email protected]
Subject: st: RE: how to deal with a censored and skewed regressor?

(x - r(min)) / (r(max) - r(min))

does not yield missing when r(min) is 0 unless x is missing or r(max) is also zero. But that's neither here nor there. The above is just a linear
rescaling of a variable and will thus leave skewness unchanged.
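
Both points can be checked directly. The sketch below uses a simulated x with many zeros (the variable and the seed are hypothetical) and builds the index from the r(min) and r(max) that -summarize- leaves behind.

```stata
clear
set seed 2009
set obs 1000

* Hypothetical skewed regressor with many exact zeros
generate double x = cond(runiform() < 0.4, 0, exp(rnormal()))

summarize x, detail
scalar skew_x = r(skewness)

* The index from the original question, using results saved by -summarize-
generate double index = 100*(x - r(min))/(r(max) - r(min))

* Zeros in x become zeros in index, not missing values
count if missing(index)

* Skewness is the same before and after the linear rescaling
summarize index, detail
display "skewness of x = " skew_x "   skewness of index = " r(skewness)
```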

Skewness of a regressor is not itself fatal to anything.

Censoring of a regressor is something to take account of in
interpretation. If you regard such a regressor as error-free, as one
usually does, then I am not clear that procedure need otherwise be
affected.

Nick
[email protected]

mbaier

I tried to transform it according to ln(skewed variable), but my
regressor has a lot of values at zero, for which ln is not defined. I
also tried to create an index like I=100*(x-r(min))/(r(max)-r(min)),
which again leads to many missings (due to many x's being zero).
What can I do?
Besides, do I have to account for the censoring of my regressor? If so,
what can I do?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


