Statalist



Re: st: RE: how to deal with a censored and skewed regressor?


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: RE: how to deal with a censored and skewed regressor?
Date   Fri, 9 Jan 2009 12:04:40 -0500

Thinking about the problem some more: The two-variable indicator approach I first outlined *may* provide extra information. Now x_zero is an indicator for x <= 0. Say the model for x positive shows a positive trend with the mean response. Extrapolate that model to zero and compare with the prediction for the group with x<=0.
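A minimal sketch of that comparison on simulated data may help. Everything here is an assumption made for illustration: the variable names, the normal distribution of the latent x, and the linear model for y. With x_pos set to 0 for the censored group, the constant is the positive-x line extrapolated to zero, so the coefficient on x_zero is the gap between the censored group's mean and that extrapolated value.

```stata
* Hypothetical simulated data: x is recorded as 0 when its true value is below 0
clear
set seed 2009
set obs 4000
generate double x_true = rnormal(0.5, 1)            // latent regressor (assumed)
generate double y      = 1 + 2*x_true + rnormal()   // assumed linear model
generate double x      = max(x_true, 0)             // what is actually observed

generate byte   x_zero = (x <= 0)                   // indicator for the censored group
generate double x_pos  = cond(x > 0, x, 0)          // value of x when positive, else 0

regress y x_zero x_pos
* _cons            : fitted line for positive x, extrapolated to x = 0
* _cons + x_zero   : fitted mean response for the x <= 0 group
lincom _cons + x_zero
scalar gap = _b[x_zero]                             // difference between the two
display "gap = " gap
```

In this simulated setting the gap should come out negative, because the censored cases truly lie below zero. How far below is exactly what the gap cannot tell you without further assumptions.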

Obviously any educated guesses you can make about the range of negative values will be helpful. The Rigobon-Stoker reference carries this approach further by trying to predict the values of the censored observations from other predictors.

-Steve

On Jan 9, 2009, at 9:59 AM, Steven Samuels wrote:

Melanie, with your "bound censoring", you are best off using only uncensored cases. See the reference below. With the assumption of exogenous censoring (E(error) = 0, conditional on censoring indicator, x, and other predictors), a complete case analysis will produce consistent estimates, but will lose efficiency because the sample size is lower. The reference illustrates a way of recovering information from the censored cases, but it has some strong assumptions. Entering the censored observations at their recorded value is not recommended; it will lead to "expansion bias", meaning that the estimated coefficient for x will be too big in absolute value.
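To see the two claims side by side, here is a small simulation sketch. It is not taken from the reference: the names, sample size, and parameter values are hypothetical, and censoring depends only on x, so the exogenous-censoring assumption holds by construction.

```stata
* Hypothetical simulation; names and parameter values are illustrative only
clear
set seed 12345
set obs 5000
generate double x_true = rnormal(0.5, 1)            // latent regressor
generate double y      = 1 + 2*x_true + rnormal()   // true slope is 2
generate double x_obs  = max(x_true, 0)             // negative values recorded as 0
generate byte censored = (x_true < 0)

* Censored cases entered at their recorded value of 0
regress y x_obs
scalar b_all = _b[x_obs]

* Complete-case analysis: uncensored observations only
regress y x_obs if !censored
scalar b_cc = _b[x_obs]

display "all cases: " b_all "   complete cases: " b_cc
```

Under these assumptions the first slope should overshoot 2 (the expansion bias), while the complete-case slope should land near 2 with a somewhat larger standard error.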

Could you tell us something about the data which gave rise to so much censoring?

Good luck!

-Steve

http://web.mit.edu/tstoker/www/Rigobon_Stoker_IER_June_07.pdf

On Jan 9, 2009, at 4:55 AM, mbaier wrote:

Dear listers,
Thank you for your comments and help. Yes indeed, my regressor is
"censored" in a way that it has unobserved values <0 coded as zero.
Sorry if I didn't make it clear.

melanie b.

Steven Samuels schrieb:
There is a literature on censored regressors. A quick Google search on
"censored regressors" turned up, for example:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1071239
http://web.mit.edu/tstoker/www/research.htm

The original poster has not yet responded to Al Feiveson, so we do not
know whether the regressor "x", say, is "censored" in the technical
sense (has unobserved values <0 that are coded as zero). If there are many zeros, perhaps x was generated by one of two processes: the first
for whether x would be non-zero, the second for the value of x if it
were non-zero.

One way to handle a mixture would be to generate two variables, an
indicator that x is zero and, for non-zero x, the actual value.

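generate byte x_zero = (x == 0) if x < .
generate x_pos = cond(x > 0, x, 0) if x < .
* or, instead of x_pos (cond() is needed because log(0)*0 is missing, not 0):
generate xlog_pos = cond(x > 0, log(x), 0) if x < .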

Insert x_zero and either x_pos or xlog_pos into the predictor list. In
fact, it is not necessary to choose between logged and unlogged
versions; -fracpoly-  could model the best transformation of x_pos.

The references above suggest that the indicator approach is biased if x
is truly censored.

-Steve

On Jan 8, 2009, at 11:29 AM, Lachenbruch, Peter wrote:

Since the goal is to look at a logarithmic relationship, I'm wondering if using glm with a log-link for a normal family wouldn't be helpful.
That way you don't need to worry about 0 values.
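As a rough sketch of this suggestion, on simulated data with hypothetical names and values: the log link is applied to the mean of the response, so x enters untransformed and its zeros cause no trouble. Note that this only sidesteps taking logs; it does nothing about the censoring itself.

```stata
* Hypothetical simulated data; variable names are illustrative only
clear
set seed 2009
set obs 2000
generate double x = max(rnormal(0.5, 1), 0)              // regressor with many exact zeros
generate double y = exp(0.5 + 0.3*x) + rnormal(0, 0.1)   // mean of y is log-linear in x

* log(E[y]) is modelled as linear in x; x itself is never logged
glm y x, family(gaussian) link(log)
scalar b_x = _b[x]
display "estimated coefficient on x: " b_x
```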

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Feiveson,
Alan H. (JSC-SK311)
Sent: Wednesday, January 07, 2009 11:54 AM
To: [email protected]
Subject: st: RE: RE: RE: RE: how to deal with a censored and skewed
regressor?

If there were other X-variables, one way (probably not the best) would be to use multiple imputation. More generally, some sort of structural
model that relates Y to true X and includes the censoring mechanism
could be estimated (ha!). I suspect there are econometric models out there that do this sort of thing - possibly even already programmed in
Stata.

AL F.



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, January 07, 2009 1:06 PM
To: [email protected]
Subject: st: RE: RE: RE: how to deal with a censored and skewed
regressor?

And how would you do that? Other than knowing that c.i.s and P-values are not as good as they seem, what difference does this knowledge make
to what you do?

Nick
[email protected]

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Feiveson,
Alan H. (JSC-SK311)
Sent: 07 January 2009 18:27
To: [email protected]
Subject: st: RE: RE: how to deal with a censored and skewed regressor?

Nick wrote: "If you regard such a regressor as error-free, as one
usually does, then I am not clear that procedure need otherwise be
affected."

But if the variable (say X) is censored, then its real value is unknown except for an upper or lower bound, and there is error, hence bias, in the regression parameter estimates if X is used as is. So in mbaier's case,
if X is really censored at zero, that means its true value is some
negative number. This needs to be taken into account in the estimation.

Al Feiveson



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, January 07, 2009 12:09 PM
To: [email protected]
Subject: st: RE: how to deal with a censored and skewed regressor?

(x - r(min)) / (r(max) - r(min))

does not yield missing when r(min) is 0 unless x is missing or r(max) is also zero. But that's neither here nor there. The above is just a linear
rescaling of a variable and will thus leave skewness unchanged.
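Both points can be checked quickly. The following is only a sketch on a made-up variable with many zeros (the distribution and the name index are assumptions):

```stata
* Hypothetical skewed variable with many exact zeros
clear
set seed 42
set obs 1000
generate double x = max(rnormal(0.5, 1), 0)

summarize x, detail                      // leaves r(min), r(max), r(skewness)
scalar skew_raw = r(skewness)
generate double index = 100*(x - r(min))/(r(max) - r(min))

count if missing(index)                  // zeros in x do not produce missings
scalar n_missing = r(N)

summarize index, detail
scalar skew_idx = r(skewness)
display "skewness of x: " skew_raw "   skewness of index: " skew_idx
```

The index runs from 0 to 100, has no missing values, and has the same skewness as x up to rounding.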

Skewness of a regressor is not itself fatal to anything.

Censoring of a regressor is something to take account of in
interpretation. If you regard such a regressor as error-free, as one
usually does, then I am not clear that procedure need otherwise be
affected.

Nick
[email protected]

mbaier

I tried to transform it according to ln(skewed variable), but my
regressor has lots of values at zero, for which ln is not defined. I also tried to create an index like I=100*(x-r(min))/(r(max)-r(min)),
which again leads to many missings (due to many x's being zero).
What can I do?
Besides, do I have to account for the censoring of my regressor? If so,
what can I do?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


--
Dipl.-Volkswirtin
Melanie Baier

C-LAB
Business Development
Fuerstenallee 11
33102 Paderborn, Germany

Phone: +49 5251 60 61 35
Fax: +49 5251 60 60 66
URL: www.c-lab.de




