
From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: transformations for highly skewed dependent variable
Date: Thu, 10 Sep 2009 20:00:51 -0400

Michael Crain <michaelcrain@hotmail.com>:
First, the original poster asked about transforming y to change its distribution, not the distribution of errors or residuals. If X is a set of highly skewed lognormal variables, and y=Xb+e, with some elements of b negative and some positive, it may well be the case that y has a high peak near zero and long tails, even with e distributed standard normal, e.g.

 clear
 drawnorm x1 x2 x3 x4 e, clear seed(1) n(1000)
 foreach v of var x* {
  replace `v'=exp(3*`v')
 }
 g y=x1+x2-x3-x4+e
 tw kdensity y, name(y)
 qui reg y x*
 predict r, res
 tw kdensity r, name(res)

But you are asking about transforming y after a regression, where you have looked at the residuals, I think. This still presents problems in the general case. Note the transformation is done not to give normal errors, but more normal residuals. So y is transformed after fitting a model, and that model is not driven by strong theory (if it were, no transformation would be considered). This form of specification search can introduce bias, e.g. if the target is estimating the mean marginal effect of some X on y.

There are many possible violations of normality for errors, and only a fraction call for transforming y. If the model is misspecified, or a regressor is omitted, it's natural to think residuals need not be normal even if the errors in the true model are. If errors are non-normal because they are a mixture of normals, then perhaps heteroskedasticity is the issue, and transforming y may not produce the desired result at all. Implicit in my code snippet in the post you quoted was some kind of categorical heteroskedasticity (thinking of firms of different sizes earning average returns with very different distributions from the same family).

The question "Does the economics field look past some of the GLM assumptions?" seems to imply some slight on the economics field, as if econometricians ignore assumptions, when in fact they are if anything too focused on them.
Or perhaps you had something else in mind altogether? And what is off topic?

On Thu, Sep 10, 2009 at 7:17 PM, Michael Crain <michaelcrain@hotmail.com> wrote:
>> Well, sure, there are a lot of possible transformations e.g.
>> arctangent or cube root, but what is the purpose of the
>> transformation? Are you regressing y on X and thinking the errors
>> won't be normal? In that case, you may not want to transform y.
>> Also, have you considered that the y~=0 obs might be somehow
>> qualitatively different? Note that the sd of return should be
>> conditioned on size of investment, at least...
>
> This is a bit off topic. I believe you are suggesting that
> transforming variables to address non-normal errors is not so
> important in this case of an economic data set. Can you explain why?
> Does the economics field look past some of the GLM assumptions?
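
To make the mixture-of-normals point in the reply concrete, here is a minimal sketch. It is not code from the thread: the two-group split (`big`), the group standard deviations (5 and 0.5), and the coefficients are all assumptions chosen for illustration. Errors are normal within each group, yet the pooled residuals should show kurtosis well above the normal value of 3, and the remedy sketched here is robust standard errors rather than a transformation of y.

```stata
* Hypothetical illustration: normal errors with group-specific variance.
* The variable names, group sizes, and sd values are assumptions.
clear
set seed 1
set obs 1000

* Assumed indicator for two "firm size" groups of equal size
generate byte big = mod(_n, 2)
generate double x = rnormal()

* Errors are normal within each group, but with very different sd
generate double e = rnormal(0, cond(big, 5, 0.5))
generate double y = 1 + 2*x + e

quietly regress y x
predict double r, residuals

* Pooled residuals: a scale mixture of normals, so heavy tails
quietly summarize r, detail
display "kurtosis, pooled residuals: " r(kurtosis)

* Within each group the residuals should look much closer to normal
quietly summarize r if big, detail
display "kurtosis, big group:        " r(kurtosis)
quietly summarize r if !big, detail
display "kurtosis, small group:      " r(kurtosis)

* Heteroskedasticity-robust standard errors leave y untransformed
regress y x, vce(robust)
```

In this setup no monotone transformation of y is needed to recover the slope on x; the non-normality comes entirely from pooling groups with different error variances.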
