Statalist



RE: st: Log Normality of Dependentvar


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Log Normality of Dependentvar
Date   Mon, 8 Jun 2009 18:22:02 +0100

In addition to Steve Samuels' various comments, I add a prejudice here against ln(X - k) as a transform unless k is specifiable in advance and on independent scientific or practical grounds. 

This prejudice has several bases, varying from solid to ectoplasmic: 

1. k is a lower limit to X, and limits are always difficult to estimate from data. This is especially well documented for maximum likelihood, which is one method of choice.

2. In effect, once you entertain k != 0, you are saying: the two-parameter lognormal is not good enough; let's consider a three-parameter lognormal. I'd rather try other two-parameter distributions first or, equivalently, other transformations. For example, fitting a two-parameter gamma is roughly equivalent in some senses to working on a cube root scale.

3. If you play with this approach, you get all sorts of ad hoc constants floating round in your analysis. It then gets rather difficult to discuss, to compare with other studies, etc. 
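
A note on point 2: the gamma and cube-root remark is presumably related to the Wilson-Hilferty approximation, under which the cube root of a gamma variable is close to normal. That link is an assumption here; the post does not name it. A minimal Python sketch, with an arbitrary shape, scale, seed, and sample size chosen only for illustration, compares the skewness of simulated gamma draws before and after a cube-root transform:

```python
import random


def skewness(values):
    """Moment-based sample skewness (third central moment / variance^1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative settings only; none of these values come from the thread.
random.seed(2009)
shape, scale = 2.0, 3.0
sample = [random.gammavariate(shape, scale) for _ in range(20000)]

raw_skew = skewness(sample)
cube_root_skew = skewness([v ** (1.0 / 3.0) for v in sample])

print(f"skewness on the raw scale:       {raw_skew:.3f}")
print(f"skewness on the cube-root scale: {cube_root_skew:.3f}")
```

With these settings the raw draws should show skewness near the theoretical 2/sqrt(shape), about 1.41, while the cube-root scale should be close to symmetric.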

Nick 
n.j.cox@durham.ac.uk 

Christian Weiss wrote:

Thank you very much for your elaboration on this topic! Although it was
very interesting for me, my actual question has still not been answered.

So let me rephrase: If a variable is lognormally distributed
(according to swilk, lnnormal), why is it not "normally" distributed
after transforming it via ln / lnskew0 / bcskew0 (according to swilk)?
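
For readers unfamiliar with the shifted-log transform ln(X - k) discussed in this thread: as the Stata documentation describes lnskew0, k is chosen so that the skewness of ln(X - k) is zero. The following Python sketch illustrates that idea only. The bisection search, the function names, and the simulated data (true k = 50) are assumptions made for the example, not Stata's algorithm.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness (third central moment / variance^1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def zero_skew_shift(x, iterations=60):
    """Hypothetical helper: bisect for k < min(x) with skewness(ln(x - k)) = 0.

    Assumes right-skewed data, so the skewness is positive for k far below
    the data and negative for k just below the minimum.
    """
    spread = max(x) - min(x)
    lo = min(x) - 1000.0 * spread  # far below the data
    hi = min(x) - 1e-9 * spread    # just below the smallest value
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if skewness([math.log(v - mid) for v in x]) > 0:
            lo = mid  # still right-skewed: move k up toward the data
        else:
            hi = mid  # left-skewed: k is too close to the minimum
    return (lo + hi) / 2.0


# Simulated three-parameter lognormal data; all values are illustrative.
random.seed(42)
x = [50.0 + math.exp(random.gauss(0.0, 0.5)) for _ in range(2000)]

k_hat = zero_skew_shift(x)
print(f"skewness of ln(x):         {skewness([math.log(v) for v in x]):.3f}")
print(f"estimated k:               {k_hat:.3f}")
print(f"skewness of ln(x - k_hat): {skewness([math.log(v - k_hat) for v in x]):.3f}")
```

In this setup ln(x) stays clearly right-skewed, whereas ln(x - k_hat) is symmetric by construction, so a normality test applied to the two transformed variables need not agree.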


On Mon, Jun 8, 2009 at 12:18 PM, Maarten Buis <maartenbuis@yahoo.co.uk> wrote:
>
> --- On Mon, 8/6/09, Christian Weiss wrote:
>> testing my dependent var via swilk or sfrancia rejects the
>> Null Hypothesis of Normality.
>
> This is problematic for a number of reasons:
>
> 1) Regression never assumes that the dependent variable is
> normally distributed, except when you have no explanatory
> variables. It only assumes that the residuals are normally
> distributed.
>
> 2) Testing for the normality of the residuals should only
> be done once you are convinced that the other assumptions
> have been met, as violations of the other assumptions are
> likely to lead to residuals that look non-normal.
>
> 3) The normality of the residuals is probably the least
> important of the regression assumptions, as regression
> is reasonably robust to violations of it.
>
> 4) Tests are probably not the best way to assess whether
> the errors are normally distributed. Graphical inspection
> is usually more informative and powerful, see:
> -help diagnostic plots- and -ssc d hangroot- for tools
> to help with that.
>
> For a more general set of tools to perform post-estimation
> checks of regression assumptions see:
> -help regress postestimation-.
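
Point 1 above can be illustrated with a small simulation. This is a sketch under made-up assumptions (a binary predictor, a slope of 10, standard normal errors, plain Python rather than Stata): the dependent variable is clearly bimodal, yet the residuals from the fitted line look normal.

```python
import random


def excess_kurtosis(values):
    """Moment-based excess kurtosis; roughly 0 for normally distributed data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


# Illustrative data-generating process; none of these values come from the thread.
random.seed(8)
n = 5000
x = [float(random.randint(0, 1)) for _ in range(n)]
y = [1.0 + 10.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares for one predictor, in closed form.
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

kurt_y = excess_kurtosis(y)
kurt_resid = excess_kurtosis(residuals)
print(f"excess kurtosis of y:         {kurt_y:.3f}")
print(f"excess kurtosis of residuals: {kurt_resid:.3f}")
```

A strongly negative excess kurtosis for y reflects its two well-separated modes, while a value near zero for the residuals is what normal errors would produce. A normality test on y alone would therefore say little about the regression assumption.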
