
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: swilk test Ho:
Date: Fri, 8 Aug 2008 15:16:25 +0100

Similar questions come up from time to time. I'll recycle some thoughts given previously.

I agree strongly with Martin's bottom line. Often it appears that normality testing is just part of some statistical ritual, and that those participating have lost sight of exactly why they are doing it.

But let's put such vague, impious thoughts aside, and look at some hard evidence. A salutary example is near to hand.

    . sysuse auto, clear
    . swilk price-foreign

                       Shapiro-Wilk W test for normal data

        Variable |    Obs       W          V         z     Prob>z
    -------------+-------------------------------------------------
           price |     74    0.76696     15.008     5.909  0.00000
             mpg |     74    0.94821      3.335     2.627  0.00430
           rep78 |     69    0.98191      1.100     0.208  0.41760
        headroom |     74    0.98104      1.221     0.436  0.33137
           trunk |     74    0.97921      1.339     0.637  0.26215
          weight |     74    0.96110      2.505     2.003  0.02258
          length |     74    0.97165      1.825     1.313  0.09461
            turn |     74    0.97113      1.859     1.353  0.08803
    displacement |     74    0.92542      4.803     3.423  0.00031
      gear_ratio |     74    0.95814      2.696     2.163  0.01525
         foreign |     74    0.96928      1.978     1.488  0.06838

Let's sort that so the structure is easier to see.

           price |     74    0.76696     15.008     5.909  0.00000
    displacement |     74    0.92542      4.803     3.423  0.00031
             mpg |     74    0.94821      3.335     2.627  0.00430
      gear_ratio |     74    0.95814      2.696     2.163  0.01525
          weight |     74    0.96110      2.505     2.003  0.02258
         foreign |     74    0.96928      1.978     1.488  0.06838
            turn |     74    0.97113      1.859     1.353  0.08803
          length |     74    0.97165      1.825     1.313  0.09461
           trunk |     74    0.97921      1.339     0.637  0.26215
        headroom |     74    0.98104      1.221     0.436  0.33137
           rep78 |     69    0.98191      1.100     0.208  0.41760

Stepping back, what is non-normality and why should we care about it? (For normal, read "Gaussian" or "central" if you prefer. The second was suggested by the physicist Edwin Jaynes.) Crudely, non-normality could include overall skewness, overall tail weight differing from normal, granularity, individual outliers, and whatever else I've forgotten. Shapiro-Wilk collapses all that onto one dimension by quantifying the straightness of a normal probability plot.
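
As a minimal sketch (the choice of three variables and the graph names are illustrative assumptions, not part of the output above), -qnorm- draws the normal probability plot whose straightness W summarises, so the shape behind any given W can be inspected directly:

```stata
* Sketch only: look at the plots that Shapiro-Wilk reduces to a single number.
sysuse auto, clear

* Lowest W in the table: expect marked curvature away from the reference line
qnorm price, name(q_price, replace)

* Middling W: a continuous variable with some waggle in the tails
qnorm gear_ratio, name(q_gear, replace)

* A binary variable: only two distinct values, so the plot is two flat steps
qnorm foreign, name(q_foreign, replace)
```

Comparing the three graphs side by side shows what kind of departure lies behind each W, which the single number cannot convey.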
But, crucially, you lose much information by any such numerical reduction.

To the key point: How far is any column here an indicator of non-normality that you might care about (or normality that you might desire)?

For example, -rep78- is at one extreme of the ranking, but -rep78- is an ordered categorical variable and in one sense is possibly not even appropriate for the test. It looks good because it happens to be unimodal, fairly symmetric and free of outliers.

Even -foreign- passes muster, if you use P < 0.05 as a cutoff, even though it's a binary variable. But why is -foreign- assessed as more nearly normal than -gear_ratio-? It's, I guess, because it waggles less in the tails than -gear_ratio-. Yet I really can't imagine -gear_ratio- causing any problems as either response or predictor, even if there were some assumption of normality anywhere. On the other hand, -foreign- really should not be analysed as if it were normal!

Naturally, some of the results here make perfect sense. On -swilk- (and for that matter on moment- and L-moment-based shape measures) -price- sticks out as distinctly skew and fat-tailed and probably best analysed on (say) a logarithmic scale.

But the total picture is this. You can boost Shapiro-Wilk as much as you like as an omnibus or portmanteau statistic, but you can't guarantee that it will match what is acceptable to you or unacceptable to you. Practically, it can send a very misleading message.

I haven't touched on various other issues. A key issue is what happens with different sample sizes. Naturally, I have no idea what sample sizes occur in Carlo's work. Perhaps even more important, tests for marginal normality are often not directly relevant for how a predictor or response behaves within some larger model.

Nick
n.j.cox@durham.ac.uk

Martin Weiss

Well, your H0 is correct. The interpretation of test results is more intricate, though.
Non-rejection of the null does not imply that the data are normally distributed; it does mean that you do not find convincing evidence against the assertion that they derive from a normal distribution. Note that the 95% confidence level that you are implying in your post means that, when the null is true, you will falsely reject it in 5% of your tests. The information that tests such as -swilk- provide is less than most users imagine...

Carlo Georges

In using the Shapiro-Wilk test for testing normality, is it correct that the H0 (null hypothesis) is "H0: data are normally distributed", so when p < 0.05 we reject H0 and data are not normally distributed? Conversely, if p > 0.05, data are normally distributed.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
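
Martin's point about falsely rejecting in 5% of tests can be checked by simulation. What follows is a rough sketch under stated assumptions, not anything from the thread: the program name -swsim-, the seed, the 1000 replications and the sample size of 74 (chosen to match the auto data) are all arbitrary, and it assumes a Stata recent enough to provide rnormal().

```stata
* Sketch only: how often does -swilk- reject at the 5% level
* when the data really are drawn from a normal distribution?
clear all
set seed 12345

capture program drop swsim
program define swsim, rclass
    * One replication: draw 74 standard normal values and test them
    drop _all
    set obs 74
    generate x = rnormal()
    swilk x
    * -swilk- leaves its P-value in r(p); pass it back to -simulate-
    return scalar p = r(p)
end

* Repeat 1000 times, collecting the P-values in a variable named p
simulate p = r(p), reps(1000) nodots: swsim

* The mean of this indicator is the proportion of false rejections
generate byte reject = p < 0.05
summarize reject
```

The proportion reported by -summarize- should be roughly 0.05. Changing the number in -set obs- is one way to explore how the test behaves at other sample sizes.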


