
From: "Martin Weiss" <martin.weiss@uni-tuebingen.de>

To: <statalist@hsphsun2.harvard.edu>

Subject: st: RE: RE: RE: RE: swilk test Ho:

Date: Sun, 10 Aug 2008 16:31:22 +0200

Quite apart from the interpretation of test results and the usefulness of tests for normality, which have been comprehensively dealt with in this thread, I think Carlo has a good case when he complains that the -swilk- test does not state its null. Knowledge of the null is the key to any interpretation of a test result, and so I would argue that Stata should always state the H0 it is evaluating. The other two normality tests (-sktest- and -sfrancia-) do not give their null, either.

It is perfectly arguable that the null of normality is obvious in these cases, but that need not be the case for all tests. I have had my fair share of problems with tests in -reg postestimation- such as -estat imtest-. For the newbie it is not always straightforward to know what is being tested here. On the other hand, -estat hettest- and -estat ovtest- make their nulls obvious. -estat szroeter- even goes to greater lengths and also states its Ha...

Best,
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Carlo Georges
Sent: Friday, August 08, 2008 5:44 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: RE: swilk test Ho:

Thank you for the prompt and detailed reply.

I agree it doesn't make much sense to run this test on a variable or a series of variables without some prior insight. I had already checked the data "visually", but what stunned me in this case was that the log-normal plot looked really good (as confirmed by the swilk test, p=0.90) BUT the histogram was not very convincing. I therefore needed a more formal assessment of normality. The swilk test is a useful tool in my case to confirm normality, but my version of Stata (9) does not print the H0 hypothesis.

Carlo Georges, DVM

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Friday, 8 August 2008 16:16
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: swilk test Ho:

Similar questions come up from time to time. I'll recycle some thoughts given previously.

I agree strongly with Martin's bottom line. Often it appears that normality testing is just part of some statistical ritual, and that those participating have lost sight of exactly why they are doing it.

But let's put such vague, impious thoughts aside, and look at some hard evidence. A salutary example is near to hand.

    . sysuse auto, clear
    . swilk price-foreign

                       Shapiro-Wilk W test for normal data

        Variable |    Obs          W          V         z   Prob>z
    -------------+------------------------------------------------
           price |     74    0.76696     15.008     5.909  0.00000
             mpg |     74    0.94821      3.335     2.627  0.00430
           rep78 |     69    0.98191      1.100     0.208  0.41760
        headroom |     74    0.98104      1.221     0.436  0.33137
           trunk |     74    0.97921      1.339     0.637  0.26215
          weight |     74    0.96110      2.505     2.003  0.02258
          length |     74    0.97165      1.825     1.313  0.09461
            turn |     74    0.97113      1.859     1.353  0.08803
    displacement |     74    0.92542      4.803     3.423  0.00031
      gear_ratio |     74    0.95814      2.696     2.163  0.01525
         foreign |     74    0.96928      1.978     1.488  0.06838

Let's sort that so the structure is easier to see.

           price |     74    0.76696     15.008     5.909  0.00000
    displacement |     74    0.92542      4.803     3.423  0.00031
             mpg |     74    0.94821      3.335     2.627  0.00430
      gear_ratio |     74    0.95814      2.696     2.163  0.01525
          weight |     74    0.96110      2.505     2.003  0.02258
         foreign |     74    0.96928      1.978     1.488  0.06838
            turn |     74    0.97113      1.859     1.353  0.08803
          length |     74    0.97165      1.825     1.313  0.09461
           trunk |     74    0.97921      1.339     0.637  0.26215
        headroom |     74    0.98104      1.221     0.436  0.33137
           rep78 |     69    0.98191      1.100     0.208  0.41760

Stepping back, what is non-normality and why should we care about it? (For normal, read "Gaussian" or "central" if you prefer. The second was suggested by the physicist Edwin Jaynes.)

Crudely, non-normality could include overall skewness, overall tail weight differing from normal, granularity, individual outliers, and whatever else I've forgotten. Shapiro-Wilk collapses all that onto one dimension by quantifying the straightness of a normal probability plot. But, crucially, you lose much information by any such numerical reduction.

To the key point: How far is any column here an indicator of non-normality that you might care about (or normality that you might desire)?

For example, -rep78- is at one extreme of the ranking, but -rep78- is an ordered categorical variable and in one sense is possibly not even appropriate for the test. It looks good because it happens to be unimodal, fairly symmetric and free of outliers.

Even -foreign- passes muster, if you use P < 0.05 as a cutoff, even though it's a binary variable.

But why is -foreign- assessed as more nearly normal than -gear_ratio-? It's, I guess, because it waggles less in the tails than -gear_ratio-. Yet I really can't imagine -gear_ratio- causing any problems as either response or predictor, even if there were some assumption of normality anywhere. On the other hand, -foreign- really should not be analysed as if it were normal!

Naturally, some of the results here make perfect sense. On -swilk- (and for that matter on moment- and L-moment-based shape measures) -price- sticks out as distinctly skew and fat-tailed and probably best analysed on (say) a logarithmic scale.

But the total picture is this. You can boost Shapiro-Wilk as much as you like as an omnibus or portmanteau statistic, but you can't guarantee that it will match what is acceptable to you or unacceptable to you. Practically, it can send a very misleading message.

I haven't touched on various other issues. A key issue is what happens with different sample sizes. Naturally, I have no idea what sample sizes occur in Carlo's work. Perhaps even more important, tests for marginal normality are often not directly relevant for how a predictor or response behaves within some larger model.

Nick
n.j.cox@durham.ac.uk

Martin Weiss

Well, your H0 is correct. The interpretation of test results is more intricate, though. Non-rejection of the null does not imply that the data are normally distributed; it does mean that you do not find convincing evidence against the assertion that they derive from a normal distribution. Note that the 95% confidence level that you are implying in your post means that you will falsely reject a true null in 5% of your tests. The information that tests such as -swilk- provide is less than most users imagine...

Carlo Georges

In using the Shapiro-Wilk test for testing normality, is it correct that the H0 (null hypothesis) is: H0: data are normally distributed, so when p < 0.05 we reject H0 and the data are not normally distributed? Conversely, if p > 0.05, the data are normally distributed?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
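For readers who want to reproduce the points made in this thread, here is a minimal sketch in Stata. It uses the auto data that ships with Stata, as in Nick's example. The variable name `lprice` is made up for illustration. The comments restate the null hypothesis that -swilk- does not print, and the log transform follows Nick's suggestion for -price-. It is not a recommended procedure for any particular dataset.

```stata
* Sketch only: lprice is a hypothetical variable name chosen for this example.
sysuse auto, clear

* H0 for -swilk-: the variable is a sample from a normal distribution.
* A small Prob>z is evidence against H0; a large one is NOT proof of normality.
swilk price

* -swilk- leaves its results in r(); show W and the p-value, then list everything.
display "W = " %7.5f r(W) "   p = " %7.5f r(p)
return list

* Nick's suggestion: consider price on a logarithmic scale.
generate lprice = ln(price)
swilk lprice

* Look at the data as well as the test: a normal quantile plot and a histogram
* with a normal density overlaid.
qnorm lprice
histogram lprice, normal
```

Reading the test output next to the plots matters because, as the thread argues, a single number like W cannot say which kind of departure from normality (skewness, tail weight, granularity, outliers) is present.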

**References**:

- **st: RE: RE: swilk test Ho:** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
- **st: RE: RE: RE: swilk test Ho:** *From:* "Carlo Georges" <georgesc@pt.lu>


© Copyright 1996–2014 StataCorp LP