
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: winsorization and normality
Date: Wed, 23 Jun 2004 00:33:13 +0100

Dear Gary:

By accident or design you reply to my reply, but you don't focus on the kind of issue it raises.

As I understand it, you can reduce your problem of non-normality by attacking the parts of the data you find least convenient and changing them! The ancient myth of the hotelier Procrustes, who chopped and stretched his unfortunate guests to fit the beds on offer, springs to mind. What's uppermost here, jumping through hoops to attain respectable P-values, or trying to promote statistical science?

Put in more conventional and less histrionic terms, what precisely is the non-normality "problem" you have?

A simple example, nothing to do with residuals or time series, but illustrative of the key difficulty, is provided by the auto data. If you go

    foreach v of var price-gear {
        swilk `v'
    }

you will see that various variables qualify as non-normal according to conventional significance levels. But this means mostly that the sample size is large enough to detect some non-normality, not that the non-normality is large enough to be problematic for any purpose of data analysis. (In other words, the results exemplify a standard limitation of significance tests.)

In fact, to pick up one example, a careful look at -gear_ratio- by e.g.

    qnorm gear_ratio

shows that despite the P-value of 0.01525 this variable has a distribution which in practice would not be problematic if it were a distribution of residuals. (The P-value I put down partly to some granularity, certainly not outliers or fat tails.) And the n = 74 of the auto data is pretty modest by most people's standards: the issue will be compounded in larger datasets. My guess is that with your kind of data you have a much larger n.

Incidentally, chopping according to a multiple of the SD is not Winsorization, as I pointed out on Sunday in reply to a previous posting of yours.
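
To make that distinction concrete, here is a minimal sketch on the auto data, offered as an illustration and not as a recommendation to do either. It assumes that Winsorizing means pulling a fixed fraction in each tail (5% here, an arbitrary choice) in to the nearest retained value, approximated below by percentiles from -_pctile-, and that the SD rule uses a multiplier of 2, which is also an assumption. The variable and scalar names are made up for the example.

```stata
sysuse auto, clear

* Winsorizing, approximated with percentiles: values below the 5th
* percentile or above the 95th are pulled in to those percentiles
_pctile price, percentiles(5 95)
scalar w_lo = r(r1)
scalar w_hi = r(r2)
generate double price_w = min(max(price, scalar(w_lo)), scalar(w_hi)) if !missing(price)

* Chopping at mean +/- 2 SD (the multiplier 2 is an assumption)
summarize price
scalar c_lo = r(mean) - 2*r(sd)
scalar c_hi = r(mean) + 2*r(sd)
generate double price_c = min(max(price, scalar(c_lo)), scalar(c_hi)) if !missing(price)

* How many observations each rule changes, and in which tail
count if price_w > price
count if price_w < price
count if price_c > price
count if price_c < price
```

The percentile rule changes a roughly fixed share of observations in each tail by construction. The SD rule changes however many observations happen to lie beyond limits that are themselves computed from a mean and SD influenced by those same observations, so with a skewed variable it may act on one tail only.
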
More importantly, replacing a distribution longer-tailed than normal with one shorter-tailed than normal may well lead to rejections of normality too, depending precisely on what test you are using...

Nick
n.j.cox@durham.ac.uk

gary tian

> Further to John's question regarding trimming, I would like to raise
> the following question to seek your help.
> I am testing cointegration and causality for daily return of share
> indices time series (first log difference) data based on VAR model.
> Whatever I put different lag of each variable, I found there is still
> non-normality exist in the time series by residual test. I applied
> sort of winsorization in which the returns are winsorized by replacing
> all returns outside the range [mean +/- standard deviations] with
> these boundary values. The problems of non-normality has been largely
> improved but still existed. The second method, I found it is more
> effective is using monthly and quarterly data, the problem is losing
> the original meaning of integration in precise number of days. Are
> these standard ways to treat the problem, or is there any other
> better way?

Nick Cox

> I guess there's a literature on this somewhere,
> but it doesn't seem that trimming of tails
> before regression ever caught on as standard practice
> (unless there's a subdiscipline that does it all the
> time, as a living refutation of this guess).
>
> The key question to me is what is your underlying
> problem? Worrying about long tails is often
> best met by quantile or robust regression or using
> transformations or non-identity link functions.
> Far simpler and better supported than tinkering
> with the tails...

Rijo John

> > I have a data set with quite a few outliers. Suppose I am trimming
> > my dependent variable 1% each from top and bottom using 1st and 99th
> > percentiles. And I have the regression estimates before and after
> > trimming.
> > Let us also suppose that some of the variables that were
> > significant before trimming turned out to be insignificant
> > after trimming and/or vice versa.
> >
> > Is there a standard way by which one can decide how much percentage
> > of data should be trimmed? Is a Chow test for the equality of
> > coefficients enough for this? I mean trim up to the point where the
> > changes in coefficients becomes insignificant? Or is there any other
> > standard way to do this?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
