Statalist



st: RE: data transformation - need help


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: data transformation - need help
Date   Fri, 15 Jan 2010 18:35:35 -0000

I see no context here. The presupposition seems to be that there is a
statistically correct answer independent of what your overarching
problem is and of what you are trying to do. I don't think there is. 

First off, I can imagine a situation in which I decide that although I
have data on lengths (say), it makes as much or more sense to work with
areas (lengths squared), and I also find that areas are more nearly
normally distributed than are lengths. In that case, the answer is to
stick with areas and work with and report differences between mean
areas. 

It doesn't sound as if that is your kind of situation.  

What is best for you depends on whether you are really interested in
means as such or are using the t-test as if it were an omnibus, factotum
or portmanteau test of whether distributions differ. It isn't really
designed for that purpose but it often is used that way. 
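As a rough sketch of that distinction, and using the auto data purely as a stand-in (mpg and foreign are illustrative choices, not your variables): ttest addresses a difference in means as such, whereas ranksum and ksmirnov are aimed at other kinds of difference between two distributions.

```stata
* illustrative only: the auto data stand in for the poster's two samples
sysuse auto, clear

* comparison of means as such
ttest mpg, by(foreign)

* Wilcoxon rank-sum (Mann-Whitney) test
ranksum mpg, by(foreign)

* two-sample Kolmogorov-Smirnov test of equality of distributions
ksmirnov mpg, by(foreign)
```

Which of these is appropriate depends on the question being asked, not on which gives the smallest p-value.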

Also, precisely why are you transforming? Is it because you think that
normality is a prerequisite for the t-test? The t-test can work quite
well regardless of marginal distributions. The example below is not quite
the same thing, as it varies the link function in a generalized linear
model rather than transforming the response, but it makes the same
general point: the indications of a test comparing means can be much the
same across distribution shapes:

. sysuse auto
(1978 Automobile Data)

. foreach power in -1 0 1 2 {
  2. qui glm mpg foreign, link(power `power')
  3. mat b = e(b)
  4. mat var = vecdiag(e(V))
  5. di "power `power' {col 10}" %6.3f b[1,1] / sqrt(var[1,1])
  6. }

power -1 -3.797
power 0   3.749
power 1   3.631
power 2   3.458

Here the change of sign when the response is viewed on reciprocal scale
is expected as that scale reverses high and low as compared with the
original data. Otherwise the z statistic summarising a comparison
between means varies only slowly with distribution shape. 
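For comparison, here is a minimal sketch of the more literal approach: transform the response and run ttest on each scale. The variable names recip, logmpg and mpgsq are made up for this illustration. Note that ttest with by() reports the first group minus the second, so the signs need not match those of the glm coefficients above.

```stata
* illustrative only: t statistics for mpg on several transformed scales
sysuse auto, clear

generate double recip  = 1/mpg
generate double logmpg = ln(mpg)
generate double mpgsq  = mpg^2

foreach v in recip logmpg mpg mpgsq {
    quietly ttest `v', by(foreign)
    * r(t) is the t statistic saved by ttest
    display "`v'" _col(12) %7.3f r(t)
}
```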

Tony Lachenbruch would probably want me to cite George Box at this
point:

G. E. P. Box. 1953. Non-normality and tests on variances. Biometrika 40:
318-335.

Another issue is that if squaring looks to be the best transformation,
you evidently have left-skewed distributions. That is clearly possible
but a little unusual. It often reflects the existence of an upper bound
to the data -- and if so squaring is not necessarily a good idea after
all. 
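A quick way to check both points, sketched here with mpg from the auto data as a placeholder for your own variable: look at the sign of the skewness, see what ladder suggests, and see whether values pile up at the maximum, which would hint at an upper bound.

```stata
* illustrative only: replace mpg with the variable of interest
sysuse auto, clear

* skewness below 0 indicates a longer left tail
quietly summarize mpg, detail
display "skewness = " %6.3f r(skewness) "   maximum = " r(max)

* keep the maximum before another command overwrites r()
local top = r(max)

* ladder-of-powers search for a normalizing transformation
ladder mpg

* how many observations sit exactly at the maximum?
count if mpg == `top'
```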

Nick 

Andreea Cristiana Didilescu DDS, PhD

Could anyone help me with some data transformations? I have two
samples (n=45) with skewed distributions. After I used the ladder
command, it turned out that a square transformation would be suitable
for both of them. May I perform a ttest after transforming the data? How
should I deal with the results? Should I go back to the original data
(e.g. mean difference)? What about the p-value?

