
From: "Nick Cox" <[email protected]>
To: <[email protected]>
Subject: st: RE: data transformation - need help
Date: Fri, 15 Jan 2010 18:35:35 -0000

I see no context here. The presupposition seems to be that there is a statistically correct answer independent of what your overarching problem is and of what you are trying to do. I don't think there is.

First off, I can imagine a situation in which I decide that although I have data on lengths (say), it makes as much or more sense to work with areas (lengths squared), and I also find that areas are more nearly normally distributed than are lengths. In that case, the answer is to stick with areas and work with and report differences between mean areas.

It doesn't sound as if that is your kind of situation. What is best for you depends on whether you are really interested in means as such or are using the t-test as if it were an omnibus, factotum or portmanteau test of whether distributions differ. It isn't really designed for that purpose but it often is used that way.

Also, precisely why are you transforming? Is it because you think that normality is a prerequisite for the t-test? The t-test can work quite well regardless of marginal distributions.

It's not quite the same thing, but it's the same general issue that the indications of a test comparing means can be much the same across distribution shapes:

    . sysuse auto
    (1978 Automobile Data)

    . foreach power in -1 0 1 2 {
          qui glm mpg foreign, link(power `power')
          mat b = e(b)
          mat var = vecdiag(e(V))
          di "power `power' {col 10}" %6.3f b[1,1] / sqrt(var[1,1])
      }
    power -1  -3.797
    power 0    3.749
    power 1    3.631
    power 2    3.458

Here the change of sign when the response is viewed on reciprocal scale is expected as that scale reverses high and low as compared with the original data. Otherwise the z statistic summarising a comparison between means varies only slowly with distribution shape.

Tony Lachenbruch would probably want me to quote George Box at this point:

G. E. P. Box. 1953. Non-normality and tests on variances. Biometrika 40: 318-335.
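The question as asked (a t-test before and after squaring) can also be tried directly. What follows is only a minimal sketch on the same auto data, not a recommendation: it runs -ttest- on the original response and on its square and shows the two t statistics side by side. The variable name mpg2 is made up for the illustration, and the sign convention of -ttest- (first group minus second) need not match that of the -glm- coefficients.

```stata
* Illustration only: compare t statistics on the raw and squared scales.
sysuse auto, clear

* t-test on the original scale
ttest mpg, by(foreign)
display "raw scale:     t = " %6.3f r(t)

* mpg2 is a hypothetical name introduced for this sketch
generate mpg2 = mpg^2
ttest mpg2, by(foreign)
display "squared scale: t = " %6.3f r(t)
```

If the two statistics tell much the same story, the choice of scale should rest on which quantity (mean of the response or mean of its square) is actually of scientific interest, as argued earlier in this message.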
Another issue is that if squaring looks to be the best transformation, you evidently have left-skewed distributions. That is clearly possible but a little unusual. It often reflects the existence of an upper bound to the data -- and if so squaring is not necessarily a good idea after all.

Nick
[email protected]

Andreea Cristiana Didilescu DDS, PhD

Could anyone help me with some data transformations? I have two samples (n=45) with skewed distribution. After using ladder command, it came out that square transformation would be suitable for both of them. May I perform a ttest after performing data transformation? How should I deal with the results? Should I go back to the original data (e.g. mean difference)? What about the p-value?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**References**: **st: data transformation - need help** *From:* Andreea Cristiana Didilescu <[email protected]>
