
From: Nick Cox <n.j.cox@stata.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: ttest and log transformation
Date: Mon, 29 Sep 2008 09:40:23 -0500

I want to back up and ask what you want to do, and what you think the t test would do for you.

It is a big jump from a very general question like

How do these distributions differ?

to a specific question like

Are the means of these distributions the same, or different?

or

How do the means of these distributions differ?

The second and third require at a minimum that means are useful for your data. t tests are not general answers to the first.

A useful direct way to compare two distributions is through -qqplot-. That can give you a direct signal on whether two distributions differ by an additive shift, which (with other stuff) lies behind the t-test, or by a multiplicative shift, which (with other stuff) lies behind a t-test on logged values, or, as I suspect, something much more complicated.
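As a rough numeric analogue of the -qqplot- idea (outside Stata), the sketch below pairs up matching quantiles of two samples and reports their differences and ratios. Roughly constant differences point to an additive shift; roughly constant ratios, which only make sense when all values are positive, point to a multiplicative shift. The function name `quantile_shifts` and the use of nine decile cut points are illustrative assumptions, not anything in Stata.

```python
import statistics

def quantile_shifts(x, y, n=10):
    """Compare matching quantiles of samples x and y (a numeric Q-Q comparison).

    Returns (differences, ratios) at the n-1 cut points. Ratios are None
    unless every quantile of x is positive, since ratios of mixed-sign
    values say nothing about a multiplicative shift.
    """
    qx = statistics.quantiles(x, n=n)
    qy = statistics.quantiles(y, n=n)
    diffs = [b - a for a, b in zip(qx, qy)]
    ratios = [b / a for a, b in zip(qx, qy)] if all(a > 0 for a in qx) else None
    return diffs, ratios

# Hypothetical toy data, not the NAN variable
base = [float(v) for v in range(1, 101)]
shifted = [v + 5 for v in base]   # additive shift
scaled = [v * 2 for v in base]    # multiplicative shift

d_add, r_add = quantile_shifts(base, shifted)
d_mul, r_mul = quantile_shifts(base, scaled)
print([round(d, 6) for d in d_add])  # nine values, all 5.0
print([round(r, 6) for r in r_mul])  # nine values, all 2.0
```

With data like NAN, which take negative values, only the difference view applies directly, and the pattern of differences may fit neither simple shift.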

What best to do with data that can be positive or negative, are long-tailed and are skewed in one direction is not, it seems, often discussed, although it is exactly what I would expect with, say, company profit and loss data, which are hardly exotic. The help file -transint- on SSC has some discussion of a neglog transformation.
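For reference, neglog is usually defined as sign(x) ln(1 + |x|). It is monotone, defined for zero and negative values, close to x near zero and close to ±ln|x| in the tails, so it pulls in long tails on both sides without needing an arbitrary added constant. A minimal Python sketch with its inverse follows; the function names are illustrative and this is not the code behind -transint-.

```python
import math

def neglog(x):
    """neglog(x) = sign(x) * ln(1 + |x|), defined for every real x."""
    return math.copysign(math.log1p(abs(x)), x)

def neglog_inverse(y):
    """Inverse of neglog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)

# Minimum, medians and maximum of NAN as quoted in the question, plus two illustrative values
values = [-46844688.0, -1000.0, 0.0, 5278.0, 19051350.0]
transformed = [neglog(v) for v in values]
back = [neglog_inverse(t) for t in transformed]
print([round(t, 3) for t in transformed])
```

Because the transform is monotone, medians and other quantiles back-transform directly with `neglog_inverse`; means on the neglog scale do not.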

Nick

n.j.cox@durham.ac.uk

Richard Harvey wrote:

I hope I can ask a fairly basic stats question. I have a variable that I need to compare across two groups. The summary statistics for the variable NAN across the groups are below. The negative values are legitimate.

    group  |     N      mean   p50       max        min  skewness  kurtosis
    -------+------------------------------------------------------------------
    group1 |  2537    -77535  5278  19051350  -46844688    -11.23     311.1
    group2 |  3031   -211373  4620   4609996  -32617714    -11.18     185.6
    Total  |  5568   -150391  4958  19051350  -46844688    -11.33     278.4

If I do a ttest on the log-transformed data, is it appropriate to add an arbitrary constant to make the negative values positive? Is the ttest indeed any good for these data, or should I be looking at some nonparametric tests?

To make the numbers more manageable I divided by 1,000,000, and the summary statistics look like this:

    group  |     N     mean      p50    max     min  skewness  kurtosis
    -------+-------------------------------------------------------------
    group1 |  2537  -.07753  .005278  19.05  -46.84    -11.23     311.1
    group2 |  3031   -.2114   .00462   4.61  -32.62    -11.18     185.6
    Total  |  5568   -.1504  .004958  19.05  -46.84    -11.33     278.4

Is it right to perform a ttest on ln((NAN/1000000)+50)? Changing the constant I add doesn't seem to make a difference. The statistics on ln((NAN/100000)+50) are below:

    group  |     N   mean    p50   max    min  skewness  kurtosis
    -------+-------------------------------------------------------
    group1 |  2537  4.604  4.605  4.78  3.973    -17.21     527.4
    group2 |  3031  4.603  4.605  4.65   4.21     12.74     242.9
    Total  |  5568  4.604  4.605  4.78  3.973    -15.94       469

There is still a large negative skewness coefficient. To me this does not look like a situation for a ttest, and I should be looking at some nonparametric test. Is that right?

The results from the ttest using the unpaired and unequal options, on the untransformed data and on ln((NAN/100000)+50), are below:

    transformation      t      p   95% CI
    none             3.25  .0011   53205.45 to 214470.8
    log(50+var)      2.75  .0060   .000367 to .002185

(I understand the second interval has to be back-transformed.)

A ranksum test on the log-transformed NAN gives z = 3.3999 with p = .0007; on the untransformed NAN it gives z = 3.396 with p = .0007. So overall there doesn't seem to be any change in the conclusions, whatever test I use.

But is the ttest procedure appropriate? Your help is much appreciated.
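Two points behind the numbers in the question can be checked directly. A rank-sum test uses only the ranks of the pooled data, and ln(x + c) is strictly increasing wherever x + c > 0, so in exact arithmetic it leaves the ranks, and hence the rank-sum z, unchanged; small discrepancies can arise if transformed values are stored at limited precision and become tied. Also, when c is large relative to most values, ln(x + c) ≈ ln(c) + x/c, nearly a linear rescaling of x, so the bulk of the data is barely reshaped. A minimal Python sketch, assuming c = 50 as in the question; the helper `ranks` and all values other than the quoted min and max are hypothetical illustrations, not the NAN data.

```python
import math

def ranks(values):
    """Return 1-based ranks (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

c = 50.0
# Illustrative values in millions; -46.84 and 19.05 are the quoted min and max
x = [-46.84, -0.5, 0.005, 0.02, 1.3, 19.05]

logged = [math.log(v + c) for v in x]
assert ranks(logged) == ranks(x)  # monotone transform: identical ranks

# First-order approximation ln(x + c) ≈ ln(c) + x / c
for v in x:
    exact = math.log(v + c)
    approx = math.log(c) + v / c
    print(f"{v:8.3f}  exact={exact:.5f}  approx={approx:.5f}")
```

The approximation is close for values near the median but breaks down for the extreme values, which is where the log does nearly all of its reshaping.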

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-ups: RE: st: ttest and log transformation (from "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>)

References: st: ttest and log transformation (from "Richard Harvey" <richardharvey2008@googlemail.com>)

