
From: "Carlo Lazzaro" <carlo.lazzaro@tin.it>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: R: ttest and log transformation
Date: Sat, 27 Sep 2008 14:51:05 +0200

Dear Rich,

about your concerns about a badly behaved ttest, why not try the following steps:

- bootstrap your untransformed data;
- take a look at the resulting sampling distribution;
- perform a bootstrap ttest;
- calculate how many times t_bootstrap is >= |t_original| or <= -|t_original|;
- contrast the resulting bootstrap p-value with the original one.

---------------------------begin example-----------------------------------
set obs 100
g A=10*(uniform())
g B=15*(uniform())
swilk A B // Prob>z_A=0.00030; Prob>z_B=0.00032
// Neither A nor B is normal
ttest A == B, unpaired unequal //t = -5.6293 and Pr(|T| > |t|) = 0.0000
return list
scalar t=r(t)
// recentre A and B on a common mean, so that the null hypothesis holds
summarize A, meanonly
replace A=A-r(mean) + 6.198467
summarize B, meanonly
replace B=B-r(mean) + 6.198467
sum A B
// the file path contains spaces, so it must be quoted
bootstrap r(t), reps(10000) ///
    saving("C:\Documents and Settings\carlo\Documenti\Statistiche\Stata\Richard_boot.dta", every(1) replace) ///
    verbose : ttest A == B, unpaired unequal
save "C:\Documents and Settings\carlo\Documenti\Statistiche\Stata\Richard_preboot.dta", replace
use "C:\Documents and Settings\carlo\Documenti\Statistiche\Stata\Richard_boot.dta", clear
count if _bs_1>=5.6293 //= 0
count if _bs_1<=-5.6293 //= 0
//bootstrap p-value=(0+0)/10000=0, which confirms the p-value obtained from the badly behaved ttest.
------------------------------end example-----------------------------------

About adding an arbitrary constraint or constant when log-transforming data, I would refer you to a debate held on this list at the end of last March, raised by a question on this topic. To sum up the results of the above-mentioned debate, the answer was negative. However, the so-called shifted log transformation (that is, adding a constant before taking logs so that zeros in the data can be retained) is reported in the literature on cost comparisons of health care programmes (please see, for a thorough review and many useful comments on this issue, Barber JA, Thompson SG.
Analysis of cost data in randomized trials: an application of the non-parametric bootstrap. Statist. Med. 2000; 19:3219-3236). As usual, the main problem is the way back (that is, back-transforming from the log scale to the original metric; that is one reason why I prefer the non-parametric bootstrap for analysing skewed cost data).

HTH and kind regards. Enjoy your weekend,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Richard Harvey
Sent: Saturday, 27 September 2008 10:15
To: statalist@hsphsun2.harvard.edu
Subject: st: ttest and log transformation

Hi all,

I hope I can ask a fairly basic stats question. I have a variable that I need to compare across two groups. The summary stats for the variable NAN across the groups are as below. The negative values are legitimate.

group  |    N      mean    p50       max        min  skewness  kurtosis
group1 | 2537    -77535   5278  19051350  -46844688    -11.23     311.1
group2 | 3031   -211373   4620   4609996  -32617714    -11.18     185.6
Total  | 5568   -150391   4958  19051350  -46844688    -11.33     278.4

If I do a ttest on the log-transformed data, is it appropriate to add an arbitrary constant to make the negative values positive? Is the ttest indeed any good for this data, or should I be looking at some non-parametric tests?

To make the numbers more manageable, I divide by 1,000,000 and the summary stats look like this:

group  |    N     mean      p50    max     min  skewness  kurtosis
group1 | 2537  -.07753  .005278  19.05  -46.84    -11.23     311.1
group2 | 3031   -.2114   .00462   4.61  -32.62    -11.18     185.6
Total  | 5568   -.1504  .004958  19.05  -46.84    -11.33     278.4

Is it right to perform a ttest on ln((NAN/1000000)+50)? Changing the constant I add doesn't seem to make a difference. Stats on ln((NAN/1000000)+50) are as below:

group  |    N   mean    p50   max    min  skewness  kurtosis
group1 | 2537  4.604  4.605  4.78  3.973    -17.21     527.4
group2 | 3031  4.603  4.605  4.65   4.21     12.74     242.9
Total  | 5568  4.604  4.605  4.78  3.973    -15.94       469

There is still a large negative skewness coefficient.
To me this does not look like a situation for a ttest, and I should be looking at some non-parametric test. Is that right?

The results from the ttest with the unpaired and unequal options, on the untransformed data and on ln((NAN/1000000)+50), are as below:

transformation |    t      p  95% CI
None           | 3.25  .0011  53205.45 - 214470.8
log(50+var)    | 2.75  .0060  .000367 - .002185 (I understand this has to be back-transformed)

A ranksum test on the log-transformed NAN shows a z of 3.3999 with a p of .0007. On the untransformed NAN it is 3.396 with a p of .0007.

So overall, there doesn't seem to be any change in the conclusions, whatever test I use. But is the ttest procedure appropriate?

Your help is much appreciated.

--
thanks for your time
rich

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
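
Returning to the bootstrap ttest in Carlo's reply: the sketch below shows one way the same comparison could be written so that the observed t statistic and the common mean are taken from stored results instead of being typed in by hand. It is only an illustration under stated assumptions, not Carlo's own code: the seed, the scalar names (t_obs, mA, mB, pooled), the use of a temporary file in place of the original paths, and the smaller number of replications are all choices made here for the example. It also assumes a Stata version that provides runiform().

```stata
* Sketch only: simulated data as in the reply, names below are illustrative
clear
set seed 12345                      // assumed seed, only for reproducibility
set obs 100
generate A = 10*runiform()
generate B = 15*runiform()

* Observed Welch t statistic on the original data
quietly ttest A == B, unpaired unequal
scalar t_obs = r(t)

* Impose the null hypothesis: recentre A and B on a common mean
quietly summarize A, meanonly
scalar mA = r(mean)
quietly summarize B, meanonly
scalar mB = r(mean)
scalar pooled = (mA + mB)/2         // equal n, so the pooled mean is the simple average
quietly replace A = A - mA + pooled
quietly replace B = B - mB + pooled

* Bootstrap the t statistic under the null
* (1000 replications here, fewer than the 10000 in the reply, to keep it quick)
tempfile boot
bootstrap t = r(t), reps(1000) saving(`boot', replace) nodots: ///
    ttest A == B, unpaired unequal

* Two-sided bootstrap p-value: share of replicates at least as extreme as t_obs
use `boot', clear
quietly count if abs(t) >= abs(scalar(t_obs))
display "bootstrap two-sided p-value = " r(N)/_N
```

Because the thresholds come from scalar(t_obs) rather than a typed value such as 5.6293, the count stays correct if the simulated data, and hence the observed t, change from run to run.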

**References**: **st: ttest and log transformation** *From:* "Richard Harvey" <richardharvey2008@googlemail.com>

