
From: "Richard Harvey" <richardharvey2008@googlemail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: R: ttest and log transformation
Date: Sun, 28 Sep 2008 12:32:33 +0100

Hi Carlo, thanks for your reply. My main problem is the skewness combined with small sample sizes. In the summary stats I posted N is large because it covers the whole sample, but when I analyse subsamples some of them are very small, i.e. fewer than 20 observations. The bootstrap seems like a good idea. Can I do something as simple as

    bootstrap r(t), reps(1000) saving(c:\): ttest var, by(catvar) unequal

or is it something more involved, as below?

    bootstrap r(mean), reps(1000): summarize var if catvar=="cat1"
    matrix mu_1 = e(b)
    matrix sterrsq_1 = e(V)
    bootstrap r(mean), reps(1000): summarize var if catvar=="cat2"
    matrix mu_2 = e(b)
    matrix sterrsq_2 = e(V)
    * z = difference in bootstrap means / sqrt(sum of bootstrap variances)
    scalar Z = (mu_1[1,1] - mu_2[1,1]) / sqrt(sterrsq_1[1,1] + sterrsq_2[1,1])
    scalar p = 2*(1 - normal(abs(scalar(Z))))
    display "z-value: " scalar(Z)
    display "p = " scalar(p)

Thanks very much for your help.
Regards,
Rich

2008/9/27 Carlo Lazzaro <carlo.lazzaro@tiscalinet.it>:
>
> Dear Rich,
> regarding your concern about a badly behaved ttest, why not try the following steps:
>
> - bootstrap your untransformed data, after shifting both samples to a common mean so that the null hypothesis of equal means holds (see the example);
> - take a look at the resulting sampling distribution;
> - perform a bootstrap ttest;
> - count how many times t_bootstrap >= |t_original| and how many times t_bootstrap <= -|t_original|;
> - compare the bootstrap p-value obtained this way with the original one.
>
> ---------------------------begin example-----------------------------------
> set obs 100
> g A=10*(uniform())
> g B=15*(uniform())
> swilk A B // Prob>z_A=0.00030; Prob>z_B=0.00032: neither A nor B is normal
> ttest A == B, unpaired unequal // t = -5.6293 and Pr(|T| > |t|) = 0.0000
> return list
> scalar t=r(t)
> * shift A and B to the same mean (6.198467) so that H0 holds when resampling
> summarize A, meanonly
> replace A=A-r(mean) + 6.198467
> summarize B, meanonly
> replace B=B-r(mean) + 6.198467
> sum A B
> bootstrap r(t), reps(10000) ///
>     saving("C:\Documents and Settings\carlo\Documenti\Statistiche\Stata\Richard_boot.dta", every(1) replace) ///
>     verbose: ttest A == B, unpaired unequal
> save "C:\Documents and Settings\carlo\Documenti\Statistiche\Stata\Richard_preboot.dta", replace
> use "C:\Documents and Settings\carlo\Documenti\Statistiche\Stata\Richard_boot.dta", clear
> count if _bs_1>=5.6293 // = 0
> count if _bs_1<=-5.6293 // = 0
> * bootstrap p-value = (0+0)/10000 = 0, which confirms the p-value of the badly behaved ttest
> ------------------------------end example-----------------------------------
>
> Regarding adding an arbitrary constant so that the data can be log-transformed, I would refer you to a debate on this list at the end of last March, raised by a question on this topic. To sum up the results of that debate, the answer was negative.
>
> However, so-called shifted log transformations (that is, adding a constant before taking logs so that zeros can be retained in the data) are reported in the literature on cost comparisons of health care programmes. For a thorough review and many useful comments on this issue, see Barber JA, Thompson SG. Analysis of cost data in randomized trials: an application of the non-parametric bootstrap. Statist. Med. 2000; 19:3219-3236. As usual, the main problem is the way back, that is, back-transforming from logs to the original metric. That is one reason why I prefer the non-parametric bootstrap for analysing skewed cost data.
>
> HTH and kind regards.
>
> Enjoy your weekend,
>
> Carlo
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Richard Harvey
> Sent: Saturday, 27 September 2008 10:15
> To: statalist@hsphsun2.harvard.edu
> Subject: st: ttest and log transformation
>
> Hi all,
>
> I hope I can ask a fairly basic stats question. I have a variable that I need to compare across two groups. The summary stats for the variable NAN across the groups are below. The negative values are legitimate.
>
> group   |    N     mean    p50       max        min  skewness  kurtosis
> group1  | 2537   -77535   5278  19051350  -46844688    -11.23     311.1
> group2  | 3031  -211373   4620   4609996  -32617714    -11.18     185.6
> Total   | 5568  -150391   4958  19051350  -46844688    -11.33     278.4
>
> If I do a ttest on the log-transformed data, is it appropriate to add an arbitrary constant to make the negative values positive? Is the ttest any good for these data at all, or should I be looking at non-parametric tests?
>
> To make the numbers more manageable I divided by 1,000,000, and the summary stats then look like this:
>
> group   |    N     mean      p50    max     min  skewness  kurtosis
> group1  | 2537  -.07753  .005278  19.05  -46.84    -11.23     311.1
> group2  | 3031   -.2114   .00462   4.61  -32.62    -11.18     185.6
> Total   | 5568   -.1504  .004958  19.05  -46.84    -11.33     278.4
>
> Is it right to perform a ttest on ln((NAN/1000000)+50)? Changing the constant I add doesn't seem to make a difference.
>
> The stats on ln((NAN/100000)+50) are below:
>
> group   |    N   mean    p50   max    min  skewness  kurtosis
> group1  | 2537  4.604  4.605  4.78  3.973    -17.21     527.4
> group2  | 3031  4.603  4.605  4.65   4.21     12.74     242.9
> Total   | 5568  4.604  4.605  4.78  3.973    -15.94       469
>
> There is still a large negative skewness coefficient. To me this does not look like a situation for a ttest, and I should be looking at a non-parametric test. Is that right?
>
> The results from the ttest with the unpaired and unequal options, on the untransformed NAN and on ln((NAN/100000)+50), are below:
>
> transformation |    t      p   95% CI
> None           | 3.25  .0011   53205.45 to 214470.8
> log(50+var)    | 2.75  .0060   .000367 to .002185 (I understand this has to be back-transformed)
>
> A ranksum test on the log-transformed NAN gives z = 3.3999 with p = .0007; on the untransformed NAN it gives z = 3.396 with p = .0007.
>
> So overall there doesn't seem to be any change in the conclusions, whatever test I use. But is the ttest procedure appropriate?
>
> Your help is much appreciated.
> --
> thanks for your time
> rich
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
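The sketch below is not from the thread. It adapts Carlo's centred bootstrap test to the layout in Rich's question: one outcome and a two-valued grouping variable, as in `ttest var, by(catvar)`. Bootstrapping r(t) on the raw data, as in the one-line version, resamples around the observed difference, so the replicates do not form a null distribution. The sketch therefore first shifts each group to the pooled mean, as Carlo does with A and B.

Some details are assumptions made for illustration. The data are simulated stand-ins, NAN and group only mimic the names in the post, and nan_boot.dta is a hypothetical file name. Using strata(group) is a choice of this sketch; it keeps each group's size fixed in every replicate.

```stata
* Sketch only: centred bootstrap t test, one outcome, two-valued group.
* SIMULATED data; NAN and group mimic the post's names, and
* nan_boot.dta is a hypothetical file name.
clear
set seed 2008
set obs 60
generate byte group = 1 + (_n > 20)          // 20 obs in group 1, 40 in group 2
generate double NAN = 5 - exp(2*rnormal())    // left-skewed, with negative values
replace NAN = NAN + 1 if group == 1           // build in a true difference

* 1. Observed Welch t statistic on the untransformed data
ttest NAN, by(group) unequal
scalar t_obs = r(t)

* 2. Impose H0: shift each group so that both have the pooled mean
summarize NAN, meanonly
scalar grand = r(mean)
bysort group: egen double gmean = mean(NAN)
generate double NAN0 = NAN - gmean + scalar(grand)

* 3. Bootstrap t under H0; strata(group) resamples within each group
local reps 1000
bootstrap r(t), reps(`reps') strata(group) ///
    saving(nan_boot.dta, replace): ttest NAN0, by(group) unequal

* 4. Two-sided p-value: share of replicates with |t*| >= |t_obs|
use nan_boot.dta, clear
count if !missing(_bs_1)
local nrep = r(N)
count if abs(_bs_1) >= abs(scalar(t_obs)) & !missing(_bs_1)
scalar p_boot = r(N)/`nrep'
display "bootstrap p-value = " scalar(p_boot)
```

To run this on real data, replace the simulated block with your own `use` command. Note that `use nan_boot.dta, clear` replaces the data in memory, so save your data first, as Carlo does with Richard_preboot.dta.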
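This second sketch, also not from the thread, illustrates the back-transformation that both Rich and Carlo mention. The bounds .000367 and .002185 are copied from Rich's results for the shifted-log ttest, where they bracket a difference in group means of logs.

Exponentiating a difference in mean logs gives a ratio of geometric means of the shifted variable (rescaled NAN plus the added constant). It does not give a difference in NAN itself, and that ratio depends on the arbitrary constant. This is one way to see Carlo's point about "the way back". The scalar names are hypothetical.

```stata
* Sketch: back-transforming the shifted-log CI reported in the post.
* The bounds are copied from the post; the scalar names are hypothetical.
scalar ci_lo = .000367
scalar ci_hi = .002185

* exp() of a difference in mean logs is a ratio of geometric means of
* the SHIFTED variable (rescaled NAN plus the constant), not of NAN
scalar ratio_lo = exp(scalar(ci_lo))
scalar ratio_hi = exp(scalar(ci_hi))
display "ratio of geometric means: " %7.5f scalar(ratio_lo) ///
    " to " %7.5f scalar(ratio_hi)
```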

**References**: **st: R: ttest and log transformation** (From: Carlo Lazzaro <carlo.lazzaro@tiscalinet.it>)


© Copyright 1996–2016 StataCorp LP