Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Maria Niarchou <m.niarchou@hotmail.com>

To: <statalist@hsphsun2.harvard.edu>

Subject: RE: st: Tukey's HSD test from summary statistics

Date: Wed, 8 Feb 2012 23:30:50 +0200

Dear Jeff,

This was very helpful. Thank you very much for your assistance.

Best wishes,
Maria

> From: jpitblado@stata.com
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Tukey's HSD test from summary statistics
> Date: Wed, 8 Feb 2012 13:46:23 -0600
>
> Maria Niarchou <m.niarchou@hotmail.com> asks
>
> > Is there a way to calculate Tukey's HSD test in Stata when only sample
> > sizes, means and standard deviations are available?
>
> The short answer is: Yes.
>
> New in Stata 12 are the functions -tukeyprob()- and -invtukeyprob()- that
> compute cumulative probabilities and quantiles from Tukey's studentized range
> distribution.
>
> -----------------------------------------------------------------------------
>
> Here is the longer answer with some formulas, followed by an example.
>
> Suppose we have k means to compare, where mean m_i and standard deviation s_i
> were computed from group i having sample size n_i.
>
> Our first problem is to determine how to estimate the standard error of a
> given difference, say
>
>     SE(m_1 - m_2) = ?
>
> Assuming a common variance between the k groups, we can pool the sample
> variance estimates to get
>
>     MSE = (1/df) sum_i (n_i - 1)*s_i^2
>
> where
>
>     df = sum_i (n_i - 1)
>
> So the HSD test statistic, assuming equal variances, becomes
>
>     q = abs(m_1 - m_2)/sqrt(MSE*(1/n_1 + 1/n_2)/2)
>
> The extra divisor 2 in the square root comes from the fact that we are looking
> at the absolute difference between m_1 and m_2.
>
> A 5% critical value can be computed using the -invtukeyprob()- function.
>
>     crit = invtukeyprob(k, df, .95)
>
> The corresponding p-value can be computed using the -tukeyprob()- function.
>
>     p = 1 - tukeyprob(k, df, q)
>
> If we can't assume equal variances, then the test statistic becomes
>
>     q = abs(m_1 - m_2)/sqrt((s_1^2/n_1 + s_2^2/n_2)/2)
>
> -----------------------------------------------------------------------------
>
> Example 6 in -[R] ttest- performs an unpaired t test assuming equal variances:
>
>     ***** BEGIN:
>     . ttesti 20 20 5 32 15 4
>
>     Two-sample t test with equal variances
>     ------------------------------------------------------------------------------
>              |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
>     ---------+--------------------------------------------------------------------
>            x |      20          20    1.118034           5    17.65993    22.34007
>            y |      32          15    .7071068           4    13.55785    16.44215
>     ---------+--------------------------------------------------------------------
>     combined |      52    16.92308    .6943785    5.007235    15.52905     18.3171
>     ---------+--------------------------------------------------------------------
>         diff |                   5    1.256135                2.476979    7.523021
>     ------------------------------------------------------------------------------
>         diff = mean(x) - mean(y)                                      t =   3.9805
>     Ho: diff = 0                                     degrees of freedom =       50
>
>         Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
>      Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.0001
>     ***** END:
>
> Suppose this test represents only 1 comparison among 5 means (k = 5), and let's
> pretend that sqrt(MSE) is the same as the Std. Dev. for the combined means
> above. Also, let's assume the total degrees of freedom is df = 100.
>
> The HSD test statistic is
>
>     q = (20 - 15)/(5.007235*sqrt((1/20 + 1/15)/2))
>       = 4.1344109
>
> The 5% critical value is
>
>     crit = invtukeyprob(k, df, .95)
>          = 3.9289372
>
> The p-value is
>
>     p = 1 - tukeyprob(k, df, q)
>       = .03400394
>
> For unequal variances, the results from -ttesti- are
>
>     ***** BEGIN:
>     . ttesti 20 20 5 32 15 4, unequal
>
>     Two-sample t test with unequal variances
>     ------------------------------------------------------------------------------
>              |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
>     ---------+--------------------------------------------------------------------
>            x |      20          20    1.118034           5    17.65993    22.34007
>            y |      32          15    .7071068           4    13.55785    16.44215
>     ---------+--------------------------------------------------------------------
>     combined |      52    16.92308    .6943785    5.007235    15.52905     18.3171
>     ---------+--------------------------------------------------------------------
>         diff |                   5    1.322876                2.311343    7.688657
>     ------------------------------------------------------------------------------
>         diff = mean(x) - mean(y)                                      t =   3.7796
>     Ho: diff = 0                     Satterthwaite's degrees of freedom =  33.9142
>
>         Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
>      Pr(T < t) = 0.9997         Pr(|T| > |t|) = 0.0006          Pr(T > t) = 0.0003
>     ***** END:
>
> The HSD test statistic is
>
>     q = (20 - 15)/sqrt((5^2/20 + 4^2/32)/2)
>       = 5.3452248
>
> The 5% critical value is still
>
>     crit = invtukeyprob(k, df, .95)
>          = 3.9289372
>
> The p-value is
>
>     p = 1 - tukeyprob(k, df, q)
>       = .00243234
>
> --Jeff
> jpitblado@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
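For anyone who wants to check the summary-statistics formulas above outside Stata, here is a minimal Python sketch. It is not StataCorp code, and the helper names `pooled_mse`, `hsd_q_equal` and `hsd_q_unequal` are hypothetical. It pools the variances as MSE = (1/df) sum_i (n_i - 1)*s_i^2, then forms the equal-variance and unequal-variance q statistics for the two groups of the -ttesti 20 20 5 32 15 4- example (n = 20 and n = 32).

```python
import math


def pooled_mse(ns, sds):
    """Pooled variance from group sizes and SDs.

    MSE = (1/df) * sum_i (n_i - 1) * s_i^2, with df = sum_i (n_i - 1).
    """
    df = sum(n - 1 for n in ns)
    mse = sum((n - 1) * s * s for n, s in zip(ns, sds)) / df
    return mse, df


def hsd_q_equal(m1, m2, n1, n2, mse):
    """HSD statistic for one pair, assuming a common variance (mse)."""
    return abs(m1 - m2) / math.sqrt(mse * (1.0 / n1 + 1.0 / n2) / 2.0)


def hsd_q_unequal(m1, m2, n1, s1, n2, s2):
    """HSD-style statistic for one pair, using each group's own SD."""
    return abs(m1 - m2) / math.sqrt((s1 * s1 / n1 + s2 * s2 / n2) / 2.0)


# Summary statistics from the "ttesti 20 20 5 32 15 4" example:
# group x has n=20, mean 20, SD 5; group y has n=32, mean 15, SD 4.
n1, m1, s1 = 20, 20.0, 5.0
n2, m2, s2 = 32, 15.0, 4.0

# Pooling only these two groups (df = 50, as in the ttesti output).
mse, df = pooled_mse([n1, n2], [s1, s2])
q_pooled = hsd_q_equal(m1, m2, n1, n2, mse)

# The message's "pretend" setting: sqrt(MSE) = 5.007235 (the combined SD).
q_pretend = hsd_q_equal(m1, m2, n1, n2, 5.007235 ** 2)

# Unequal-variance version.
q_unequal = hsd_q_unequal(m1, m2, n1, s1, n2, s2)

print(f"MSE = {mse:.4f} on {df} df")
print(f"q (pooled) = {q_pooled:.4f}, q (pretend MSE) = {q_pretend:.4f}, "
      f"q (unequal) = {q_unequal:.7f}")
```

For two groups the pooled statistic is sqrt(2) times the two-sample t, so `q_pooled` comes out near 5.629 (sqrt(2) x 3.9805) and `q_unequal` near 5.3452248 (sqrt(2) x 3.7796). With the pretend sqrt(MSE) = 5.007235 and n_2 = 32, `q_pretend` is about 4.954; the 4.1344109 in the quoted message corresponds to using 1/15 rather than 1/32 for the second group.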
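The -tukeyprob()- and -invtukeyprob()- values quoted above can also be cross-checked without Stata by integrating the studentized range distribution numerically. This is a sketch under stated assumptions, not Stata's algorithm: it uses the textbook representation P(Q <= q) = integral of f_S(s) * P(W <= q*s) ds, where W is the range of k standard normal draws and S = sqrt(chi2_df/df), evaluated with fixed Simpson grids that are only intended for moderate df. The names `tukey_cdf` and `tukey_inv` are hypothetical.

```python
import math


def _norm_cdf(x):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _simpson(f, a, b, n):
    """Composite Simpson's rule for f on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def _range_cdf(w, k):
    """P(range of k iid N(0,1) draws <= w), integrating over the largest draw z."""
    if w <= 0.0:
        return 0.0
    c = 1.0 / math.sqrt(2.0 * math.pi)

    def integrand(z):
        inside = _norm_cdf(z) - _norm_cdf(z - w)  # other k-1 draws in [z - w, z]
        return c * math.exp(-0.5 * z * z) * inside ** (k - 1)

    return k * _simpson(integrand, -8.0, 8.0, 160)


def tukey_cdf(q, k, df, s_max=6.0, n_s=600):
    """Approximate P(Q <= q) for the studentized range (k means, df d.f.).

    Averages the range CDF over S = sqrt(chi2_df / df). The fixed grids
    are an assumption that suits moderate df, not very large df.
    """
    if q <= 0.0:
        return 0.0
    # log of the normalising constant of the density of S
    log_const = (0.5 * df * math.log(df) - (0.5 * df - 1.0) * math.log(2.0)
                 - math.lgamma(0.5 * df))

    def integrand(s):
        if s <= 0.0:
            return 0.0
        log_dens = log_const + (df - 1.0) * math.log(s) - 0.5 * df * s * s
        if log_dens < -40.0:  # density of S is negligible; skip inner integral
            return 0.0
        return math.exp(log_dens) * _range_cdf(q * s, k)

    return _simpson(integrand, 0.0, s_max, n_s)


def tukey_inv(p, k, df, tol=1e-7):
    """Quantile of the studentized range, found by bisection on tukey_cdf."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 10.0
    while tukey_cdf(hi, k, df) < p and hi < 1e3:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tukey_cdf(mid, k, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With k = 5 and df = 100 as in the example, `tukey_inv(0.95, 5, 100)` should come out close to 3.9289372, and `1 - tukey_cdf(5.3452248, 5, 100)` close to .00243234. Agreement holds only to the accuracy of the integration grids, so the last few digits may differ from Stata's.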

**References**: **Re: st: Tukey's HSD test from summary statistics**, *From:* jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
