



Re: st: Tukey's HSD test from summary statistics


From   jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Tukey's HSD test from summary statistics
Date   Wed, 08 Feb 2012 13:46:23 -0600

Maria Niarchou <m.niarchou@hotmail.com> asks

> Is there a way to calculate Tukey's HSD test in Stata when only sample
> sizes, means and standard deviations are available?

The short answer is: Yes.

New in Stata 12 are the functions -tukeyprob()- and -invtukeyprob()- that
compute cumulative probabilities and quantiles from Tukey's studentized range
distribution.

-----------------------------------------------------------------------------

Here is the longer answer with some formulas, followed by an example.

Suppose we have k means to compare, where mean m_i and standard deviation s_i
were computed from group i having sample size n_i.

Our first problem is to determine how to estimate the standard error of a
given difference, say

	SE(m_1-m_2) = ?

Assuming a common variance between the k groups, we can pool the sample
variance estimates to get

	MSE = (1/df) sum_i (n_i-1)*s_i^2

where

	df = sum_i (n_i - 1)
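
As a rough check of the pooling step, here is a minimal Python sketch (not
Stata). The function name pooled_mse is made up for illustration, and the
inputs are the two groups from the -ttesti- example later in this message
(n = 20, s = 5 and n = 32, s = 4).

```python
def pooled_mse(ns, sds):
    """Pool group variances: MSE = sum((n_i - 1) * s_i^2) / df."""
    df = sum(n - 1 for n in ns)                      # df = sum_i (n_i - 1)
    mse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / df
    return mse, df

# Sample sizes and standard deviations for the two groups
mse, df = pooled_mse([20, 32], [5.0, 4.0])
print(df, mse)
```

For these two groups the sketch gives df = 50 and MSE = 19.42, whose square
root (about 4.4068) is the pooled standard deviation behind the Std. Err. that
-ttesti- reports for diff.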

So the HSD test statistic, assuming equal variances, becomes

	q = abs(m_1 - m_2)/sqrt(MSE*(1/n_1 + 1/n_2)/2)

The extra divisor 2 in the square root comes from the fact that we are looking
at the absolute difference between m_1 and m_2.
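
The statistic is plain arithmetic once MSE is known. A small Python sketch,
where hsd_q_pooled is a hypothetical helper name and MSE = 19.42 is assumed to
come from pooling the two -ttesti- groups (n = 20, s = 5 and n = 32, s = 4):

```python
from math import sqrt

def hsd_q_pooled(m1, m2, n1, n2, mse):
    """q statistic for one pairwise difference, using a pooled MSE."""
    return abs(m1 - m2) / sqrt(mse * (1 / n1 + 1 / n2) / 2)

# MSE = (19*5^2 + 31*4^2)/50 = 19.42 for the two groups in the ttesti example
q = hsd_q_pooled(20, 15, 20, 32, 19.42)

# Ordinary pooled two-sample t statistic, computed for comparison
t = abs(20 - 15) / sqrt(19.42 * (1 / 20 + 1 / 32))

print(round(q, 4), round(t * sqrt(2), 4))  # the two values agree
```

Here q is about 5.63, which is sqrt(2) times the t of 3.9805 reported by
-ttesti-. The studentized range is scaled by the standard error of a single
mean rather than of a difference, and that is where the factor comes from.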

A 5% critical value can be computed using the -invtukeyprob()- function.

	crit = invtukeyprob(k, df, .95)

The corresponding p-value can be computed using the -tukeyprob()- function.

	p = 1 - tukeyprob(k, df, q)

If we can't assume equal variances, then the test statistic becomes

	q = abs(m_1 - m_2)/sqrt((s_1^2/n_1 + s_2^2/n_2)/2)
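
For the case without a pooled variance, the same kind of Python sketch
applies. hsd_q_unequal is again a hypothetical helper name, and the inputs are
the summary statistics from the -ttesti- example (means 20 and 15, standard
deviations 5 and 4, sample sizes 20 and 32).

```python
from math import sqrt

def hsd_q_unequal(m1, s1, n1, m2, s2, n2):
    """q statistic built from each group's own variance estimate."""
    return abs(m1 - m2) / sqrt((s1 ** 2 / n1 + s2 ** 2 / n2) / 2)

# Group 1: mean 20, sd 5, n 20; group 2: mean 15, sd 4, n 32
q = hsd_q_unequal(20, 5, 20, 15, 4, 32)
print(round(q, 7))
```

The resulting q can then be handed to -tukeyprob()- in Stata, together with k
and df, to get the p-value.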

-----------------------------------------------------------------------------

Example 6 in -[R] ttest- performs an unpaired t test assuming equal variances:

***** BEGIN:
. ttesti 20 20 5 32 15 4

Two-sample t test with equal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      20          20    1.118034           5    17.65993    22.34007
       y |      32          15    .7071068           4    13.55785    16.44215
---------+--------------------------------------------------------------------
combined |      52    16.92308    .6943785    5.007235    15.52905     18.3171
---------+--------------------------------------------------------------------
    diff |                   5    1.256135                2.476979    7.523021
------------------------------------------------------------------------------
    diff = mean(x) - mean(y)                                      t =   3.9805
Ho: diff = 0                                     degrees of freedom =       50

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.0001
***** END:

Suppose this test represents only 1 comparison among k = 5 means, and let's
pretend that sqrt(MSE) is the same as the Std. Dev. for the combined sample
above. Also, let's assume the total degrees of freedom is df = 100.

The HSD test statistic, using the sample sizes 20 and 32, is

	q	= (20 - 15)/(5.007235*sqrt((1/20 + 1/32)/2))
	 	= 4.9542206

The 5% critical value is

	crit	= invtukeyprob(k, df, .95)
	    	= 3.9289372

The p-value is

	p	= 1 - tukeyprob(k, df, q)

which falls below .05, because q exceeds the 5% critical value.

For unequal variances, the results from -ttesti- are

***** BEGIN:
. ttesti 20 20 5 32 15 4, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      20          20    1.118034           5    17.65993    22.34007
       y |      32          15    .7071068           4    13.55785    16.44215
---------+--------------------------------------------------------------------
combined |      52    16.92308    .6943785    5.007235    15.52905     18.3171
---------+--------------------------------------------------------------------
    diff |                   5    1.322876                2.311343    7.688657
------------------------------------------------------------------------------
    diff = mean(x) - mean(y)                                      t =   3.7796
Ho: diff = 0                     Satterthwaite's degrees of freedom =  33.9142

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9997         Pr(|T| > |t|) = 0.0006          Pr(T > t) = 0.0003
***** END:

The HSD test statistic is

	q	= (20 - 15)/sqrt((5^2/20 + 4^2/32)/2)
	 	= 5.3452248

The 5% critical value is still

	crit	= invtukeyprob(k, df, .95)
	    	= 3.9289372

The p-value is

	p	= 1 - tukeyprob(k, df, q)
	 	= .00243234

--Jeff
jpitblado@stata.com

