Re: st: RE: t-test comparing the means of two samples in imputed datasets

From	[email protected] (Isabel Canette, StataCorp)
To	[email protected]
Subject	Re: st: RE: t-test comparing the means of two samples in imputed datasets
Date	Thu, 05 Nov 2009 12:03:27 -0600
I apologize for my previous message.  It was copied incorrectly from the
editor.  Here is my original message:

Clara Barata <maria_barata(at)mail(dot)harvard(dot)edu> has multiply imputed
data and wants to perform the equivalent of an unpaired t-test with equal
variances:

> Any idea on how to apply a ttest to compare means in datasets imputed with
> MI (stata 11)?  What would be the equivalent to: "ttest var , by (dummy)" in
> the MI world?

Let's forget for a moment that Clara has imputed data.  As David Radwin
<dradwin(at)mprinc(dot)com> pointed out:

	http://www.stata.com/statalist/archive/2009-11/msg00198.html

performing an unpaired t-test with equal variances is equivalent to performing
a regression where the dependent variable is our variable of interest, and the
independent variable is a dummy that indicates one of the two groups.  Here is
an example:

. sysuse auto, clear
(1978 Automobile Data)

. ttest price, by(foreign)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic |      52    6072.423    429.4911    3097.104    5210.184    6934.662
 Foreign |      22    6384.682    558.9942    2621.915     5222.19    7547.174
---------+--------------------------------------------------------------------
combined |      74    6165.257    342.8719    2949.496    5481.914      6848.6
---------+--------------------------------------------------------------------
    diff |           -312.2587    754.4488               -1816.225    1191.708
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -0.4139
Ho: diff = 0                                     degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3401         Pr(|T| > |t|) = 0.6802          Pr(T > t) = 0.6599

. regress price foreign

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    0.17
       Model |  1507382.66     1  1507382.66           Prob > F      =  0.6802
    Residual |   633558013    72  8799416.85           R-squared     =  0.0024
-------------+------------------------------           Adj R-squared = -0.0115
       Total |   635065396    73  8699525.97           Root MSE      =  2966.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
       _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
------------------------------------------------------------------------------

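The t-test reported for the variable foreign is the two-tailed test reported
by -ttest-.  Note that the sign is reversed: -ttest- reports
mean(Domestic) - mean(Foreign), whereas the coefficient on foreign estimates
mean(Foreign) - mean(Domestic), so the one-tailed p-values swap sides.  We can
use the results that -regress- stores in e() to obtain the three p-values: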

. mat b = e(b)

. mat V = e(V)

. scalar coef_for = el(b,1,1)

. scalar se_for = sqrt(el(V,1,1))

. display 2*ttail(e(df_r), abs(coef_for/se_for))
.68018509

. display ttail(e(df_r), coef_for/se_for)
.34009254

. display ttail(e(df_r), -coef_for/se_for)
.65990746
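
As a cross-check, the same three p-values can be obtained by referring to the
coefficient by name and compared with the values that -ttest- leaves in r().
The following is only a sketch, meant to be run from a do-file (so that the //
comments are allowed); the scalar name t_for is my own choice for
illustration.

```stata
* Sketch only: refer to the coefficient by name, not by matrix position
sysuse auto, clear
quietly regress price foreign
scalar t_for = _b[foreign]/_se[foreign]   // t statistic for foreign

display 2*ttail(e(df_r), abs(t_for))      // two-tailed p-value
display ttail(e(df_r), t_for)             // Pr(T > t) for the regression t
display ttail(e(df_r), -t_for)            // Pr(T < t) for the regression t

* -ttest- stores its own p-values in r(); its t has the opposite sign
quietly ttest price, by(foreign)
display r(p)                              // two-tailed, Ha: diff != 0
display r(p_l)                            // Ha: diff < 0
display r(p_u)                            // Ha: diff > 0
```

Because ttail(df, t) returns the upper-tail probability Pr(T > t), the second
-display- should agree with r(p_l) and the third with r(p_u), once the sign
reversal between the two commands is taken into account.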

Now, we can follow the analogous procedure for multiply-imputed data; this
time the test will be performed on the variable rep78, after imputing it using
-mi impute mlogit-.


. sysuse auto, clear
(1978 Automobile Data)

. mi set flong

. mi register imputed rep78
(5 m=0 obs. now marked as incomplete)

. mi impute mlogit rep mpg disp turn, add(20)

Univariate imputation                   Imputations =       20
Multinomial logistic regression               added =       20
Imputed: m=1 through m=20                   updated =        0

               |              Observations per m              
               |----------------------------------------------
      Variable |   complete   incomplete   imputed |     total
---------------+-----------------------------------+----------
         rep78 |         69            5         5 |        74
--------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled in observations.)

. mi estimate: regress rep78 foreign

Multiple-imputation estimates                     Imputations     =         20
Linear regression                                 Number of obs   =         74
                                                  Average RVI     =     0.0687
                                                  Complete DF     =         72
DF adjustment:   Small sample                     DF:     min     =      64.44
                                                          avg     =      64.54
                                                          max     =      64.65
Model F test:       Equal FMI                     F(   1,   64.4) =      30.16
Within VCE type:          OLS                     Prob > F        =     0.0000

------------------------------------------------------------------------------
       rep78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   1.199738   .2184457     5.49   0.000     .7633995    1.636076
       _cons |   3.054808   .1189696    25.68   0.000     2.817185    3.292431
------------------------------------------------------------------------------

The t-test reported for foreign is the MI version of the two-tailed t-test.
We can also use the results stored by -mi estimate- to compute two-tailed
and one-tailed p-values:

. scalar coef_for = el(e(b_mi),1,1)

. scalar se_for = sqrt(el(e(V_mi),1,1))

. scalar df_for = el(e(df_mi),1,1)

. display 2*ttail(df_for, abs(coef_for/se_for))
7.213e-07

. display ttail(df_for, coef_for/se_for)
3.606e-07

. display ttail(df_for, -coef_for/se_for)
.99999964


Notice that in the MI framework each coefficient has its own degrees of
freedom (the output above reports a minimum of 64.44 and a maximum of 64.65).
This is why I take the degrees of freedom for the first coefficient, foreign,
from the first element of the matrix e(df_mi).
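
If the model has several regressors, it may be safer to look up the position
of the coefficient by name instead of hard-coding column 1.  The following is
only a sketch, to be run right after -mi estimate- from a do-file.  It assumes
that e(b_mi) carries the coefficient names as column names and that e(V_mi)
and e(df_mi) follow the same ordering; the local macro j is my own name.

```stata
* Sketch only: pick out the results for foreign by name
matrix b  = e(b_mi)                  // MI coefficient estimates
matrix V  = e(V_mi)                  // MI variance-covariance matrix
matrix df = e(df_mi)                 // coefficient-specific degrees of freedom

local j = colnumb(b, "foreign")      // column position of foreign in b

scalar coef_for = b[1, `j']
scalar se_for   = sqrt(V[`j', `j'])
scalar df_for   = df[1, `j']

display 2*ttail(df_for, abs(coef_for/se_for))   // two-tailed p-value
display ttail(df_for, coef_for/se_for)          // Pr(T > t)
display ttail(df_for, -coef_for/se_for)         // Pr(T < t)
```

With the single-regressor model above, j should evaluate to 1, so this
reduces to the commands already shown.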


-- Isabel
icanette(at)stata(dot)com