
From: icanette@stata.com (Isabel Canette, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: t-test comparing the means of two samples in imputed datasets
Date: Thu, 05 Nov 2009 12:03:27 -0600

I apologize for my previous message; it was pasted incorrectly from the editor. Here is my original message:

Clara Barata <maria_barata(at)mail(dot)harvard(dot)edu> has multiply imputed data and wants to perform the equivalent of an unpaired t-test with equal variances:

> Any idea on how to apply a ttest to compare means in datasets imputed with
> MI (stata 11)? What would be the equivalent to: "ttest var , by (dummy)" in
> the MI world?

Let's forget for a moment that Clara has imputed data. As David Radwin <dradwin(at)mprinc(dot)com> pointed out:

http://www.stata.com/statalist/archive/2009-11/msg00198.html

performing an unpaired t-test with equal variances is equivalent to running a regression in which the dependent variable is the variable of interest and the independent variable is a dummy that indicates one of the two groups. Here is an example:

. sysuse auto, clear
(1978 Automobile Data)

. ttest price, by(foreign)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic |      52    6072.423    429.4911    3097.104    5210.184    6934.662
 Foreign |      22    6384.682    558.9942    2621.915     5222.19    7547.174
---------+--------------------------------------------------------------------
combined |      74    6165.257    342.8719    2949.496    5481.914      6848.6
---------+--------------------------------------------------------------------
    diff |           -312.2587    754.4488               -1816.225    1191.708
------------------------------------------------------------------------------
    diff = mean(Domestic) - mean(Foreign)                         t =  -0.4139
Ho: diff = 0                                     degrees of freedom =       72

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3401         Pr(|T| > |t|) = 0.6802          Pr(T > t) = 0.6599
. regress price foreign

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =    0.17
       Model |  1507382.66     1  1507382.66           Prob > F      =  0.6802
    Residual |   633558013    72  8799416.85           R-squared     =  0.0024
-------------+------------------------------           Adj R-squared = -0.0115
       Total |   635065396    73  8699525.97           Root MSE      =  2966.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
       _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
------------------------------------------------------------------------------

The t-test reported for the variable foreign is the two-tailed test reported by -ttest-: the same |t| = 0.41, the same 72 degrees of freedom, and the same p-value of 0.680. Only the sign differs, because -ttest- reports diff = mean(Domestic) - mean(Foreign), whereas the coefficient on foreign estimates mean(Foreign) - mean(Domestic) (6384.682 - 6072.423 = 312.2587). We can use the values that -regress- returns in e() to obtain all three p-values:

. mat b = e(b)

. mat V = e(V)

. scalar coef_for = el(b,1,1)

. scalar se_for = sqrt(el(V,1,1))

. display 2*ttail(e(df_r), abs(coef_for/se_for))
.68018509

. display ttail(e(df_r), coef_for/se_for)
.34009254

. display ttail(e(df_r), -coef_for/se_for)
.65990746

The first value is the two-tailed p-value. The second, ttail(df, t), is the p-value for Ha: coefficient > 0, which matches Pr(T < t) = 0.3401 for Ha: diff < 0 in the -ttest- output; the third matches Pr(T > t) = 0.6599.

Now we can follow the analogous procedure for multiply imputed data; this time the test will be performed on the variable rep78, after imputing it with -mi impute mlogit-.

. sysuse auto, clear
(1978 Automobile Data)

. mi set flong

. mi register imputed rep78
(5 m=0 obs. now marked as incomplete)
. mi impute mlogit rep mpg disp turn, add(20)

Univariate imputation                     Imputations =       20
Multinomial logistic regression                 added =       20
Imputed: m=1 through m=20                     updated =        0

               |               Observations per m
               |----------------------------------------------
      Variable |   complete   incomplete   imputed |     total
---------------+-----------------------------------+----------
         rep78 |         69            5         5 |        74
--------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled in observations.)

. mi estimate: regress rep78 foreign

Multiple-imputation estimates                     Imputations     =         20
Linear regression                                 Number of obs   =         74
                                                  Average RVI     =     0.0687
                                                  Complete DF     =         72
DF adjustment:   Small sample                     DF:     min     =      64.44
                                                          avg     =      64.54
                                                          max     =      64.65
Model F test:       Equal FMI                     F(   1,   64.4) =      30.16
Within VCE type:          OLS                     Prob > F        =     0.0000

------------------------------------------------------------------------------
       rep78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   1.199738   .2184457     5.49   0.000     .7633995    1.636076
       _cons |   3.054808   .1189696    25.68   0.000     2.817185    3.292431
------------------------------------------------------------------------------

The t-test reported for foreign is the MI version of the two-tailed t-test: the estimates from the 20 imputed datasets are combined using Rubin's rules, with the small-sample degrees-of-freedom adjustment shown in the header. We can also use the values returned by -mi estimate- to compute the two-tailed and one-tailed p-values:

. scalar coef_for = el(e(b_mi),1,1)

. scalar se_for = sqrt(el(e(V_mi),1,1))

. scalar df_for = el(e(df_mi),1,1)

. display 2*ttail(df_for, abs(coef_for/se_for))
7.213e-07

. display ttail(df_for, coef_for/se_for)
3.606e-07

. display ttail(df_for, -coef_for/se_for)
.99999964

As before, the second value is the p-value for Ha: coefficient > 0 and the third is the p-value for Ha: coefficient < 0.

Notice that in the MI framework each coefficient has its own degrees of freedom. This is why I need to take the degrees of freedom for the first coefficient, foreign, from the matrix e(df_mi).
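The do-file below is a sketch, not part of the original message, that collects the steps above. It assumes Stata 11 or later and the auto data shipped with Stata. Part 1 checks with -assert- that the regression p-values match the r(p), r(p_l), and r(p_u) results stored by -ttest-. Part 2 repeats the MI calculation but looks up the foreign coefficient by name with colnumb() instead of assuming it sits in column 1. The matrix names B, V, and DF and the scalar names are illustrative choices, and rseed(12345) is an arbitrary seed added only to make the imputations reproducible, so the MI numbers will not exactly match the output shown above.

```stata
* Sketch (not from the original post): assumes Stata 11+ and the shipped auto data
* Part 1: -ttest- versus -regress- on complete data
sysuse auto, clear
quietly ttest price, by(foreign)
scalar p_two   = r(p)      // Ha: diff != 0
scalar p_lower = r(p_l)    // Ha: diff < 0, where diff = mean(Domestic) - mean(Foreign)
scalar p_upper = r(p_u)    // Ha: diff > 0

quietly regress price foreign
* _b[] and _se[] select the coefficient by name rather than by position
scalar t_reg = _b[foreign]/_se[foreign]
* The coefficient estimates -diff, so the one-sided tails swap relative to -ttest-
assert reldif(p_two,   2*ttail(e(df_r), abs(t_reg))) < 1e-8
assert reldif(p_lower, ttail(e(df_r), t_reg))        < 1e-8
assert reldif(p_upper, ttail(e(df_r), -t_reg))       < 1e-8

* Part 2: the MI version; rseed(12345) is an arbitrary seed for reproducibility
sysuse auto, clear
mi set flong
mi register imputed rep78
mi impute mlogit rep78 mpg displacement turn, add(20) rseed(12345)
mi estimate: regress rep78 foreign

* Copy the MI results (illustrative names) and locate -foreign- by name
matrix B  = e(b_mi)
matrix V  = e(V_mi)
matrix DF = e(df_mi)
local j = colnumb(B, "foreign")
scalar t_mi   = B[1,`j'] / sqrt(V[`j',`j'])
scalar df_for = DF[1,`j']      // coefficient-specific degrees of freedom
scalar p_mi_two   = 2*ttail(df_for, abs(t_mi))
scalar p_mi_upper = ttail(df_for, t_mi)     // Ha: coefficient > 0
scalar p_mi_lower = ttail(df_for, -t_mi)    // Ha: coefficient < 0
display "two-sided p = " p_mi_two
display "Ha: coef > 0, p = " p_mi_upper
display "Ha: coef < 0, p = " p_mi_lower
```

Looking the column up by name keeps the calculation correct if other covariates are added to the model ahead of foreign; the three MI p-values are related exactly as in the complete-data case (the one-sided values sum to 1, and the two-sided value is twice the smaller one).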
-- 
Isabel
icanette(at)stata(dot)com
