Title:  Testing the equality of coefficients across independent areas
Author: Allen McDowell, StataCorp

Note: In Stata 8 and later, you can instead use suest. See the section "Testing cross-model hypotheses" in the manual entry [R] suest for more details.
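For those using suest, here is a minimal sketch of that approach, assuming the two samples have already been appended into one dataset with an indicator variable d, as in the example below (after regress, suest names the mean equations first_mean and second_mean from the stored estimation names):

        . * fit each sample separately and store the results
        . regress y x if d==0
        . estimates store first
        . regress y x if d==1
        . estimates store second
        . * combine the results and test equality across models
        . suest first second
        . test [first_mean]x = [second_mean]x
        . test [first_mean]_cons = [second_mean]_cons, accum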
You must set up your data and regression model so that one model is nested in a more general model. For example, suppose you have two regressions,
y = a1 + b1*x
and
z = a2 + b2*x
You rename z to y and append the second dataset onto the first dataset. Then, you generate a dummy variable, call it d, that equals 1 if the data came from the second dataset and 0 if the data came from the first dataset. You then generate the interaction between x and d, i.e., w = d*x. Next, you estimate
y = a1 + (a2 - a1)*d + b1*x + (b2 - b1)*w
In this combined regression, the coefficient on d is the difference in intercepts, a2 - a1, and the coefficient on w is the difference in slopes, b2 - b1. You can now test whether these differences are separately or jointly zero. This method generalizes in a straightforward manner to regressions with more than one independent variable, as sketched below.
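For instance, with two regressors, x1 and x2 (hypothetical names used only for illustration), the same construction might look like this:

        . * interact the group indicator with each regressor
        . generate w1 = d*x1
        . generate w2 = d*x2
        . regress y x1 x2 w1 w2 d
        . * jointly test that intercept and both slopes are equal across samples
        . test d w1 w2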
Here is an example:
. set obs 10
obs was 0, now 10

. set seed 2001

. generate x = invnormal(uniform())

. generate y = 10 + 15*x + 2*invnormal(uniform())

. generate d=0

. regress y x

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) = 1363.66
       Model |  2369.31814     1  2369.31814           Prob > F      =  0.0000
    Residual |  13.8997411     8  1.73746764           R-squared     =  0.9942
-------------+------------------------------           Adj R-squared =  0.9934
       Total |  2383.21788     9  264.801986           Root MSE      =  1.3181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   14.88335   .4030394    36.93   0.000     13.95394    15.81276
       _cons |   10.15211   .4218434    24.07   0.000     9.179336    11.12488
------------------------------------------------------------------------------

. save first
file first.dta saved

. clear

. set obs 10
obs was 0, now 10

. set seed 2002

. generate x = invnormal(uniform())

. generate y = 19 + 17*x + 2*invnormal(uniform())

. generate d=1

. regress y x

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =  177.94
       Model |  1677.80047     1  1677.80047           Prob > F      =  0.0000
    Residual |  75.4304659     8  9.42880824           R-squared     =  0.9570
-------------+------------------------------           Adj R-squared =  0.9516
       Total |  1753.23094     9  194.803438           Root MSE      =  3.0706

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    17.3141   1.297951    13.34   0.000     14.32102    20.30718
       _cons |   18.37409   .9710377    18.92   0.000     16.13488    20.61331
------------------------------------------------------------------------------

. save second
file second.dta saved

. append using first

. generate w = x*d

. regress y x w d

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  3,    16) =  275.76
       Model |  4618.88818     3  1539.62939           Prob > F      =  0.0000
    Residual |   89.330207    16  5.58313794           R-squared     =  0.9810
-------------+------------------------------           Adj R-squared =  0.9775
       Total |  4708.21839    19  247.800968           Root MSE      =  2.3629

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   14.88335   .7224841    20.60   0.000     13.35176    16.41495
           w |   2.430745   1.232696     1.97   0.066    -.1824545    5.043945
           d |   8.221983    1.06309     7.73   0.000     5.968334    10.47563
       _cons |   10.15211    .756192    13.43   0.000     8.549053    11.75516
------------------------------------------------------------------------------
Notice that the constant and the coefficient on x are exactly the same as in the first regression. Here is a simple way to test that the coefficients on the dummy variable and the interaction term are jointly zero. This is, in effect, a test of whether the estimated parameters from the first regression are statistically different from the estimated parameters from the second regression:
. test _b[d] = 0, notest

 ( 1)  d = 0

. test _b[w] = 0, accum

 ( 1)  d = 0
 ( 2)  w = 0

       F(  2,    16) =   31.04
            Prob > F =    0.0000
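Equivalently, the joint test can be requested in a single command; because test applied to a list of variable names tests that the corresponding coefficients are jointly zero, the following one-liner should reproduce the F test above:

        . test d w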
Here is how you recover the constant from the second regression using the estimated parameters from the third regression:
. lincom _b[_cons] + _b[d]

 ( 1)  d + _cons = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   18.37409   .7472172    24.59   0.000     16.79006    19.95812
------------------------------------------------------------------------------
Here is how you recover the coefficient on x from the second regression using the estimated parameters from the third regression:
. lincom _b[x] + _b[w]

 ( 1)  x + w = 0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    17.3141   .9987779    17.34   0.000     15.19678    19.43141
------------------------------------------------------------------------------
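If you are interested only in whether the slopes differ, rather than in the joint test shown earlier, you can test the interaction term by itself; a minimal sketch:

        . * tests whether the slope difference b2 - b1 is zero
        . test w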