Note for version 8 (and higher) users: use suest. See the section
"Testing cross-model hypotheses" of the manual entry [R] suest for more
details; a short sketch using suest also appears at the end of this FAQ.
Title
     Testing the equality of coefficients across independent areas
Author
     Allen McDowell, StataCorp
Date
     April 2001; updated July 2005

How do you test the equality of regression coefficients that are
generated from two different regressions, estimated on two different
samples?
You must set up your data and regression model so that one model is nested
in a more general model. For example, suppose you have two regressions,

        y = a1 + b1*x

estimated on the first sample, and

        z = a2 + b2*x

estimated on the second sample. You rename z to y and append the second
dataset onto the first dataset. Then, you generate a dummy variable, call
it d, that equals 1 if the data came from the second dataset and 0 if the
data came from the first dataset. You then generate the interaction
between x and d, i.e., w = d*x. Next, you estimate

        y = a1 + (a2-a1)*d + b1*x + (b2-b1)*w

In this general model, the coefficient on d is the difference between the
two intercepts and the coefficient on w is the difference between the two
slopes, so testing whether these two coefficients are separately or
jointly zero tests the equality of the corresponding parameters across
the two regressions. This method generalizes in a straightforward manner
to regressions with more than one independent variable.
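As an aside, Stata 11 and later can build the dummy and the interaction
automatically with factor-variable notation, so the general model and the
joint test reduce to two commands. This is a minimal sketch, not part of
the original FAQ, assuming the appended dataset with the indicator d
described above:

. regress y i.d##c.x
. test 1.d 1.d#c.x

Here the coefficients 1.d and 1.d#c.x play the roles of d and w in the
hand-built model.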
Here is an example:
. set obs 10
obs was 0, now 10
. set seed 2001
. generate x = invnormal(uniform())
. generate y = 10 + 15*x + 2*invnormal(uniform())
. generate d=0
. regress y x
      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) = 1363.66
       Model |  2369.31814     1  2369.31814           Prob > F      =  0.0000
    Residual |  13.8997411     8  1.73746764           R-squared     =  0.9942
-------------+------------------------------           Adj R-squared =  0.9934
       Total |  2383.21788     9  264.801986           Root MSE      =  1.3181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   14.88335   .4030394    36.93   0.000     13.95394    15.81276
       _cons |   10.15211   .4218434    24.07   0.000     9.179336    11.12488
------------------------------------------------------------------------------
. save first
file first.dta saved
. clear
. set obs 10
obs was 0, now 10
. set seed 2002
. generate x = invnormal(uniform())
. generate y = 19 + 17*x + 2*invnormal(uniform())
. generate d=1
. regress y x
      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =  177.94
       Model |  1677.80047     1  1677.80047           Prob > F      =  0.0000
    Residual |  75.4304659     8  9.42880824           R-squared     =  0.9570
-------------+------------------------------           Adj R-squared =  0.9516
       Total |  1753.23094     9  194.803438           Root MSE      =  3.0706

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    17.3141   1.297951    13.34   0.000     14.32102    20.30718
       _cons |   18.37409   .9710377    18.92   0.000     16.13488    20.61331
------------------------------------------------------------------------------
. save second
file second.dta saved
. append using first
. generate w = x*d
. regress y x w d
      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  3,    16) =  275.76
       Model |  4618.88818     3  1539.62939           Prob > F      =  0.0000
    Residual |   89.330207    16  5.58313794           R-squared     =  0.9810
-------------+------------------------------           Adj R-squared =  0.9775
       Total |  4708.21839    19  247.800968           Root MSE      =  2.3629

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   14.88335   .7224841    20.60   0.000     13.35176    16.41495
           w |   2.430745   1.232696     1.97   0.066    -.1824545    5.043945
           d |   8.221983    1.06309     7.73   0.000     5.968334    10.47563
       _cons |   10.15211    .756192    13.43   0.000     8.549053    11.75516
------------------------------------------------------------------------------
Notice that the constant and the coefficient on x are exactly the same as in
the first regression. Here is a simple way to test that the coefficients on
the dummy variable and the interaction term are jointly zero. In effect, this
tests whether the estimated parameters from the first regression differ from
the estimated parameters from the second regression:
. test _b[d] =0, notest
( 1) d = 0
. test _b[w] = 0, accum
( 1) d = 0
( 2) w = 0
       F(  2,    16) =   31.04
            Prob > F =    0.0000
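Because test accepts a list of coefficients, the same joint test can also
be run in a single command; the line below is equivalent to the
accumulated test above (it is not part of the original log):

. test d w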
Here is how you recover the constant of the second regression from the
estimated parameters of the third (pooled) regression:
. lincom _b[_cons] + _b[d]
( 1) d + _cons = 0
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   18.37409   .7472172    24.59   0.000     16.79006    19.95812
------------------------------------------------------------------------------
Similarly, here is how you recover the coefficient on x from the second
regression using the estimated parameters of the third regression:
. lincom _b[x] + _b[w]
( 1) x + w = 0
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    17.3141   .9987779    17.34   0.000     15.19678    19.43141
------------------------------------------------------------------------------
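Finally, as noted at the top of this FAQ, Stata 8 and later provide suest,
which compares the two regressions directly without nesting them in a
single model. This is a minimal sketch, assuming the appended dataset with
the indicator d is still in memory; after suest, each stored regress
result contributes an equation named after the stored model, e.g.,
first_mean:

. regress y x if d==0
. estimates store first
. regress y x if d==1
. estimates store second
. suest first second
. test [first_mean]x = [second_mean]x
. test [first_mean]_cons = [second_mean]_cons, accum

Because suest reports robust standard errors and Wald chi-squared tests,
the result will not match the F test above exactly.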