Note for version 8 (and higher) users: use suest. See the section
"Testing cross-model hypotheses" of the manual entry [R] suest for more
details; a short sketch using suest also appears at the end of this FAQ.
Title
     Testing the equality of coefficients across independent areas
Author
     Allen McDowell, StataCorp
Date
     April 2001; updated July 2005

How do you test the equality of regression coefficients that are
generated from two different regressions, estimated on two different
samples?
You must set up your data and regression model so that one model is nested
in a more general model. For example, suppose you have two regressions,

        y = a1 + b1*x

estimated on the first sample, and

        z = a2 + b2*x

estimated on the second sample. You rename z to y and append the second
dataset onto the first dataset. Then, you generate a dummy variable, call
it d, that equals 1 if the data came from the second dataset and 0 if the
data came from the first dataset. You then generate the interaction
between x and d, i.e., w = d*x. Next, you estimate

        y = a1 + (a2-a1)*d + b1*x + (b2-b1)*w

In this general model, the coefficient on d is the difference between the
two intercepts and the coefficient on w is the difference between the two
slopes, so testing whether these two coefficients are separately or
jointly zero tests the equality of the corresponding parameters across
the two regressions. This method generalizes in a straightforward manner
to regressions with more than one independent variable.
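As an aside, Stata 11 and later can build the dummy and the interaction
automatically with factor-variable notation, so the general model and the
joint test reduce to two commands. This is a minimal sketch, not part of
the original FAQ, assuming the appended dataset with the indicator d
described above:

. regress y i.d##c.x
. test 1.d 1.d#c.x

Here the coefficients 1.d and 1.d#c.x play the roles of d and w in the
hand-built model.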
Here is an example:
. set obs 10
obs was 0, now 10
. set seed 2001
. generate x = invnormal(uniform())
. generate y = 10 + 15*x + 2*invnormal(uniform())
. generate d=0
. regress y x
      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) = 1363.66
       Model |  2369.31814     1  2369.31814           Prob > F      =  0.0000
    Residual |  13.8997411     8  1.73746764           R-squared     =  0.9942
-------------+------------------------------           Adj R-squared =  0.9934
       Total |  2383.21788     9  264.801986           Root MSE      =  1.3181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   14.88335   .4030394    36.93   0.000     13.95394    15.81276
       _cons |   10.15211   .4218434    24.07   0.000     9.179336    11.12488
------------------------------------------------------------------------------
. save first
file first.dta saved
. clear
. set obs 10
obs was 0, now 10
. set seed 2002
. generate x = invnormal(uniform())
. generate y = 19 + 17*x + 2*invnormal(uniform())
. generate d=1
. regress y x
      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =  177.94
       Model |  1677.80047     1  1677.80047           Prob > F      =  0.0000
    Residual |  75.4304659     8  9.42880824           R-squared     =  0.9570
-------------+------------------------------           Adj R-squared =  0.9516
       Total |  1753.23094     9  194.803438           Root MSE      =  3.0706

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    17.3141   1.297951    13.34   0.000     14.32102    20.30718
       _cons |   18.37409   .9710377    18.92   0.000     16.13488    20.61331
------------------------------------------------------------------------------
. save second
file second.dta saved
. append using first
. generate w = x*d
. regress y x w d
      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  3,    16) =  275.76
       Model |  4618.88818     3  1539.62939           Prob > F      =  0.0000
    Residual |   89.330207    16  5.58313794           R-squared     =  0.9810
-------------+------------------------------           Adj R-squared =  0.9775
       Total |  4708.21839    19  247.800968           Root MSE      =  2.3629

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   14.88335   .7224841    20.60   0.000     13.35176    16.41495
           w |   2.430745   1.232696     1.97   0.066    -.1824545    5.043945
           d |   8.221983    1.06309     7.73   0.000     5.968334    10.47563
       _cons |   10.15211    .756192    13.43   0.000     8.549053    11.75516
------------------------------------------------------------------------------
Notice that the constant and the coefficient on x are exactly the same as in
the first regression. Here is a simple way to test that the coefficients on
the dummy variable and the interaction term are jointly zero. In effect, this
tests whether the estimated parameters from the first regression differ from
the estimated parameters from the second regression:
. test _b[d] =0, notest
( 1) d = 0
. test _b[w] = 0, accum
( 1) d = 0
( 2) w = 0
       F(  2,    16) =   31.04
            Prob > F =    0.0000
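Because test accepts a list of coefficients, the same joint test can also
be run in a single command; the line below is equivalent to the
accumulated test above (it is not part of the original log):

. test d w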
Here is how you recover the constant of the second regression from the
estimated parameters of the third (pooled) regression:
. lincom _b[_cons] + _b[d]
( 1) d + _cons = 0
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   18.37409   .7472172    24.59   0.000     16.79006    19.95812
------------------------------------------------------------------------------
Similarly, here is how you recover the coefficient on x from the second
regression using the estimated parameters of the third regression:
. lincom _b[x] + _b[w]
( 1) x + w = 0
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    17.3141   .9987779    17.34   0.000     15.19678    19.43141
------------------------------------------------------------------------------
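Finally, as noted at the top of this FAQ, Stata 8 and later provide suest,
which compares the two regressions directly without nesting them in a
single model. This is a minimal sketch, assuming the appended dataset with
the indicator d is still in memory; after suest, each stored regress
result contributes an equation named after the stored model, e.g.,
first_mean:

. regress y x if d==0
. estimates store first
. regress y x if d==1
. estimates store second
. suest first second
. test [first_mean]x = [second_mean]x
. test [first_mean]_cons = [second_mean]_cons, accum

Because suest reports robust standard errors and Wald chi-squared tests,
the result will not match the F test above exactly.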