How can I compute the Chow test statistic?

Title    Computing the Chow statistic
Author   William Gould, StataCorp
Date     January 1999; updated July 2011

You can include the group dummy variable and its interactions with the regressors in a regression on the pooled data and then use the test command on those terms. You could also run each of the models, write down the appropriate numbers, and calculate the statistic by hand; Stata also has functions, such as Ftail(), for obtaining the appropriate p-values.


Here is a longer answer:

Let’s start with the Chow test to which many refer. Consider the model,

    y = a + b*x1 + c*x2 + u

and say we have two groups of data. We could fit that model on the two groups separately,

    y = a1 + b1*x1 + c1*x2 + u         for group == 1

    y = a2 + b2*x1 + c2*x2 + u         for group == 2

and we could fit a single, pooled regression

    y = a  + b*x1  + c*x2 + u          for both groups

In the last regression, we are asserting that a1==a2, b1==b2, and c1==c2. The formula for the “Chow test” of this constraint is

         ess_c - (ess_1+ess_2)
         ---------------------
                  k
    ---------------------------------
            ess_1 + ess_2
           ---------------
           N_1 + N_2 - 2*k

and this is the formula to which people refer. ess_1 and ess_2 are the error sums of squares from the separate regressions, ess_c is the error sum of squares from the pooled (constrained) regression, k is the number of estimated parameters (k=3 in our case), and N_1 and N_2 are the numbers of observations in the two groups.

Under the null hypothesis that the coefficients are the same in the two groups, the resulting test statistic is distributed F(k, N_1+N_2-2*k).

Let’s try this. I have created small datasets:

 * Group 1: 100 observations, y = 4*x1 - 2*x2 + noise with sd 2
 clear
 set obs 100
 set seed 1234
 generate x1 = uniform()
 generate x2 = uniform()
 generate y = 4*x1 - 2*x2 + 2*invnormal(uniform())
 generate group = 1
 save one, replace

 * Group 2: 80 observations, y = -2*x1 + 3*x2 + noise with sd 8
 clear
 set obs 80
 generate x1 = uniform()
 generate x2 = uniform()
 generate y = -2*x1 + 3*x2 + 8*invnormal(uniform())
 generate group = 2
 save two, replace

 * Stack the two groups into one dataset
 use one, clear
 append using two
 save combined, replace

The models are different in the two groups, the residual variances are different, and so are the numbers of observations. With this dataset, I can carry out the Chow test. First, I run the separate regressions:

 . regress y x1 x2 if group==1
    
   Source |       SS       df       MS                  Number of obs =     100
 ---------+------------------------------               F(  2,    97) =   36.10
    Model |  328.686307     2  164.343154               Prob > F      =  0.0000
 Residual |  441.589627    97  4.55247038               R-squared     =  0.4267
 ---------+------------------------------               Adj R-squared =  0.4149
    Total |  770.275934    99  7.78056499               Root MSE      =  2.1337
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   5.121087    .728493      7.03    0.000        3.67523    6.566944
       x2 |  -3.227026   .7388209     -4.37    0.000      -4.693381   -1.760671
    _cons |  -.1725655   .5698273     -0.30    0.763      -1.303515    .9583839
 ------------------------------------------------------------------------------
    
 
 . regress y x1 x2 if group==2
 
   Source |       SS       df       MS                  Number of obs =      80
 ---------+------------------------------               F(  2,    77) =    5.02
    Model |   544.11726     2   272.05863               Prob > F      =  0.0089
 Residual |  4169.24211    77  54.1460014               R-squared     =  0.1154
 ---------+------------------------------               Adj R-squared =  0.0925
    Total |  4713.35937    79  59.6627768               Root MSE      =  7.3584
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   -1.21464     2.9578     -0.41    0.682      -7.104372    4.675092
       x2 |    8.49714   2.688249      3.16    0.002       3.144152    13.85013
    _cons |    -2.2591    1.91076     -1.18    0.241       -6.06391    1.545709
 ------------------------------------------------------------------------------

Then I run the combined regression:

 . regress y x1 x2 
 
   Source |       SS       df       MS                  Number of obs =     180
 ---------+------------------------------               F(  2,   177) =    2.93
    Model |  176.150454     2  88.0752272               Prob > F      =  0.0559
 Residual |  5316.21341   177   30.035104               R-squared     =  0.0321
 ---------+------------------------------               Adj R-squared =  0.0211
    Total |  5492.36386   179   30.683597               Root MSE      =  5.4804
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   2.692373    1.41842      1.90    0.059      -.1068176    5.491563
       x2 |   2.061004   1.370448      1.50    0.134      -.6435156    4.765524
    _cons |  -1.380331   1.017322     -1.36    0.177      -3.387973      .62731
 ------------------------------------------------------------------------------

For the Chow test,

           ess_c - (ess_1+ess_2)
           ---------------------
                    k
     ---------------------------------
               ess_1 + ess_2
              ---------------
              N_1 + N_2 - 2*k

here are the relevant numbers copied from the output above:

    ess_c =  5316.21341            (from combined regression)

    ess_1 =   441.589627           (from group==1 regression)
    ess_2 =  4169.24211            (from group==2 regression)

        k = 3                      (we estimate 3 parameters)
      N_1 = 100                    (from group==1 regression)
      N_2 =  80                    (from group==2 regression)

So, plugging in, we get

      5316.21341 - (441.589627+4169.24211)              705.38167
      ------------------------------------              ---------
                      3                                     3
    -----------------------------------------  =     ---------------
            441.589627 + 4169.24211                     4610.8317
            -----------------------                     ---------
                 100+80-2*3                                174

                                                        235.12722
                                               =       ----------
                                                        26.499033


                                               =     8.8730491

The Chow test is F(k,N_1+N_2-2*k) = F(3,174), so our test statistic is F(3,174) = 8.8730491.
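
Rather than copying numbers from the output, the same arithmetic can be done from the results that regress stores. The following is only a sketch: it assumes the combined dataset created above is on disk, and the scalar names (ess_1, ess_2, ess_c, N_1, N_2, k, df2, chow_F) are my own choices for illustration. It uses e(rss) for the residual sum of squares, e(N) for the number of observations, and e(df_m) for the model degrees of freedom, and then Ftail() for the p-value.

 use combined, clear

 * Separate regressions: keep each residual sum of squares and sample size
 quietly regress y x1 x2 if group==1
 scalar ess_1 = e(rss)
 scalar N_1   = e(N)

 quietly regress y x1 x2 if group==2
 scalar ess_2 = e(rss)
 scalar N_2   = e(N)

 * Pooled (constrained) regression
 quietly regress y x1 x2
 scalar ess_c = e(rss)
 scalar k     = e(df_m) + 1            // slopes plus the intercept

 * Chow statistic, its denominator degrees of freedom, and its p-value
 scalar df2    = N_1 + N_2 - 2*k
 scalar chow_F = ((ess_c - (ess_1 + ess_2))/k) / ((ess_1 + ess_2)/df2)
 display "F(" k ", " df2 ") = " chow_F
 display "Prob > F = " Ftail(k, df2, chow_F)

With the sums of squares reported above, this should reproduce the hand calculation.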

Now I will do the same problem by running one regression and using test to test whether certain coefficients are equal to zero. What I want to do is fit the model

     y = a3 + b3*x1 + c3*x2 + a3'*g2 + b3'*g2*x1 + c3'*g2*x2 + u

where g2=1 if group==2 and g2=0 otherwise. I can do this by typing

 . generate g2 = (group==2)
 . generate g2x1 = g2*x1
 . generate g2x2 = g2*x2
 . regress y x1 x2 g2 g2x1 g2x2

Think about the predictions from this model. The model says

    y =     a3   +       b3*x1 +       c3*x2 + u     when g2==0
    y = (a3+a3') + (b3+b3')*x1 + (c3+c3')*x2 + u     when g2==1

Thus the model is equivalent to fitting the separate models

    y = a1 + b1*x1 + c1*x2 + u         for group == 1
    y = a2 + b2*x1 + c2*x2 + u         for group == 2

The relationship between the two sets of coefficients is

    a1 = a3               a2 = a3 + a3'
    b1 = b3               b2 = b3 + b3'
    c1 = c3               c2 = c3 + c3'
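
One way to check this correspondence, assuming the interacted regression above is the most recently fit model, is to ask lincom for the sums that define the group 2 coefficients. This is a sketch for illustration, not part of the Chow test itself.

 . lincom _cons + g2
 . lincom x1 + g2x1
 . lincom x2 + g2x2

The point estimates should match the coefficients from the group==2 regression (for example, -1.21464 for x1). The standard errors will generally differ, because the interacted regression uses a single estimate of the residual variance for both groups.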

Some of you may be concerned that in the pooled, fully interacted model (the one estimating a3, b3, etc.), we are constraining var(u) to be the same for each group, whereas in the separate-equation model, we estimate different variances for group 1 and group 2. This does not matter, because the model is fully interacted. That is probably not convincing, but what should be convincing is that I am about to obtain the same F(3,174) = 8.87 answer and, in my concocted data, I have different variances in each group.

So, here is the result of the alternative approach, testing the coefficients against 0 in a pooled specification:

 . generate g2 = (group==2)
 
 . generate g2x1 = g2*x1
     
 . generate g2x2 = g2*x2
 
 . regress y x1 x2 g2 g2x1 g2x2
 
   Source |       SS       df       MS                  Number of obs =     180
 ---------+------------------------------               F(  5,   174) =    6.65
    Model |  881.532123     5  176.306425               Prob > F      =  0.0000
 Residual |  4610.83174   174   26.499033               R-squared     =  0.1605
 ---------+------------------------------               Adj R-squared =  0.1364
    Total |  5492.36386   179   30.683597               Root MSE      =  5.1477
 
 ------------------------------------------------------------------------------
        y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
       x1 |   5.121087   1.757587      2.91    0.004       1.652152    8.590021
       x2 |  -3.227026   1.782504     -1.81    0.072      -6.745139    .2910877
       g2 |  -2.086535   1.917507     -1.09    0.278      -5.871102    1.698032
     g2x1 |  -6.335727   2.714897     -2.33    0.021       -11.6941   -.9773583
     g2x2 |   11.72417    2.59115      4.52    0.000       6.610035     16.8383
    _cons |  -.1725655   1.374785     -0.13    0.900      -2.885966    2.540835
 ------------------------------------------------------------------------------
 
 . test g2 g2x1 g2x2
 
  ( 1)  g2 = 0 
  ( 2)  g2x1 = 0 
  ( 3)  g2x2 = 0 
 
        F(  3,   174) =    8.87
          Prob > F =    0.0000

Same answer.

This definition of the “Chow test” is equivalent to pooling the data, fitting the fully interacted model, and then testing the group 2 coefficients against 0.

That is why I said, “Chow Test is a term I have heard used by economists in the context of testing a set of regression coefficients being equal to 0.”

Admittedly, this leaves a lot unsaid.

The issue of the variance of u being equal in the two groups is subtle, but I do not want that to get in the way of understanding that the Chow test is equivalent to the “pool the data, interact, and test” procedure. They are equivalent.

Concerning variances, the Chow test itself is testing against a pooled, uninteracted model and so has buried in it an assumption of equal variances. It is really a test that the coefficients are equal and that variance(u) is the same in both groups. It is, however, a weak test of the equality of variances because that assumption manifests itself only in how the pooled coefficient estimates are manufactured. Because the Chow test and the “pool the data, interact, and test” procedure are the same, the same is true of both procedures.

Your second concern might be that in the “pool the data, interact, and test” procedure there is an extra assumption of equality of variances because everything comes from the pooled model. As shown, this is not true. It is not true because the model is fully interacted, so the assumption of equal variances never makes a difference in the calculation of the coefficients.

As of Stata 12, you can also use the contrast command with factor variables to perform the same test:

. regress y c.x1##i.g2 c.x2##i.g2

      Source |       SS       df       MS              Number of obs =     180
-------------+------------------------------           F(  5,   174) =    6.65
       Model |  881.532123     5  176.306425           Prob > F      =  0.0000
    Residual |  4610.83174   174   26.499033           R-squared     =  0.1605
-------------+------------------------------           Adj R-squared =  0.1364
       Total |  5492.36386   179   30.683597           Root MSE      =  5.1477

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   5.121087   1.757587     2.91   0.004     1.652152    8.590021
        1.g2 |  -2.086535   1.917507    -1.09   0.278    -5.871102    1.698032
             |
     g2#c.x1 |
          1  |  -6.335727   2.714897    -2.33   0.021     -11.6941   -.9773583
             |
          x2 |  -3.227026   1.782504    -1.81   0.072    -6.745139    .2910877
             |
     g2#c.x2 |
          1  |   11.72417    2.59115     4.52   0.000     6.610035     16.8383
             |
       _cons |  -.1725655   1.374785    -0.13   0.900    -2.885966    2.540835
------------------------------------------------------------------------------

. contrast g2 g2#c.x1 g2#c.x2, overall

Contrasts of marginal linear predictions

Margins      : asbalanced

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
          g2 |          1        1.18     0.2780
             |
     g2#c.x1 |          1        5.45     0.0208
             |
     g2#c.x2 |          1       20.47     0.0000
             |
     Overall |          3        8.87     0.0000
             |
    Residual |        174
------------------------------------------------
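
If you only want the joint test, the test command can also be applied to the factor-variable coefficients by name. This is a sketch that assumes the factor-variable regression above is the most recently fit model; the coefficient names are the ones that appear in its output (1.g2, 1.g2#c.x1, 1.g2#c.x2).

. test 1.g2 1.g2#c.x1 1.g2#c.x2

This is the same joint hypothesis as the Overall row of the contrast table, so it should report the same F(3, 174) statistic.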

An additional example can be found in the “Chow tests” section of [R] contrast.
