The Stata listserver

Re: st: Cross-Sectional Time Series

From (Vince Wiggins, StataCorp)
Subject   Re: st: Cross-Sectional Time Series
Date   Wed, 26 Jun 2002 15:45:05 -0500

I have one additional comment on the continuing thread comparing the results
of -regress-, -xtreg, fe-, and -xtreg, re-.

While I agree with the comparisons between the models presented by Mark
Schaffer and David Drukker, there is a more mundane reason why the example
presented by Anirban Basu elicits virtually identical estimates from
-regress-, -xtreg, fe-, and -xtreg, re-.  The short answer is that they have
to be identical, at least to the machine precision of the computations.

Anirban Basu asks us to generate data in the following manner,

    . mat C = (1, 0.6, 0.6, 0.6 \ 0.6, 1, 0.6, 0.6 \ 0.6, 0.6, 1, 0.6 \ 0.6, 0.6, 0.6, 1)
    . drawnorm y1 y2 y3 y4, n(1000) means(1 3 4 7) corr(C)
    . gen id=_n
    . reshape long y , i(id) j(time)

Anirban is using -drawnorm- to create 4 correlated variables and then
-reshape- to turn these into panel data with 4 observations of a single y in
each panel.  This is a fine way to create data with a random effect.  Here are
the first three panels:

. list in 1/12

            id       time          y
  1.         1          1  -.0939699
  2.         1          2   2.265574
  3.         1          3   2.323656
  4.         1          4   6.053069
  5.         2          1   1.367081
  6.         2          2   3.062155
  7.         2          3   4.830178
  8.         2          4   7.105754
  9.         3          1   1.145398
 10.         3          2   4.087784
 11.         3          3    3.99791
 12.         3          4   6.942679
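Because every pair of the four draws has correlation 0.6 and each draw has variance 1, the same joint distribution arises from an explicit random-intercept model, y_it = mu_t + u_i + e_it with Var(u_i) = 0.6 and Var(e_it) = 0.4.  The sketch below builds such a panel directly, as an alternative to -drawnorm- plus -reshape-.  The seed and the variable names u and mu are illustrative assumptions, and rnormal() requires a Stata release considerably newer than the one in use when this thread was written.

```stata
* Sketch: an explicit random-intercept version of Anirban's data.
* Seed and variable names are illustrative; rnormal() needs a modern Stata.
clear
set seed 2002
set obs 1000
generate id = _n
generate double u = rnormal(0, sqrt(.6))   // panel effect, variance 0.6
expand 4                                    // 4 observations per panel
bysort id: generate time = _n               // time = 1, 2, 3, 4 in every panel
* period means 1, 3, 4, 7, as in means(1 3 4 7)
generate double mu = 1*(time==1) + 3*(time==2) + 4*(time==3) + 7*(time==4)
generate double y = mu + u + rnormal(0, sqrt(.4))  // idiosyncratic variance 0.4
drop u mu
```

Within a panel, corr(y_it, y_is) = 0.6/(0.6 + 0.4) = 0.6, matching the correlation matrix C.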

Anirban then asks us to try the OLS, fixed-effects, and random-effects
estimators on these data by typing:

     . regress y time

     . xtreg   y time , i(id) fe 
     . xtreg   y time , i(id) re 

What is unusual about this model is that we are including -time- as a
regressor.  Note that we have perfectly balanced panels of 4 observations
each, and that the variable -time- exactly repeats itself -- counting 1, 2, 3,
4 in each panel.

What does this mean for the fixed-effects (FE) transformation?  The FE
transformation just subtracts the panel mean for each variable (dependent and
independent) from each value.  The panel mean for time is 2.5 in every panel.
This means that the FE transformation just subtracts a constant value from
-time-.  Subtracting a constant from a regressor changes only the intercept; it
has no effect on the regressor's estimated coefficient.
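A quick check of both points on Anirban's data is sketched below.  It assumes the reshaped data (id, time, y) from the commands above are in memory; the variable and scalar names tbar, time_w, b_raw, and b_w are illustrative.

```stata
* Sketch: assumes Anirban's reshaped data are in memory.
* Panel mean of -time-; with balanced panels it is 2.5 everywhere.
egen double tbar = mean(time), by(id)
assert tbar == 2.5

* The FE transform of time is time minus a constant
generate double time_w = time - tbar

* Slope on raw time versus slope on demeaned time
quietly regress y time
scalar b_raw = _b[time]
quietly regress y time_w
scalar b_w = _b[time_w]
display "raw: " b_raw "   demeaned: " b_w
```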

But wait, we also subtracted the panel means from the dependent variable y,
and those means were not the same for each panel.  As it turns out, when
panels are balanced, regressing any untransformed variable on its own FE
transformation yields a coefficient of exactly 1.  Thus, the estimated
relationship between y and a variable that has effectively not been
transformed (like -time-, which had only a constant subtracted) remains
exactly the same.
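The coefficient-of-1 result can be checked directly.  This sketch again assumes Anirban's reshaped data are in memory; ybar, y_w, and b_yw are illustrative names.

```stata
* Sketch: assumes Anirban's reshaped data are in memory.
* Within (FE) transform of y
egen double ybar = mean(y), by(id)
generate double y_w = y - ybar

* Regress the untransformed y on its own FE transform
quietly regress y y_w
scalar b_yw = _b[y_w]
display "coefficient on y_w: " b_yw    // 1 up to rounding
```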

So, with only a single independent variable that repeats exactly in each
balanced panel, OLS and fixed-effects regression will produce the same
estimate of the coefficient on the regressor (within machine tolerance of the
different computations performed).
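The OLS and FE estimates can be compared numerically as sketched below.  The sketch assumes Anirban's reshaped data are in memory and uses -xtset- (Stata 10 and later) in place of the i(id) option shown above; the scalar names are illustrative.

```stata
* Sketch: assumes Anirban's reshaped data are in memory.
xtset id time                 // should report the panels as strongly balanced

quietly regress y time
scalar b_ols = _b[time]

quietly xtreg y time, fe
scalar b_fe = _b[time]

display "OLS: " b_ols "   FE: " b_fe "   reldif: " reldif(b_ols, b_fe)
```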

Side-note:  While I was aware of the behaviour of variables that repeat within
panel for balanced panels, I hadn't previously considered why the FE
transformation of the dependent variable has no effect.  A little scribbling
on the whiteboard from Bobby Gutierrez shows that when the FE transformation
is expressed in matrix form, it is idempotent for balanced panels.  That
causes the transformation to drop out of the regression of y on transformed
y, leaving a coefficient of 1.
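The whiteboard argument can be written out as follows.  The notation (D for the matrix of panel dummies, Q for the FE transformation, \iota for a vector of ones, x for -time-) is introduced here for illustration.  Q is symmetric and idempotent by construction; the balanced, repeating design matters because it makes Qx equal to x minus a single constant, which is what ties the FE slope to the OLS slope.

```latex
% N panels of T observations stacked; D = NT x N panel dummies; \iota = ones
P = D(D'D)^{-1}D', \qquad Q = I - P, \qquad Q' = Q, \qquad Q^2 = Q

% Regress y on Qy (with a constant; \iota'Qy = 0 because Q\iota = 0)
b = \frac{(Qy)'y}{(Qy)'(Qy)} = \frac{y'Qy}{y'Q'Qy} = \frac{y'Qy}{y'Qy} = 1

% Balanced panels with x = 1,2,3,4 in each: Qx = x - 2.5\iota = x - \bar{x}\iota
\hat\beta_{FE} = \frac{(Qx)'(Qy)}{(Qx)'(Qx)}
              = \frac{(Qx)'y}{(x-\bar{x}\iota)'(x-\bar{x}\iota)}
              = \frac{(x-\bar{x}\iota)'(y-\bar{y}\iota)}{(x-\bar{x}\iota)'(x-\bar{x}\iota)}
              = \hat\beta_{OLS}
```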

What about the random-effects (RE) estimator?  The GLS random-effects
estimator is just a matrix-weighted combination of the FE estimator and the
between-effects (BE) estimator.  The BE estimator is a regression on the
panel-level means of each variable (again, dependent and independent).  As we
saw above, the panel-level mean for -time- is 2.5 in every panel and is thus
collinear with the constant.  This means that the between estimator cannot
estimate B_time and provides no information about this coefficient, so it
contributes nothing to the RE estimate of B_time.  The RE estimator must
therefore be identical to the FE estimator in a model with a single covariate
that repeats exactly within each balanced panel.
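The BE and RE behaviour can be inspected as sketched below, again assuming Anirban's reshaped data are in memory.  The exact wording of Stata's collinearity note varies by release, so the comment describes only what should happen; the scalar names are illustrative.

```stata
* Sketch: assumes Anirban's reshaped data are in memory.
xtset id time

* Between estimator: the panel mean of time is 2.5 everywhere, so -time-
* is collinear with the constant and should be reported as omitted
xtreg y time, be

* Fixed- and random-effects estimates of the coefficient on time
quietly xtreg y time, fe
scalar b_fe = _b[time]
quietly xtreg y time, re
scalar b_re = _b[time]
display "FE: " b_fe "   RE: " b_re "   reldif: " reldif(b_fe, b_re)
```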

-- Vince


© Copyright 1996–2017 StataCorp LLC