I have panel data on student achievement. I want to estimate a model that
includes fixed effects for students, schools and time:
Yijt = ai + bj + ct + dXijt + eijt
where i indexes students, j indexes schools and t indexes time. The number
of time periods is small so I can include explicit time dummies to control
for time:
(1) Yijt = ai + bj + f1T1 + ... + fnTn + gXijt + eijt
However, the numbers of students and schools are both large, so running
OLS with a full set of dummy variables is not feasible.
According to Greene (Econometric Analysis, 2nd ed., pp. 468-469) a solution
to the problem is to estimate the following:
(2) Y*ijt = f1T1* + ... + fnTn* + gX*ijt + e*ijt
where Y*ijt = Yijt - (student's mean Y over time) - (school's mean Y over
all students) + (mean Y averaged over all students and schools). Likewise
for T1*...Tn* and X*.
It would seem that this approach could be implemented in Stata in either of
the following ways:
(a) explicitly calculate the de-meaned variables Y*, T1*...Tn* and X*, and
run reg using these de-meaned variables
(b) take the difference between each observation and the school mean (i.e.,
(Yijt - (school mean over all students)), etc.) and run xtreg or areg with
student fixed effects.
I have run both models (a) and (b) on a small data set where I can also
estimate the model with explicit student, school and time dummies.
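For reference, one way to fit the benchmark model on the small data set is sketched here. It is only a sketch, assuming the variable names from the log below (nrtrgain, chgschl, t2001, student, instid, nmiss): the student effects are absorbed and the school dummies are entered explicitly through xi, which should give the same slope estimates as a regression with all three sets of dummies.

```stata
* Sketch only, not the exact command from my .do file.
* Assumes nmiss has already been created as in the log below.
* Student effects are absorbed; school dummies are expanded by xi.
* t2001 is the only time dummy that appears in the log.
xi: areg nrtrgain chgschl t2001 i.instid if nmiss == 0, absorb(student)
```

This is feasible only while the number of schools is small enough for the i.instid dummies to fit in memory.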
Both methods (a) and (b) yield coefficient estimates that are different
from one another and different from the model with explicit dummy variables
for all three effects. Bob Bifulco (U. Conn.) has been working on the same
problem with a different data set and comes up with the same inconsistent
results. A copy of my .do file and results follows. Any suggestions would
be greatly appreciated.
Tim
. * ******************************************************* ;
. * Set Panel Variables ;
. * ******************************************************* ;
. tsset student year ;
panel variable: student, 114 to 872489
time variable: year, 1999 to 2001, but with gaps
. * ******************************************************* ;
. * Create Differenced Variables ;
. * ******************************************************* ;
. * Determine obs. where one or more model variables are missing;
. egen nmiss = rmiss(nrtrgain charter nschools chgschl student instid) ;
. * Create student group means;
. bysort student:egen nrtrgain_m = mean(nrtrgain) if nmiss==0;
. bysort student:egen chgschl_m = mean(chgschl) if nmiss==0;
. bysort student:egen t2001_m = mean(t2001) if nmiss==0;
. * Create school group means;
. bysort instid:egen nrtrgain_n = mean(nrtrgain) if nmiss==0;
. bysort instid:egen chgschl_n = mean(chgschl) if nmiss==0;
. bysort instid:egen t2001_n = mean(t2001) if nmiss==0;
. * Create overall means;
. egen nrtrgain_m2 = mean(nrtrgain) if nmiss==0;
. egen chgschl_m2 = mean(chgschl) if nmiss==0;
. egen t2001_m2 = mean(t2001) if nmiss==0;
. * demean all variables (including time dummies);
. * with respect to student means and school means ;
. * and run reg;
. reg de_stdsch_nrtrgain de_stdsch_chgschl de_stdsch_t2001 if nmiss==0;
. * demean all variables (including time dummies);
. * with respect to school means (but don't add in overall mean) ;
. * and run areg;
. areg de_sch2_nrtrgain de_sch2_chgschl de_sch2_t2001 if nmiss==0, absorb(student);
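The log above does not show the generate statements that build the de_stdsch_* and de_sch2_* variables used in the reg and areg commands. A minimal sketch of those steps follows, assuming they apply the definitions in (2) and in (b) to the mean variables created above (_m = student mean, _n = school mean, _m2 = overall mean). The loop is an illustration, not a copy of the original .do file, and it uses the default line delimiter rather than the semicolons in the log.

```stata
* Sketch only: assumes the *_m, *_n and *_m2 variables from the log exist.
foreach v in nrtrgain chgschl t2001 {
    * (a) subtract student and school means, add back the overall mean
    generate de_stdsch_`v' = `v' - `v'_m - `v'_n + `v'_m2 if nmiss == 0
    * (b) subtract the school mean only; student effects are absorbed by areg
    generate de_sch2_`v' = `v' - `v'_n if nmiss == 0
}
```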
Tim R. Sass
Professor
Department of Economics
Florida State University
Tallahassee, FL 32306-2180
Voice: (850)644-7087
Fax: (850)644-4535
E-mail: [email protected]
Internet: http://garnet.acns.fsu.edu/~tsass