
From: Daniel Simon <[email protected]>
To: [email protected]
Subject: Re: st: Cross-Sectional Time Series
Date: Tue, 16 Jul 2002 15:54:58 -0400

Hi,

I have three questions pertaining to the discussion from three weeks ago about panel-data models, and in particular about -regress, cluster(id)-.

David Drukker explained that (1) -xtreg, re- and -xtreg, fe- produce more efficient estimates than -regress, cluster(id)- and (2) if the covariates are correlated with the id-level error component, u_i, then only -xtreg, fe- produces consistent estimates.

In light of these conditions, when is -regress, cluster(id)- a recommended estimation strategy?

I have a model in which a Hausman test fails to reject the random-effects model. Yet the coefficient estimates I obtain from -xtreg, re- are quite different from those I obtain using -regress, cluster(id)-. Can anyone suggest why this might occur or what it indicates?

Finally, when I run -regress, cluster(id)-, the results do not include an F-statistic. What does this indicate?

I include the results from the -xtreg, re- and the -regress, cluster(id)- models below.

Thanks. Daniel

. xi:reg lnprice herf pgcirc1 markets entry i.year i.group, cluster(mag1) ,

i.year _Iyear_1988-2001 (naturally coded; _Iyear_1988 omitted)

i.group _Igroup_1-53 (naturally coded; _Igroup_1 omitted)

Regression with robust standard errors                 Number of obs =    4174
                                                       F( 51,   541) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.1990
Number of clusters (mag1) = 542                        Root MSE      =  .43841

------------------------------------------------------------------------------
             |               Robust
     lnprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        herf |    .129881   .1397176     0.93   0.353    -.1445744    .4043365
     pgcirc1 |  -.0000367   .0000131    -2.80   0.005    -.0000625   -.0000109
     markets |  -.0002362   .0035526    -0.07   0.947    -.0072148    .0067425
       entry |   -.007691   .0235849    -0.33   0.744    -.0540203    .0386383
  (I cut out the coefficients on a long list of dummies)
       _cons |   .9513429   .0748093    12.72   0.000     .8043905    1.098295
------------------------------------------------------------------------------

. xi:xtreg lnprice herf pgcirc1 markets entry i.year i.group, re i(mag1) , if
> year>=1990 & newchg~=1 & group~=29 & group~=28 & group~=22 & price~=0 & cove
> r~=0 & issues>3

i.year _Iyear_1988-2001 (naturally coded; _Iyear_1988 omitted)
i.group _Igroup_1-53 (naturally coded; _Igroup_1 omitted)

Random-effects GLS regression                   Number of obs      =      4174
Group variable (i) : mag1                       Number of groups   =       542

R-sq:  within  = 0.0456                         Obs per group: min =         1
       between = 0.2198                                        avg =       7.7
       overall = 0.1812                                        max =        11

Random effects u_i ~ Gaussian                   Wald chi2(54)      =    318.26
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     lnprice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        herf |  -.0030196   .0569924    -0.05   0.958    -.1147227    .1086835
     pgcirc1 |  -.0000211   3.93e-06    -5.37   0.000    -.0000288   -.0000134
     markets |   .0009128   .0010585     0.86   0.389    -.0011619    .0029874
       entry |  -.0252725   .0110294    -2.29   0.022    -.0468898   -.0036552
  (I cut out the coefficients on a long list of dummies)
       _cons |   1.127733   .1519215     7.42   0.000     .8299721    1.425493
-------------+----------------------------------------------------------------
     sigma_u |  .44605988
     sigma_e |  .15335763
         rho |  .89429288   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

end of do-file

At 10:31 AM 6/26/2002 -0500, you wrote:

John Neumann <[email protected]> began an interesting thread on this list
yesterday when he asked whether he should use -reg ,cluster(id)-, -xtreg ,
re- or -xtreg, fe- for estimation and inference about his panel data model.
Then anirban basu <[email protected]> and Mark Schaffer
<[email protected]> both provided interesting responses to John's
original question.

Both Anirban and Mark have pointed out that -regress, cluster(id)- will
provide consistent estimates of the coefficients. There is also agreement
that -xtreg, re- will provide consistent estimates of the coefficients. But
there seems to be some discussion of whether -xtreg, fe- will provide
consistent estimates.

In theory, all three estimators (-regress, cluster(id)-, -xtreg, re-,
-xtreg, fe-) are consistent estimators of the coefficients for
random-effects data generating processes. To flesh out the details, consider
the random-effects data generating process

     y_it = X_it b + u_i + e_it

where X_it is a 1 x K vector of covariates, b is a K x 1 vector of
coefficients, u_i is independently and identically distributed (iid) over
id's, e_it is iid over the observations, and there is no correlation between
X_it and u_i. Under these assumptions, all three estimators also provide
consistent estimators of the VCE matrix, and the resulting Wald tests will
obtain nominal coverage, given enough data.

Another theoretical point is in order. For the random-effects data
generating process, -xtreg, re- should provide more efficient estimates of
the coefficients than either of the other two, while -xtreg, fe- should
produce more efficient estimates than -regress, cluster(id)-. (One caveat in
the fixed-effects case is that inference is said to be conditional on the
random effects in the sample.)

To illustrate these points, I have written a small simulation. The do-file
is appended below my signature. Briefly, the program

     i)   produces 1000 draws from a parameterization of the
          random-effects data generating process
     ii)  runs the three estimators on each sample, saving off the
          coefficients
     iii) uses -test- to test that the coefficients are equal to their true
          values, saving off the p-values
     iv)  then computes the coverage rates obtained by each estimator on
          each test

Let's begin by looking at the results for the coefficients.

First, we need to understand the variable names. As can be seen from the
program fevclust.do, appended below,

     Variable name     Meaning
     x1_crg            coefficient on x1 from -regress, cluster(id)-
     x2_crg            coefficient on x2 from -regress, cluster(id)-
     x1_cfe            coefficient on x1 from -xtreg, fe-
     x2_cfe            coefficient on x2 from -xtreg, fe-
     x1_cre            coefficient on x1 from -xtreg, re-
     x2_cre            coefficient on x2 from -xtreg, re-

Now for these results. The table below presents the summary statistics from
these variables obtained over the 1000 samples that were generated.

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      x1_crg |    1000     2.99791   .1070938   2.540678   3.372749
      x2_crg |    1000    3.001715   .1044514    2.71244   3.303503
      x1_cfe |    1000    3.000676    .051932   2.847759   3.144626
      x2_cfe |    1000    3.003031   .0496754   2.845199   3.160065
      x1_cre |    1000    3.000508   .0515863   2.846256   3.150228
      x2_cre |    1000     3.00292   .0498531   2.839417   3.163176

There are several points to note. First, the mean of the estimates from each
estimator is very close to the true value of 3.0. Second, the standard
deviation of the estimates from -regress, cluster(id)- is about twice the
standard deviations from the other two estimators. This indicates that
-xtreg, re- and -xtreg, fe- are more efficient than -regress, cluster(id)-.
Third, the standard deviations of the estimates from -xtreg, fe- are
surprisingly close to those of -xtreg, re-. This indicates that for this
parameterization of the data generating process and this sample size,
-xtreg, fe- is about as efficient an estimator as -xtreg, re-.
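The size of the efficiency gap can be read off the standard deviations in the summary table above. What follows is only a rough sketch: it retypes the reported standard deviations for the x1 coefficient by hand (the scalar names sd_rg, sd_fe, and sd_re are made up for this illustration) and forms the ratios. A ratio of standard deviations near 2 corresponds to a ratio of sampling variances near 4.

```stata
* Standard deviations of the x1 estimates, copied by hand from the table above
* sd_rg: -regress, cluster(id)-   sd_fe: -xtreg, fe-   sd_re: -xtreg, re-
scalar sd_rg = .1070938
scalar sd_fe = .051932
scalar sd_re = .0515863

* Ratios of standard deviations and of variances (relative efficiency)
display "sd ratio,  regress vs. re: " sd_rg/sd_re
display "var ratio, regress vs. re: " (sd_rg/sd_re)^2
display "sd ratio,  fe vs. re:      " sd_fe/sd_re
```

The same arithmetic applies to the x2 rows. These ratios describe only this parameterization and sample size, not the estimators in general.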

Now let's consider coverage. The results table below contains the means of
6 binary variables from the 1000 generated samples. In each sample, each
variable is 1 if the null hypothesis in question was rejected at the 5%
level for that sample and zero otherwise. Thus the means in the table below
are empirical rejection rates, which should be close to the nominal 5% when
coverage is correct.

     x1_rjrg   fraction of tests in which the true null that x1=3 was
               rejected after -reg, cluster(id)-
     x2_rjrg   fraction of tests in which the true null that x2=3 was
               rejected after -reg, cluster(id)-
     x1_rjfe   fraction of tests in which the true null that x1=3 was
               rejected after -xtreg, fe-
     x2_rjfe   fraction of tests in which the true null that x2=3 was
               rejected after -xtreg, fe-
     x1_rjre   fraction of tests in which the true null that x1=3 was
               rejected after -xtreg, re-
     x2_rjre   fraction of tests in which the true null that x2=3 was
               rejected after -xtreg, re-

And the results are

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     x1_rjrg |    1000         .05    .218054          0          1
     x2_rjrg |    1000         .07   .2552747          0          1
     x1_rjfe |    1000        .054   .2261308          0          1
     x2_rjfe |    1000        .039   .1936918          0          1
     x1_rjre |    1000        .059   .2357426          0          1
     x2_rjre |    1000        .042   .2006895          0          1

Note that all the tests are reasonably close to nominal coverage. Also note
that the tests after -xtreg, re- are marginally closer to nominal than those
after -xtreg, fe-.
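When judging how close these rates are to 5%, it helps to remember that a rejection rate estimated from 1000 samples carries simulation error of its own. The sketch below assumes the 1000 samples are independent and that the true rejection rate is exactly 5%, and computes the implied Monte Carlo standard error. The scalar names are invented for this example.

```stata
* Monte Carlo standard error of a rejection rate when the true rate is 5%
* and 1000 independent samples are drawn (as in fevclust.do)
scalar p    = .05
scalar reps = 1000
scalar mcse = sqrt(p*(1-p)/reps)

* Approximate 95% range for a simulated rejection rate under these assumptions
display "Monte Carlo s.e.: " mcse
display "approx. range:    " p - 1.96*mcse " to " p + 1.96*mcse
```

Under these assumptions the range runs from roughly .036 to .064. By that yardstick five of the six rates in the table are consistent with 5%, and the .07 for x2_rjrg is slightly high, although one value just outside the range among six tests is weak evidence of a problem.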

There is one final point that must be made. The crucial assumption in the
above data generating process is that X_it and u_i are not correlated. If
they are correlated, only -xtreg, fe- will provide consistent estimates.
Below -fevclust.do-, I have appended a second program, called fe_ex.do, that
illustrates this point. -fe_ex.do- generates a single large sample from the
same structure as above, EXCEPT that there is correlation between X_it and
u_i.

Here are the crucial correlations in our sample

. corr x1 x2 ui
(obs=5000)

             |       x1       x2       ui
-------------+---------------------------
          x1 |   1.0000
          x2 |   0.6081   1.0000
          ui |   0.6349   0.7170   1.0000

Since the true values of all the coefficients are 3.0, the output below
illustrates that -regress, cluster(id)- is not consistent for this data
generating process.

. regress y x1 x2,cluster(id)

Regression with robust standard errors                 Number of obs =    5000
                                                       F(  2,   999) =34624.91
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9674
Number of clusters (id) = 1000                         Root MSE      =  1.6459

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.498473    .024788   141.14   0.000     3.449831    3.547116
          x2 |   3.701595   .0232616   159.13   0.000     3.655948    3.747242
       _cons |    2.97323   .0336344    88.40   0.000     2.907228    3.039232
------------------------------------------------------------------------------

In contrast, the output below illustrates that -xtreg, fe- is a consistent
estimator for the coefficients with this data generating process.

. xtreg y x1 x2, fe i(id)

Fixed-effects (within) regression               Number of obs      =      5000
Group variable (i) : id                         Number of groups   =      1000

R-sq:  within  = 0.9602                         Obs per group: min =         5
       between = 0.9885                                        avg =       5.0
       overall = 0.9672                                        max =         5

                                                F(2,3998)          =  48238.88
corr(u_i, Xb)  = 0.7384                         Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   2.997751   .0164867   181.83   0.000     2.965428    3.030074
          x2 |   2.993408   .0157054   190.60   0.000     2.962616    3.024199
       _cons |   2.942511   .0140139   209.97   0.000     2.915036    2.969986
-------------+----------------------------------------------------------------
     sigma_u |  2.0660572
     sigma_e |  .99033698
         rho |  .81316439   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(999, 3998) =     9.81           Prob > F = 0.0000

Finally, the output below illustrates that -xtreg, re- is not a consistent
estimator for the coefficients with this data generating process.

. xtreg y x1 x2, re i(id)

Random-effects GLS regression                   Number of obs      =      5000
Group variable (i) : id                         Number of groups   =      1000

R-sq:  within  = 0.9601                         Obs per group: min =         5
       between = 0.9885                                        avg =       5.0
       overall = 0.9674                                        max =         5

Random effects u_i ~ Gaussian                   Wald chi2(2)       = 109595.77
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.259649   .0197243   165.26   0.000      3.22099    3.298308
          x2 |   3.363142   .0179413   187.45   0.000     3.327977    3.398306
       _cons |   2.958558   .0344846    85.79   0.000      2.89097    3.026147
-------------+----------------------------------------------------------------
     sigma_u |  .72744824
     sigma_e |  .99033698
         rho |  .35046296   (fraction of variance due to u_i)
------------------------------------------------------------------------------

There are many other points that could be made about the results presented
above. However, I hope that these simulations help to illustrate the basic
points that

     i)   -regress, cluster(id)-, -xtreg, re-, and -xtreg, fe- are all
          consistent estimators for the coefficients with the random-effects
          data generating process.

     ii)  Tests performed on the coefficients after -regress,
          cluster(id)-, -xtreg, re-, and -xtreg, fe- will have close to
          nominal coverage when the data were generated by a random-effects
          data generating process.

     iii) -xtreg, re- and -xtreg, fe- produce more efficient estimates
          than -regress, cluster(id)-.

     iv)  If the covariates are correlated with the id-level error
          component, u_i, then only -xtreg, fe- produces consistent estimates.
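Point iv) is the situation a Hausman test is meant to detect, since the test compares the fixed-effects and random-effects coefficient estimates. The following is a minimal sketch, not part of the original programs: it regenerates data with the same structure as fe_ex.do and assumes a recent version of Stata in which -xtset-, -estimates store-, and -hausman name1 name2- are available. The stored-estimate names fe and re are arbitrary.

```stata
* Sketch: Hausman test on data built like fe_ex.do, where x1 and x2
* are correlated with the id-level error component ui
clear
set seed 1234567
set obs 1000
gen ui = 2*invnorm(uniform())
gen id = _n
expand 5
sort id
gen x1 = invnorm(uniform()) + .4*ui
gen x2 = invnorm(uniform()) + .3*x1 + .4*ui
gen y  = 3 + 3*x1 + 3*x2 + ui + invnorm(uniform())

* Declare the panel variable, store both sets of estimates, then compare them
xtset id
quietly xtreg y x1 x2, fe
estimates store fe
quietly xtreg y x1 x2, re
estimates store re
hausman fe re
```

A small p-value from -hausman- would point away from the random-effects assumption that X_it and u_i are uncorrelated. A large p-value does not by itself show that the assumption holds.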

--David

[email protected]

----------------- begin fevclust.do---------------------------------------

clear
capture log close
log using fevclust.log , replace
set seed 1234567

* Results file: coefficients (c) and p-values (p) for each estimator
postfile redat x1_crg x2_crg x1_prg x2_prg x1_cfe x2_cfe x1_pfe x2_pfe /*
        */ x1_cre x2_cre x1_pre x2_pre using redat, replace double

forvalues i=1/1000 {
        qui {
                * Draw one sample: 100 ids, 5 observations per id
                drop _all
                set obs 100
                gen ui=2*invnorm(uniform())
                gen id =_n
                expand 5
                sort id
                gen x1=invnorm(uniform())
                gen x2=invnorm(uniform())+.3*x1
                gen eit=invnorm(uniform())
                gen y=3+3*x1+3*x2+ui+eit

                * OLS with cluster-robust standard errors
                regress y x1 x2,cluster(id)
                scalar x1_crg = _b[x1]
                scalar x2_crg = _b[x2]
                test x1 = 3
                scalar x1_prg = r(p)
                test x2 = 3
                scalar x2_prg = r(p)

                * Fixed effects
                xtreg y x1 x2, fe i(id)
                scalar x1_cfe = _b[x1]
                scalar x2_cfe = _b[x2]
                test x1 = 3
                scalar x1_pfe = r(p)
                test x2 = 3
                scalar x2_pfe = r(p)

                * Random effects
                xtreg y x1 x2, re i(id)
                scalar x1_cre = _b[x1]
                scalar x2_cre = _b[x2]
                test x1 = 3
                scalar x1_pre = r(p)
                test x2 = 3
                scalar x2_pre = r(p)

                post redat (x1_crg) (x2_crg) (x1_prg) (x2_prg) (x1_cfe) /*
                        */ (x2_cfe) (x1_pfe) (x2_pfe) (x1_cre) (x2_cre) /*
                        */ (x1_pre) (x2_pre)
        }
}
postclose redat

* Rejection indicators at the 5% level; their means are the rejection rates
use redat, clear
gen x1_rjrg=(x1_prg<.05)
gen x2_rjrg=(x2_prg<.05)
gen x1_rjfe=(x1_pfe<.05)
gen x2_rjfe=(x2_pfe<.05)
gen x1_rjre=(x1_pre<.05)
gen x2_rjre=(x2_pre<.05)
sum
save redat, replace
capture log close

----------------- end fevclust.do---------------------------------------

----------------- begin fe_ex.do---------------------------------------

clear
set seed 1234567
drop _all

* One large sample: 1000 ids, 5 observations per id
set obs 1000
gen ui=2*invnorm(uniform())
gen id =_n
expand 5
sort id

* Unlike fevclust.do, x1 and x2 are correlated with ui
gen x1=invnorm(uniform())+.4*ui
gen x2=invnorm(uniform())+.3*x1 + .4*ui
gen eit=invnorm(uniform())
gen y=3+3*x1+3*x2+ui+eit

corr x1 x2 ui
regress y x1 x2,cluster(id)
xtreg y x1 x2, fe i(id)
xtreg y x1 x2, re i(id)

----------------- end fe_ex.do---------------------------------------

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/

Daniel Simon
Assistant Professor
Department of Applied Economics and Management
Cornell University
(607) 255-1626
