
Re: st: How to perform Hausman test for random effects specification with survey data


From   "James W. Shaw" <shaw@pharmacy.arizona.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: How to perform Hausman test for random effects specification with survey data
Date   Sat, 21 Aug 2004 16:34:07 -0400

Mark,

I performed Wooldridge's test as specified on p. 291 of his text.
Wooldridge's test converges on a certain set of results (F and p values)
once the coefficients on four of the time-demeaned variables are tested
simultaneously.
That is, I may include up to four of the time-demeaned variables in the
artificial regression, and the test results are always the same regardless
of which four are included.  Including more than four time-demeaned
variables results in variables (either time-demeaned or quasi-demeaned)
being dropped from the regression due to multicollinearity.

With the test I developed, I directly compare the fixed effects and random
effects parameter estimates.  This is akin to the traditional version of the
Hausman test.  I am able to test for differences between the two
specifications in up to eight coefficients simultaneously.  Regardless of
which eight coefficients are tested, I get the same results.  The test I
developed yields the same results as Wooldridge's test if differences in
four of the 12 parameters being estimated are simultaneously tested but
converges on a different set of results when eight coefficients are tested.
The inference does not change, though (ie, neither the test I developed nor
Wooldridge's test rejects the null).

This is very interesting, though I am not certain why I should be able to
test more coefficients using the method I developed.  Based on the results
of Wooldridge's test, I think one explanation for why I am able to test only a
subset of the 12 parameters being estimated is collinearity between
the quasi-demeaned variables and time-demeaned variables.  All of the
variables in my model vary both with subject and time.  The artificial
regression used to perform Wooldridge's test should include quasi-demeaned
and time-demeaned versions of each variable; however, only a subset of the
latter may be included.

I am not sure how I should discuss this in the paper.  Specifically, if it
is a multicollinearity problem, what should I say the collinearity is
between?
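
For readers following the thread, here is a minimal sketch of the
artificial regression discussed above.  It is an illustration under
stated assumptions, not the code used for the analysis in this message:
it uses Stata's public nlswork example data rather than the survey data,
ignores probability weights, and the q_ and d_ variable prefixes are
hypothetical names chosen for this example.  It also assumes a Stata
version that supports vce(cluster ...).

```stata
* Illustrative only: public example data, unweighted, hypothetical names
webuse nlswork, clear
keep if !missing(ln_wage, age, tenure)
xtset idcode year

* Variance components from a random effects fit
xtreg ln_wage age tenure, re
scalar s_e = e(sigma_e)
scalar s_u = e(sigma_u)

* Panel-specific theta (T_i differs across panels in unbalanced data)
bysort idcode: gen Ti = _N
gen theta = 1 - sqrt(s_e^2/(Ti*s_u^2 + s_e^2))

* Quasi-demeaned (q_) and time-demeaned (d_) versions of each variable
foreach v of varlist ln_wage age tenure {
    bysort idcode: egen mean_`v' = mean(`v')
    gen q_`v' = `v' - theta*mean_`v'
    gen d_`v' = `v' - mean_`v'
}
gen q_cons = 1 - theta

* Artificial regression with a cluster-robust VCE, followed by a
* Wald test of the coefficients on the time-demeaned regressors
regress q_ln_wage q_cons q_age q_tenure d_age d_tenure, noconstant vce(cluster idcode)
test d_age d_tenure
```

If any d_ variable is dropped for collinearity in a setup like this, the
regression output flags it as omitted, which shows which terms are
involved.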

--
Jim




----- Original Message -----
From: "Mark Schaffer" <M.E.Schaffer@hw.ac.uk>
To: <statalist@hsphsun2.harvard.edu>; "James W. Shaw" <shaw@pharmacy.arizona.edu>
Cc: "Mark Schaffer" <M.E.Schaffer@hw.ac.uk>
Sent: Wednesday, August 18, 2004 4:58 PM
Subject: Re: st: How to perform Hausman test for random effects specification with survey data


> James,
>
> This isn't a direct answer to your question, but might be helpful anyway.
>
> It's possible to implement a version of the Hausman test that is robust to
> heteroskedasticity and hence (I think) clustered, probability-weighted
> data.
>
> You can do this by carrying out the artificial regression version of the
> test.  This seems particularly appropriate in your case since in step (3)
> of your estimation below, you are estimating the GLS version via a
> regression on the quasi-demeaned data.  To do the artificial regression
> version of the Hausman test, you run the same regression but include the
> time-varying regressors after you have time-demeaned them.  The Hausman
> test is just a Wald test of the significance of the coeffs on these time-
> demeaned additional regressors.
>
> The convenience of this for your application is that if you estimate this
> artificial regression using -robust- and -cluster-, you should get a
> Hausman test that is suitable for your clustered, probability weighted
> data.
>
> You can find a full description of this artificial regression test in
> Wooldridge's (2002) book, Econometric Analysis of Cross Section and Panel
> Data, pp. 290-91.  Note that when Wooldridge recommends on p. 291 that the
> Wald test be made robust to serial correlation as well as heteroskedasticity,
> he is in effect recommending using -cluster- together with -robust-.
>
> Hope this helps.
>
> Cheers,
> Mark
>
> Quoting "James W. Shaw" <shaw@pharmacy.arizona.edu>:
>
> > Dear Statalisters:
> >
> > I have a question about Stata's -suest- command that I hope someone may be
> > able to answer for me.  I have seen it asked by others a few times before
> > over the past year without any response.
> >
> > It is my understanding that the Hausman test, which is often used to
> > evaluate the consistency of the estimates from random effects models, cannot
> > be used with survey (ie, clustered, probability-weighted) data.  I was
> > wondering if the -suest- command could be used to implement a valid version
> > of the Hausman test (for comparing random and fixed effects specifications)
> > for use with survey data.  I have done so using the code given at the end of
> > this message.
> >
> > Some background first.  I have data from a multistage probability sample of
> > the US population (n=3773) with oversamples of blacks and Hispanics.  I am
> > interested in estimating a design-consistent model allowing for a
> > respondent-level random effect.  I wish to compare the random effects
> > specification against the corresponding fixed effects model using the
> > Hausman test.  To estimate the random effects model, I do the following:
> >
> > (1) generate weighted estimates of the variance components
> > (2) apply a GLS transform to the data
> > (3) estimate the model from the transformed data using -regress-
> >
> > According to Korn and Graubard, the above procedure may not always work.  It
> > does in my case because I have a large number of sufficiently large PSUs.
> > The parameter estimates and standard errors I get are equivalent to those
> > derived when using SUDAAN (which estimates the corresponding covariance
> > pattern model).
> >
> > To perform the Hausman test, I do the following:
> >
> > (1) I concatenate the GLS-transformed and original data using -append-
> > (2) Using -regress- with the score option, I estimate the random effects
> > model from the GLS-transformed data and save the estimates
> > (3) Using -regress- with the score option, I estimate the fixed effects
> > model from the original data (including dummies for respondents) and save
> > the estimates
> > (4) I perform the simultaneous estimation using -suest- with the svy option
> > (5) I perform Hausman's test for the consistency of the random effects model
> > by testing the difference between the two coefficient vectors (excluding the
> > constant and fixed effects)
> >
> > The above procedure seems to work.  -suest- gives me the correct parameter
> > estimates and standard errors for the two models.  However, I notice that I
> > am only able to test for differences in 8 coefficients simultaneously.
> > There were 12 independent variables in each model (excluding the constant
> > and respondent dummies in the fixed effects specification).  Interestingly,
> > it does not seem to matter which 8 coefficients I test.  I always get the
> > same statistical result (ie, F and p values).  My thought is that this must
> > somehow be related to the fact that my data are clustered (ie, that I am
> > allowing for clustering at the level of the PSU).  In other words, I think
> > it may be a peculiarity of my data and that the code I present below is
> > working correctly.  Does this sound plausible?
> >
> > Any feedback you could provide me with would be greatly appreciated.  Thank
> > you very much.
> >
> > Regards,
> >
> > Jim
> >
> > James W. Shaw, PhD, PharmD, MPH
> > Post-Doctoral Fellow
> > Tobacco Control Research Branch
> > Behavioral Research Program
> > Division of Cancer Control and Population Sciences
> > National Cancer Institute
> >
> >
> > /* STATA CODE */
> >
> > /* GLS TRANSFORM DATA */
> >
> > collapse (mean) depvar m1-a2 d1 c3 c32 [pw = ttowgt], by(rti_id)
> > ren depvar depvar2
> > ren m1 m12
> > ren m2 m22
> > ren s1 s12
> > ren s2 s22
> > ren u1 u12
> > ren u2 u22
> > ren p1 p12
> > ren p2 p22
> > ren a1 a12
> > ren a2 a22
> > ren c3 c3n
> > ren c32 c32n
> > sort rti_id
> > save "E:\Dissertation\Data\temp1", replace
> > use "E:\Dissertation\Data\tempus.dta", clear
> > drop _merge
> > sort rti_id
> > merge rti_id using "E:\Dissertation\Data\temp1"
> >
> > xtreg depvar m1-a2 c3 c32 [iw = ttowgt], i(rti_id) mle
> >
> > gen theta = 1 - sqrt(e(sigma_e)^2/(12*e(sigma_u)^2 + e(sigma_e)^2))
> > gen depvar3 = depvar - theta*depvar2
> > gen m13 = m1- theta*m12
> > gen m23 = m2 - theta*m22
> > gen s13 = s1 - theta*s12
> > gen s23 = s2 - theta*s22
> > gen u13 = u1- theta*u12
> > gen u23 = u2 - theta*u22
> > gen p13 = p1- theta*p12
> > gen p23 = p2- theta*p22
> > gen a13 = a1 - theta*a12
> > gen a23 = a2- theta*a22
> > gen c33 = c3- theta*c3n
> > gen c323 = c32- theta*c32n
> > gen one = 1
> > summ one
> > scalar omean = r(mean)
> > gen one3 = one - theta*omean
> >
> > /* SAVE TRANSFORMED DATA FOR RANDOM EFFECTS ESTIMATION */
> >
> > gen res = 1
> > sort psu rti_id time
> > save "E:\Dissertation\Data\temp1", replace
> >
> > /* RENAME RAW (UNTRANSFORMED) VARIABLES FOR FIXED EFFECTS ESTIMATION */
> >
> > use "E:\Dissertation\Data\tempus.dta", clear
> > ren depvar depvar3
> > ren m1 m13
> > ren m2 m23
> > ren s1 s13
> > ren s2 s23
> > ren u1 u13
> > ren u2 u23
> > ren p1 p13
> > ren p2 p23
> > ren a1 a13
> > ren a2 a23
> > ren c3 c33
> > ren c32 c323
> > gen one3 = 1
> > gen res = 0
> >
> > /* APPEND TRANSFORMED DATA TO RAW DATA */
> >
> > sort psu rti_id time
> > append using "E:\Dissertation\Data\temp1"
> >
> > /* ESTIMATE RANDOM EFFECTS MODEL */
> >
> > svyset [pw = ttowgt], psu(psu)
> > reg depvar3 one3 m13-a23 c33 c323 if res == 1 [iw = ttowgt], score(RE) nocons
> > est store RE
> >
> > /* ESTIMATE FIXED EFFECTS MODEL */
> >
> > tab rti_id, gen(id)
> > reg depvar3 one3 m13-a23 c33 c323 id2-id3773 if res == 0 [iw = ttowgt], score(FE) nocons
> > est store FE
> >
> > /* USE -SUEST- TO PERFORM HAUSMAN TEST */
> >
> > suest RE FE, svy
> > test [RE_mean = FE_mean]: m13 m23 s13 s23 u13 u23 p13 p23 a13 a23 c33 c323
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> >
>
>
>
> Prof. Mark Schaffer
> Director, CERT
> Department of Economics
> School of Management & Languages
> Heriot-Watt University, Edinburgh EH14 4AS
> tel +44-131-451-3494 / fax +44-131-451-3008
> email: m.e.schaffer@hw.ac.uk
> web: http://www.sml.hw.ac.uk/ecomes
>



