
Re: st: Fixed effects with reference category vs. average fixed effects (2nd try)


From   Constantine Daskalakis <c_daskalakis@entwhistle.jci.tju.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Fixed effects with reference category vs. average fixed effects (2nd try)
Date   Sun, 06 Jul 2003 19:41:24 -0400

At 06:18 PM 7/6/03, Neumayer,E wrote:
Dear all,

Could I try again, as I received no answer the first time around? This time I also attach a log to make things clearer. How come the estimated fixed effects are often statistically significant if they express a difference from the reference category (it does not really matter which one), but are all highly insignificant if expressed as the difference from the average effect (see below)? Any help is still highly welcome! Best, Eric Neumayer

Because the way you test the difference from the "average" effect is incorrect.

. * Difference from average effect (no fixed effects significant)
. capture tab destination, gen(cdum)

. quietly xi: reg dyadasylumcorrpcshare l52dyadasylumcorrpcshare destinationrwp
> vote destinationleftgs dyadcolony dyadlanguage dyaddistance destinationunempl
> oyment lndestinationgdp destinationgdpgrowth destinationrecognition schenge
> n cdum1-cdum17 if inc_highoecd==0, nocons robust cluster(originid)

. capture drop averageasylum

. ge averageasylum = (_b[cdum1]+_b[cdum2]+_b[cdum3]+_b[cdum4]+_b[cdum5]+_b[cdum
> 6]+_b[cdum7]+_b[cdum8]+_b[cdum9]+_b[cdum10]+_b[cdum11]+_b[cdum12]+_b[cdum13]+
> _b[cdum14]+_b[cdum15]+_b[cdum16]+_b[cdum17])/17

Here is where the problem with your approach starts. You compute the average from the regression. Then, you simply subtract it from the actual observations (Ys) and rerun the regression. But that treats the subtracted quantity (average) as KNOWN, when it is ESTIMATED from the same data.


. capture drop dyadasylumcorrpcshare2

. ge dyadasylumcorrpcshare2=dyadasylumcorrpcshare-averageasylum
(41572 missing values generated)

This is the model that you fit to test deviations from the average:

. xi: reg dyadasylumcorrpcshare2 l52dyadasylumcorrpcshare destinationrwpvote de
> stinationleftgs dyadcolony dyadlanguage dyaddistance destinationunemployment
> lndestinationgdp destinationgdpgrowth destinationrecognition schengen cdum
> 1-cdum17 if inc_highoecd==0, nocons robust cluster(originid)
[snip]

cdum1 | .0082041 .1258245 0.07 0.948 -.2408377 .257246
...
cdum17 | -.031076 .1273527 -0.24 0.808 -.2831427 .2209907
------------------------------------------------------------------------------

What does this do? The p-value for CDUM1 (0.948) tests whether

cdum1 = 0

(and you'd expect this to be zero if the CDUM1 group mean is exactly equal to the average of all group means, which you subtracted from your data).

But the way you've set this up, the test reflects only the uncertainty in the CDUM1 estimate itself (essentially, the observations in the CDUM1 group), because you've told the computer that the "average" is just an arbitrary (fixed) value. The correct test, on the other hand, should use all observations in the sample, since the average is estimated on the basis of all observations.
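To spell this out in symbols (a sketch in notation that is not part of the log): write \hat\gamma_k for the estimated coefficient on cdum k and \hat V for the estimated (robust, clustered) covariance matrix of the 17 dummy coefficients. The sketch assumes every observation in the estimation sample belongs to exactly one of the 17 destinations, so the dummies sum to one.

```latex
% Subtracting a constant a from Y, with all 17 dummies and no constant in the model,
% only shifts each dummy coefficient by a; the residuals and \hat V are unchanged.
\hat\gamma_k^{*} = \hat\gamma_k - a, \qquad
\widehat{\mathrm{se}}(\hat\gamma_k^{*}) = \sqrt{\hat V_{kk}}

% The correct contrast treats the average as estimated from the same fit:
d_k = \hat\gamma_k - \frac{1}{17}\sum_{j=1}^{17}\hat\gamma_j = c_k^{\top}\hat\gamma,
\qquad c_k = e_k - \tfrac{1}{17}\mathbf{1}

\widehat{\mathrm{Var}}(d_k) = c_k^{\top}\hat V c_k
  = \hat V_{kk} - \frac{2}{17}\sum_{j=1}^{17}\hat V_{kj}
  + \frac{1}{17^{2}}\sum_{i=1}^{17}\sum_{j=1}^{17}\hat V_{ij}
```

In other words, the refit on the shifted outcome reports the standard error of \hat\gamma_k alone, with the estimated average plugged in as the constant a, whereas the contrast d_k needs the additional covariance terms.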

You should not be subtracting the average from the original data. Instead, you need to fit the model

. xi: reg dyadasylumcorrpcshare l52dyadasylumcorrpcshare destinationrwpvote des
> tinationleftgs dyadcolony dyadlanguage dyaddistance destinationunemployment
> lndestinationgdp destinationgdpgrowth destinationrecognition schengen cdum1
> -cdum17 if inc_highoecd==0, nocons robust cluster(originid)

This will give you the 17 estimated betas (but no constant) and is equivalent to the "referent category" model that you have already fit.
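The equivalence can be written out as follows (a sketch; the symbols \gamma, \alpha, \delta and the reference index r are my notation, not names from the log):

```latex
% No-constant model with all 17 dummies: coefficients \gamma_1, \dots, \gamma_{17}
% Reference-category model (reference group r): constant \alpha and
% coefficients \delta_k for k \neq r, with \delta_r = 0 by construction
\gamma_k = \alpha + \delta_k \quad (k = 1, \dots, 17), \qquad \gamma_r = \alpha

% Deviations from the average are the same in either parameterization:
\gamma_k - \frac{1}{17}\sum_{j=1}^{17}\gamma_j
  = \delta_k - \frac{1}{17}\sum_{j=1}^{17}\delta_j
```

So it does not matter which of the two fits you run before testing the deviations from the average.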

Now, you want to test

cdum1 = (cdum1+...+cdum17)/17
cdum2 = (cdum1+...+cdum17)/17 etc

This can be accomplished by a series of -lincom- commands:

. lincom cdum1-(cdum1+cdum2+...+cdum17)/17
. lincom cdum2-(cdum1+cdum2+...+cdum17)/17

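etc. (When you actually type these, the sum has to be written out in full; the "..." is only shorthand here.)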

The p-values you will get will be based on the data from all groups (since each contrast correctly involves all 17 groups).
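To avoid typing the 17-term sum seventeen times, the contrasts can be generated in a loop. This is a minimal sketch, assuming the no-constant model with cdum1-cdum17 has just been fit; the macro name allsum is made up for illustration.

```stata
* Assumes the -regress- with cdum1-cdum17 and nocons is the active estimation.
* Build the string "cdum1 + cdum2 + ... + cdum17" once.
local allsum "cdum1"
forvalues j = 2/17 {
    local allsum "`allsum' + cdum`j'"
}

* Test each fixed effect against the estimated average of all 17.
forvalues j = 1/17 {
    lincom cdum`j' - (`allsum')/17
}
```

Each -lincom- call uses the full estimated covariance matrix, so the robust, clustered standard errors carry over to the contrasts.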


PS There is a multiple-comparison issue here, whether you do pairwise comparisons between groups or test deviations from the estimated average. You should consider an appropriate multiple-comparison procedure to adjust your p-values.
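As one possible illustration, a Bonferroni adjustment of the 17 deviation tests could be sketched as below. Bonferroni is only one (conservative) choice and is my assumption, not a specific recommendation; the macro names are hypothetical, and the code assumes the no-constant model with cdum1-cdum17 is the active estimation.

```stata
* Rebuild the sum of the 17 dummy coefficients as a string.
local allsum "cdum1"
forvalues j = 2/17 {
    local allsum "`allsum' + cdum`j'"
}

forvalues j = 1/17 {
    quietly lincom cdum`j' - (`allsum')/17
    * Save the results before anything else can overwrite r().
    local est = r(estimate)
    local se  = r(se)
    local df  = r(df)
    * Two-sided p-value from the t statistic.
    local p = 2*ttail(`df', abs(`est'/`se'))
    * Bonferroni: multiply by the number of tests, cap at 1.
    local padj = min(1, 17*`p')
    display "cdum`j'" _col(10) %9.4f `est' _col(22) %6.4f `p' _col(32) %6.4f `padj'
}
```

The three columns displayed are the estimated deviation, the unadjusted p-value, and the Bonferroni-adjusted p-value.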






--- NOTE NEW ADDRESS AS OF JULY 8, 2003 ---
________________________________________________________________

Constantine Daskalakis, ScD
Assistant Professor,
Biostatistics Section, Thomas Jefferson University,
211 S. 9th St. #602, Philadelphia, PA 19107
Tel: 215-955-5695
Fax: 215-503-3804
Email: c_daskalakis@mail.jci.tju.edu
Webpage: http://www.kcc.tju.edu/Science/SharedFacilities/Biostatistics
