
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Dummy Variable Trap, urgent


From   Maarten Buis <[email protected]>
To   [email protected]
Subject   Re: st: Dummy Variable Trap, urgent
Date   Fri, 6 Sep 2013 15:38:11 +0200

Your units appear to be counties, so it is no surprise that the area
type is constant within each unit. A fixed effects model filters out
anything that is constant within the unit, regardless of whether it is
observed or not. This is why fixed effects models are so popular. But
it also means that you cannot estimate (at least not without resorting
to some tricks) the effects of variables that are fixed within units,
as you noticed. If you added the area type dummies because you wanted
to adjust your estimates for them, but are not substantively
interested in them, then you can just leave those variables out and
let the fixed effects take care of the adjusting (as they are already
doing automatically). If you are substantively interested in these
area type effects you will need to do something else.
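
A quick way to confirm this in the data is to look at the within-county
variation of the area type dummies. The sketch below is only an
illustration and not part of the original exchange: it reuses the
variable names from the quoted log, assumes the data have already been
-xtset- as shown there, and introduces a hypothetical helper variable
called within_sd.

```stata
* Minimal sketch; assumes -xtset county_id year- has been run as in the log.

* 1. Decompose the variation: a "within" Std. Dev. of 0 means the variable
*    never changes inside a county, so the within transformation
*    (subtracting the county mean) turns it into a column of zeros.
xtsum majorurban largeurban otherurban significantrural rural50 rural80

* 2. The same check by hand for one dummy (within_sd is a hypothetical name).
bysort county_id: egen within_sd = sd(majorurban)
summarize within_sd    // all zeros if majorurban is constant within county

* 3. Fixed effects model without the area type dummies; y1 is left out as
*    the reference year.
xtreg log_priv_tot_emp log_road_density log_output log_propertytax  ///
    log_expen_edu log_expen_pss log_expen_transp log_expen_housing  ///
    log_expen_libculher unemployment nvq3 nvq4 log_under16          ///
    log_over65 log_benefitclaimants y2 y3 y4 y5 y6 y7, fe vce(robust)
```

Run this from a do-file, since the /// line continuation and // comments
are not recognised at the interactive prompt. Because Stata had already
omitted the area type dummies, leaving them out should not change the
remaining coefficients.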

-- Maarten

Ps. I realize that your plea for urgency is sincere, but I would
strongly advise against it. To quote the Statalist FAQ:
"Urgency is only your concern. Pleas of urgency, desperation, and the
like are widely deprecated by Statalist members. What is urgent for
you is unlikely to translate into urgency for other members of the
list. It is simplest and best to just ask your question directly."

On Fri, Sep 6, 2013 at 3:27 PM, Salikhov, Talgat
<[email protected]> wrote:
> Dear All,
>
> I need some help with my model. This is for my dissertation, which is due very soon, so I would greatly appreciate it if anyone could reply as soon as possible.
>
> Context:
>
> I have panel data and am using Stata 11. I am running an employment model with fixed effects. I have a number of variables to control for various factors and area characteristics, including 6 dummy variables for the area type according to the level of urbanization. I also introduced 7 year dummies.
>
> Problem:
>
> When I run the model with the fixed effects specification, the coefficients for the area type dummies are omitted because of collinearity. I realise this is a dummy variable trap. Note that the coefficients for the year dummies are estimated without any problems (with one year omitted, as expected). However, even when I drop one of the area type dummy variables, the remaining ones still show as omitted. I don't know what the problem is. I checked the data set for potential collinearity with other variables (possible 'doubling' of fixed effects) by deleting one variable at a time from the model, but that did not help.
>
> The list of my commands with the results is as follows:
>
> . clear
>
> . *(26 variables, 1050 observations pasted into data editor)
>
> . summarize output
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>       output |      1050    6636.304    5337.696       1497   33800.75
>
> . sum
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>      country |         0
>         year |      1050        2007    2.000953       2004       2010
>       region |         0
> uacountyname |         0
>      tot_emp |      1050    150599.6      117570      12800     632963
> -------------+--------------------------------------------------------
> priv_tot_emp |      1050    120319.6    96998.76      10000     541053
> road_density |      1050    6.399432    4.106369   .1963984   18.20378
>       output |      1050    6636.304    5337.696       1497   33800.75
>  propertytax |      1050    1068.173    197.2193     552.77    1782.42
>    expen_edu |      1050    28032.39    23728.64          0     207409
> -------------+--------------------------------------------------------
>    expen_pss |      1050    2208.517    2677.185          0      30889
> expen_transp |      1050    16999.31    15949.01         23     132291
> expen_hous~g |      1050    23994.46    38973.29          0     612599
> expen_libc~r |      1050    2840.263    4349.239        -30      54474
> unemployment |      1050    6.361048    2.550556        1.2       16.3
> -------------+--------------------------------------------------------
>         nvq3 |      1050    47.42981     7.48426       27.6       71.9
>         nvq4 |      1050    27.90019    8.680573         12       63.6
>      under16 |      1050    64679.43    48988.63       7100     274400
>       over65 |      1050       54866    47158.07       6400     258500
> benefitcla~s |      1050    29992.23     20152.8       1230     147780
> -------------+--------------------------------------------------------
>   majorurban |      1050         .38    .4856177          0          1
>   largeurban |      1050    .1733333    .3787156          0          1
>   otherurban |      1050    .1466667    .3539419          0          1
> significan~l |      1050    .1466667    .3539419          0          1
>      rural50 |      1050    .1266667    .3327577          0          1
> -------------+--------------------------------------------------------
>      rural80 |      1050    .0266667    .1611841          0          1
>
> . replace expen_edu = 1 if (expen_edu == 0)
> (20 real changes made)
>
> . replace expen_pss = 1 if (expen_pss == 0)
> (20 real changes made)
>
> . replace expen_transp = 1 if (expen_transp == 0)
> (0 real changes made)
>
> . replace expen_housing = 1 if (expen_housing == 0)
> (187 real changes made)
>
> . replace expen_libculher = 1 if (expen_libculher == 0)
> (19 real changes made)
>
> . tabulate year, gen(y)
>
>        year |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>        2004 |        150       14.29       14.29
>        2005 |        150       14.29       28.57
>        2006 |        150       14.29       42.86
>        2007 |        150       14.29       57.14
>        2008 |        150       14.29       71.43
>        2009 |        150       14.29       85.71
>        2010 |        150       14.29      100.00
> ------------+-----------------------------------
>       Total |      1,050      100.00
>
> . gen log_priv_tot_emp = ln(priv_tot_emp)
>
> . gen log_road_density = ln(road_density)
>
> . gen log_output = ln(output)
>
> . gen log_propertytax = ln(propertytax)
>
> . gen log_expen_edu = ln(expen_edu)
>
> . gen log_expen_pss = ln(expen_pss)
>
> . gen log_expen_transp = ln(expen_transp)
>
> . gen log_expen_housing = ln(expen_housing)
>
> . gen log_expen_libculher = ln(expen_libculher)
> (1 missing value generated)
>
> . replace log_expen_libculher = 0 if (log_expen_libculher == .)
> (1 real change made)
>
> . gen log_under16 = ln(under16)
>
> . gen log_over65 = ln(over65)
>
> . gen log_benefitclaimants = ln(benefitclaimants)
>
> . bysort  uacountyname : gen county_id = _n == 1
>
> . replace county_id = sum(county_id)
> (1049 real changes made)
>
> . xtset county_id year, yearly
>        panel variable:  county_id (strongly balanced)
>         time variable:  year, 2004 to 2010
>                 delta:  1 year
>
> . xtreg log_priv_tot_emp log_road_density log_output log_propertytax log_expen_edu log_expen_pss log_expen_transp log_exp
>> en_housing log_expen_libculher unemployment nvq3 nvq4 log_under16 log_over65 log_benefitclaimants majorurban largeurban
>>  otherurban significantrural rural50 rural80 y1 y2 y3 y4 y5 y6 y7, fe vce(robust)
> note: majorurban omitted because of collinearity
> note: largeurban omitted because of collinearity
> note: otherurban omitted because of collinearity
> note: significantrural omitted because of collinearity
> note: rural50 omitted because of collinearity
> note: rural80 omitted because of collinearity
> note: y1 omitted because of collinearity
>
> Fixed-effects (within) regression               Number of obs      =      1050
> Group variable: county_id                       Number of groups   =       150
>
> R-sq:  within  = 0.3287                         Obs per group: min =         7
>        between = 0.7905                                        avg =       7.0
>        overall = 0.7888                                        max =         7
>
>                                                 F(20,149)          =     11.83
> corr(u_i, Xb)  = 0.5373                         Prob > F           =    0.0000
>
>                             (Std. Err. adjusted for 150 clusters in county_id)
> ------------------------------------------------------------------------------
>              |               Robust
> log_priv_t~p |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> log_road_d~y |  -.0308976   .0190012    -1.63   0.106    -.0684442     .006649
>   log_output |   .2778278   .0626751     4.43   0.000     .1539811    .4016746
> log_proper~x |  -.2212403   .1382738    -1.60   0.112    -.4944711    .0519905
> log_expen_~u |   .0002099   .0024071     0.09   0.931    -.0045467    .0049664
> log_expen_~s |  -.0005283    .002037    -0.26   0.796    -.0045535    .0034969
> log_expen_~p |  -.0037324   .0038912    -0.96   0.339    -.0114214    .0039566
> log_expen_~g |   .0048323   .0014797     3.27   0.001     .0019083    .0077563
> log_expen_~r |  -.0013391   .0009824    -1.36   0.175    -.0032803    .0006021
> unemployment |   -.003091   .0012582    -2.46   0.015    -.0055772   -.0006049
>         nvq3 |   .0002088   .0009244     0.23   0.822    -.0016178    .0020353
>         nvq4 |   .0006803   .0011502     0.59   0.555    -.0015925     .002953
>  log_under16 |   .0060861   .0982632     0.06   0.951    -.1880832    .2002555
>   log_over65 |   .4174694   .0960725     4.35   0.000     .2276289    .6073099
> log_benefi~s |  -.0309202   .0774981    -0.40   0.690    -.1840575    .1222171
>   majorurban |  (omitted)
>   largeurban |  (omitted)
>   otherurban |  (omitted)
> significan~l |  (omitted)
>      rural50 |  (omitted)
>      rural80 |  (omitted)
>           y1 |  (omitted)
>           y2 |   .0067527   .0071768     0.94   0.348    -.0074287    .0209341
>           y3 |    .012156   .0135523     0.90   0.371    -.0146236    .0389355
>           y4 |   .0122614   .0202253     0.61   0.545    -.0277041    .0522268
>           y5 |   .0143446   .0249928     0.57   0.567    -.0350415    .0637307
>           y6 |   .0523563    .028217     1.86   0.066    -.0034009    .1081135
>           y7 |   .0167271   .0312961     0.53   0.594    -.0451144    .0785686
>        _cons |   6.445186   1.679938     3.84   0.000     3.125607    9.764764
> -------------+----------------------------------------------------------------
>      sigma_u |  .36708293
>      sigma_e |  .03455236
>          rho |  .99121795   (fraction of variance due to u_i)
> ------------------------------------------------------------------------------
> Sincerely,
> Talgat
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



-- 
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------


