st: -oaxaca- problem

From   "Rubil Ivica" <[email protected]>
To   <[email protected]>
Date   Thu, 28 Nov 2013 14:30:47 +0100

Dear Statalisters,

I have a problem with Ben Jann's -oaxaca-.

I run:

oaxaca lwage $exp_var_2011 [aw=w], by(pri) weight(1) nodetail

And the output below suggests that the mean of lwage for group 1 (pri==0) is 3.468472, while for group 2 (pri==1) it is 3.296296:

Model for group 1
(sum of wgt is   9.1834e+06)

      Source |       SS       df       MS              Number of obs =    1836
-------------+------------------------------           F( 32,  1803) =   80.76
       Model |  129.198455    32  4.03745173           Prob > F      =  0.0000
    Residual |  90.1351807  1803  .049991781           R-squared     =  0.5890
-------------+------------------------------           Adj R-squared =  0.5818
       Total |  219.333636  1835  .119527867           Root MSE      =  .22359

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
         age |  -.0200541   .0071478    -2.81   0.005     -.034073   -.0060351
        age2 |   .0195698   .0077357     2.53   0.011      .004398    .0347417
      female |  -.1260799   .0122342   -10.31   0.000    -.1500746   -.1020852
     married |   .0374127   .0130586     2.86   0.004      .011801    .0630244
    educ_low |   -.056286   .0237125    -2.37   0.018    -.1027928   -.0097791
   educ_high |   .1128072   .0197363     5.72   0.000     .0740987    .1515156
   educ_mrdr |   .3645071   .0339375    10.74   0.000     .2979462     .431068
       exper |   .0150992   .0040778     3.70   0.000     .0071016    .0230968
      exper2 |    -.02621   .0084305    -3.11   0.002    -.0427447   -.0096754
      tenure |   .0043512    .002352     1.85   0.064    -.0002618    .0089642
     tenure2 |  -.0000641   .0000568    -1.13   0.259    -.0001755    .0000473
   supervise |   .1118821   .0155912     7.18   0.000     .0813034    .1424608
       urban |   .0380235   .0120195     3.16   0.002     .0144498    .0615971
        occ1 |   .3100307    .038864     7.98   0.000     .2338074     .386254
        occ2 |   .1805049   .0196895     9.17   0.000     .1418882    .2191215
        occ4 |  -.0814861   .0203193    -4.01   0.000    -.1213381   -.0416342
        occ5 |  -.1697702   .0226988    -7.48   0.000     -.214289   -.1252515
        occ6 |  -.1982014   .0561438    -3.53   0.000     -.308315   -.0880877
        occ7 |  -.1408072   .0258736    -5.44   0.000    -.1915525   -.0900618
        occ8 |  -.1192174   .0261573    -4.56   0.000    -.1705192   -.0679156
        occ9 |   -.323215   .0286932   -11.26   0.000    -.3794905   -.2669395
   cont_temp |  -.1330346   .0280786    -4.74   0.000    -.1881046   -.0779646
  mediumfirm |   .0141947   .0133518     1.06   0.288    -.0119919    .0403813
   largefirm |   .0651125   .0137416     4.74   0.000     .0381613    .0920638
   act3_2011 |  -.0176042   .0229491    -0.77   0.443    -.0626139    .0274055
   act6_2011 |  -.0070403   .0373465    -0.19   0.850    -.0802873    .0662067
   act7_2011 |   .0268201   .0499898     0.54   0.592    -.0712239     .124864
   act8_2011 |   .0378372   .0193573     1.95   0.051    -.0001278    .0758023
  act11_2011 |   .0838373   .0412563     2.03   0.042     .0029222    .1647524
   northwest |  -.0053293   .0188184    -0.28   0.777    -.0422374    .0315789
 centraleast |  -.0452847    .017326    -2.61   0.009    -.0792658   -.0113036
    adriatic |   -.033017   .0168158    -1.96   0.050    -.0659974   -.0000366
       _cons |   3.666189   .1340373    27.35   0.000     3.403304    3.929073

Model for group 2
(sum of wgt is   1.0569e+07)

      Source |       SS       df       MS              Number of obs =    2611
-------------+------------------------------           F( 32,  2578) =  104.00
       Model |  401.188724    32  12.5371476           Prob > F      =  0.0000
    Residual |   310.77096  2578  .120547308           R-squared     =  0.5635
-------------+------------------------------           Adj R-squared =  0.5581
       Total |  711.959684  2610  .272781488           Root MSE      =   .3472

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
         age |   .0107747   .0074768     1.44   0.150    -.0038865    .0254359
        age2 |  -.0144506   .0091347    -1.58   0.114    -.0323627    .0034616
      female |  -.1835176   .0160742   -11.42   0.000    -.2150373   -.1519979
     married |   .0333863   .0174478     1.91   0.056    -.0008269    .0675994
    educ_low |  -.0968793   .0258114    -3.75   0.000    -.1474925   -.0462661
   educ_high |   .2377385   .0258013     9.21   0.000     .1871451    .2883319
   educ_mrdr |     .22368   .0745542     3.00   0.003     .0774877    .3698722
       exper |  -.0066592   .0041738    -1.60   0.111    -.0148436    .0015252
      exper2 |   .0313363   .0100652     3.11   0.002     .0115996     .051073
      tenure |   .0062553    .002891     2.16   0.031     .0005863    .0119242
     tenure2 |  -.0001353   .0000808    -1.68   0.094    -.0002937    .0000231
   supervise |   .1942685   .0211429     9.19   0.000     .1528097    .2357273
       urban |   .0106154    .015441     0.69   0.492    -.0196627    .0408935
        occ1 |   .3826859    .047162     8.11   0.000     .2902067    .4751651
        occ2 |   .0739356    .031726     2.33   0.020     .0117246    .1361465
        occ4 |  -.1942338   .0286501    -6.78   0.000    -.2504134   -.1380542
        occ5 |  -.3055466   .0269873   -11.32   0.000    -.3584657   -.2526276
        occ6 |  -.3949566   .1190838    -3.32   0.001    -.6284662   -.1614469
        occ7 |   -.200156   .0276926    -7.23   0.000    -.2544581    -.145854
        occ8 |  -.3148345   .0281448   -11.19   0.000    -.3700233   -.2596457
        occ9 |  -.3004927   .0349444    -8.60   0.000    -.3690147   -.2319707
   cont_temp |   .0053779   .0237008     0.23   0.821    -.0410967    .0518525
  mediumfirm |  -.0285624   .0184601    -1.55   0.122    -.0647605    .0076357
   largefirm |   .0670266   .0181104     3.70   0.000     .0315143     .102539
   act3_2011 |  -.0501554   .0207554    -2.42   0.016    -.0908543   -.0094565
   act6_2011 |   .0325907   .0276224     1.18   0.238    -.0215735     .086755
   act7_2011 |   -.049103   .0215923    -2.27   0.023     -.091443    -.006763
   act8_2011 |   .3988161   .0325846    12.24   0.000     .3349214    .4627107
  act11_2011 |   .1960192   .0360301     5.44   0.000     .1253683      .26667
   northwest |  -.1211181   .0244692    -4.95   0.000    -.1690994   -.0731368
 centraleast |   -.216864   .0238503    -9.09   0.000    -.2636316   -.1700963
    adriatic |  -.0748019   .0218918    -3.42   0.001    -.1177293   -.0318746
       _cons |   3.261302   .1328525    24.55   0.000     3.000794     3.52181

Blinder-Oaxaca decomposition                      Number of obs   =       4447
                                                  Model           =     linear
Group 1: pri = 0                                  N of obs 1      =       1836
Group 2: pri = 1                                  N of obs 2      =       2611

       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
overall      |
     group_1 |   3.468472    .008098   428.31   0.000       3.4526    3.484344
     group_2 |   3.296296   .0102489   321.62   0.000     3.276209    3.316384
  difference |   .1721757   .0130621    13.18   0.000     .1465746    .1977769
   explained |   .1374541   .0146018     9.41   0.000     .1088351     .166073
 unexplained |   .0347217     .01552     2.24   0.025      .004303    .0651404

Here's the puzzle. The group means in the output table above should equal the means obtained by running:

bysort pri: sum lwage

Yet they do not match, as the following output shows:

-> pri = 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
       lwage |      1836    3.352651    .3396782   1.809842   4.643055

-> pri = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
       lwage |      2611    3.088002    .4094408   1.992164   5.142046

As you can see, the same observations (1836 for pri==0 and 2611 for pri==1) are used for estimating the wage regressions within -oaxaca- and for computing the means with
bysort pri: sum lwage
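
One check that might narrow this down, offered only as a sketch: the -oaxaca- call above uses analytic weights ([aw=w]), while the -summarize- call does not, so the two sets of means may simply be weighted versus unweighted. Assuming that is the source of the gap, rerunning the summary with the same weights should show whether the means line up. The second command also assumes that -oaxaca- was the last estimation command and that it leaves e(sample) set.

```stata
* Weighted group means, using the same aweights passed to -oaxaca-
bysort pri: summarize lwage [aw=w]

* Same check restricted to the estimation sample
* (assumption: -oaxaca- was just run and has set e(sample))
bysort pri: summarize lwage if e(sample) [aw=w]
```

If the weighted means match 3.468472 and 3.296296, the weights would explain the difference. If not, the cause lies elsewhere.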

Does anybody know where the discrepancy comes from?



