Re: st: Interpretation of Dummy Variable Coefficients under Weighted Least-Squares


From   "Clive Nicholas" <Clive.Nicholas@newcastle.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Interpretation of Dummy Variable Coefficients under Weighted Least-Squares
Date   Tue, 22 Nov 2005 02:31:40 -0000 (GMT)

Jonathan DePeri wrote:

> I am running a regression using sample weights on a model which
> involves binary independent variables. The question may be
> elementary, but how should I interpret the parameter estimate of
> such a variable? Since the application of sample weights transforms
> the 0s and 1s of binary variables into 0s and values between 0 and
> 1, it is not clear to me that the interpretation should remain the
> same.

It's not clear from your post whether or not you're using

. reg y x1 x2 d1 [pw=weightvar] (or [aw=weightvar])

If you are, then it's more accurate to say that you are using weighted
_ordinary_ least squares (WOLS) rather than WLS, which is conceptually
(and statistically) different (see Winship and Radbill [1994: 241] for
more). In any case, I think the interpretation _is_ the same as in the
unweighted model. Others may want to put this in more formal statistical
language, but fitting a regression model with sampling weights simply
adjusts the parameter estimates (and their standard errors) upwards or
downwards for _all_ the independent variables in that model, not just
the continuous ones, in an effort to reduce bias.
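
A minimal sketch of why the dummy keeps its meaning, using Stata's shipped
auto data rather than the dataset discussed in this post (the variable
names -w-, -sw-, -price_s-, -mpg_s- and -foreign_s-, the seed, and the
weights themselves are illustrative assumptions). A weighted fit is
algebraically the same as OLS on data in which every variable, the
constant included, has been multiplied by the square root of the weight.
The dummy's 0s and 1s are rescaled along with everything else, so its
coefficient is still the difference between the two groups, holding the
other regressors fixed:

```stata
sysuse auto, clear
set seed 2005
generate double w  = 0.5 + runiform()    // arbitrary, strictly positive weights
generate double sw = sqrt(w)

* weighted regression with a dummy regressor (foreign)
regress price mpg foreign [aw=w]

* OLS on sqrt(w)-scaled data; sw plays the role of the constant
generate double price_s   = price*sw
generate double mpg_s     = mpg*sw
generate double foreign_s = foreign*sw
regress price_s mpg_s foreign_s sw, noconstant
```

The coefficients on -mpg_s- and -foreign_s- in the second model should
match those on -mpg- and -foreign- in the first, and the coefficient on
-sw- should match -_cons-.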

Fitting two models to the 'Garrett and Mitchell' dataset (available on
request) demonstrates this. I create -jobless- as a dummy variable from
-unem-, a continuous variable recording the unemployment rate. The
variable -europe- is also a dummy:

. g jobless=1 if unem<=5
(293 missing values generated)

. recode jobless .=0
(jobless: 293 changes made)

. g weight=invnorm(uniform())

. reg spend trade jobless growthpc europe

   Source |       SS       df       MS              Number of obs =     571
----------+------------------------------           F(  4,   566) =  125.92
    Model |  32430.9725     4  8107.74312           Prob > F      =  0.0000
 Residual |  36444.4887   566  64.3895561           R-squared     =  0.4709
----------+------------------------------           Adj R-squared =  0.4671
    Total |  68875.4612   570  120.834142           Root MSE      =  8.0243
---------------------------------------------------------------------------
    spend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------+----------------------------------------------------------------
    trade |   .1298481   .0147212     8.82   0.000     .1009332    .1587629
  jobless |  -5.381454   .7148419    -7.53   0.000    -6.785521   -3.977387
 growthpc |  -1.165842   .1431781    -8.14   0.000    -1.447067   -.8846165
   europe |   6.994002   .9683677     7.22   0.000     5.091969    8.896035
    _cons |   34.88938   .9811031    35.56   0.000     32.96233    36.81642
---------------------------------------------------------------------------

. reg spend trade jobless growthpc europe [pw=weight]
(sum of wgt is   2.2378e+02)

Linear regression                                   Number of obs =     282
                                                    F(  4,   277) =   94.57
                                                    Prob > F      =  0.0000
                                                    R-squared     =  0.4746
                                                    Root MSE      =  7.3868
---------------------------------------------------------------------------
          |               Robust
    spend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------+----------------------------------------------------------------
    trade |   .1277293   .0168331     7.59   0.000     .0945923    .1608663
  jobless |  -5.427786   1.377935    -3.94   0.000    -8.140341   -2.715231
 growthpc |   -1.13484   .2256215    -5.03   0.000    -1.578991   -.6906894
   europe |   6.744966    1.26138     5.35   0.000     4.261858    9.228074
    _cons |   35.44936   .9949323    35.63   0.000     33.49077    37.40795
---------------------------------------------------------------------------

The weight I generated is, of course, nonsensical, since this is a panel
dataset of eighteen countries, but it illustrates my point. The parameter
estimates don't change all that much, and that includes the dummy
variables: they've all simply been adjusted slightly after weighting. The
standard errors do change dramatically. -jobless- is still significant
post-weighting, but its t-statistic falls from -7.53 to -3.94, and that's
the story for all of the variables in this example. Note too that the
weighted model uses only 282 of the 571 observations, about the share of
draws from invnorm(uniform()) that are positive, so part of that loss of
precision presumably reflects the smaller sample rather than the
weighting itself.

If your post was really asking about WLS, then fitting an OLS
'between-effects' model (-xtreg, be-) both with and without the -wls-
option would show you exactly the same thing.
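
A rough sketch of that comparison, assuming the -nlswork- example data
that -webuse- downloads from Stata's website (not the dataset used above;
the choice of -ln_wage-, -tenure- and the dummy -union- is purely
illustrative):

```stata
webuse nlswork, clear
xtset idcode

* between-effects regression on the group means, unweighted
xtreg ln_wage tenure union, be

* same model, fitted by weighted least squares
xtreg ln_wage tenure union, be wls
```

Because this panel is unbalanced, the -wls- option should shift the
estimates somewhat, while the reading of the coefficient on -union- stays
the same in both fits.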

CLIVE NICHOLAS        |t: 0(044)7903 397793
Politics              |e: clive.nicholas@ncl.ac.uk
Newcastle University  |http://www.ncl.ac.uk/geps

Reference:

Winship C and Radbill L (1994) "Sampling Weights and Regression Analysis",
SOCIOLOGICAL METHODS AND RESEARCH 23(2): 230-57.
