# Re: st: Interpretation of Dummy Variable Coefficients under Weighted Least-Squares

From: "Clive Nicholas"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Interpretation of Dummy Variable Coefficients under Weighted Least-Squares
Date: Tue, 22 Nov 2005 02:31:40 -0000 (GMT)

Jonathan DePeri wrote:

> I am running a regression using sample weights on a model which
> involves binary independent variables. The question may be
> elementary, but how should I interpret the parameter estimate of
> such a variable? Since the application of sample weights transforms
> the 0s and 1s of binary variables into 0s and values between 0 and
> 1, it is not clear to me that the interpretation should remain the
> same.

It's not clear from your post whether or not you're using

. reg y x1 x2 d1 [pw=weightvar] (or [aw=weightvar])

If you are, then it's more accurate to say that you are using weighted
_ordinary_ least squares (WOLS) rather than WLS, which is conceptually
(and statistically) different (see Winship and Radbill [1994: 241] for
more). In any case, I think your intuition is correct: the interpretation
_is_ the same. Others may want to put this in more formal statistical
language, but fitting a regression model with sampling weights simply
adjusts the parameter estimates (and their standard errors) upwards or
downwards for _all_ the independent variables in the model, not just
the continuous ones, in an effort to reduce bias. The weights change how
much each observation counts in the fit; they do not recode the dummy, so
its coefficient is still the shift in the expected value of the dependent
variable for the category coded 1, holding the other regressors constant.
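In more formal terms, here is a sketch under the assumption that the
weights enter as in standard weighted least squares, which is how -regress-
computes the point estimates for both [aw=] and [pw=] (the two differ only
in their standard errors). The estimator minimises the weighted sum of
squared residuals, and with just a constant and one dummy d_i the dummy's
coefficient is the difference between the weighted means of y in the two
groups:

```latex
\hat{\beta} = \arg\min_{b} \sum_{i=1}^{n} w_i \,(y_i - x_i' b)^2 = (X'WX)^{-1} X'Wy,
\qquad W = \operatorname{diag}(w_1, \dots, w_n)

y_i = \alpha + \beta d_i + \varepsilon_i, \quad d_i \in \{0, 1\}
\;\Longrightarrow\;
\hat{\alpha} = \bar{y}^{w}_{0}, \qquad
\hat{\beta} = \bar{y}^{w}_{1} - \bar{y}^{w}_{0},
\qquad
\bar{y}^{w}_{g} = \frac{\sum_{i:\,d_i = g} w_i y_i}{\sum_{i:\,d_i = g} w_i}
```

Multiplying each row of the data by sqrt(w_i) does turn d_i into
sqrt(w_i)*d_i, which is presumably where the "values between 0 and 1" come
from, but that is only a computational device: beta is still the
coefficient on the original d_i.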

Fitting two models to the 'Garrett and Mitchell' dataset (available on
request) demonstrates this. Here I create -jobless- as a dummy variable
from -unem-, a continuous variable recording the unemployment rate (coded
1 where -unem- is 5 or less and 0 otherwise). The variable -europe- is
also a dummy:

. g jobless=1 if unem<=5
(293 missing values generated)

. recode jobless .=0

. g weight=invnorm(uniform())

. reg spend trade jobless growthpc europe

   Source |       SS       df       MS              Number of obs =     571
----------+------------------------------           F(  4,   566) =  125.92
    Model |  32430.9725     4  8107.74312           Prob > F      =  0.0000
 Residual |  36444.4887   566  64.3895561           R-squared     =  0.4709
    Total |  68875.4612   570  120.834142           Root MSE      =  8.0243
---------------------------------------------------------------------------
    spend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------+----------------------------------------------------------------
    trade |   .1298481   .0147212     8.82   0.000     .1009332    .1587629
  jobless |  -5.381454   .7148419    -7.53   0.000    -6.785521   -3.977387
 growthpc |  -1.165842   .1431781    -8.14   0.000    -1.447067   -.8846165
   europe |   6.994002   .9683677     7.22   0.000     5.091969    8.896035
    _cons |   34.88938   .9811031    35.56   0.000     32.96233    36.81642
---------------------------------------------------------------------------

. reg spend trade jobless growthpc europe [pw=weight]
(sum of wgt is   2.2378e+02)

Linear regression                                   Number of obs =     282
                                                    F(  4,   277) =   94.57
                                                    Prob > F      =  0.0000
                                                    R-squared     =  0.4746
                                                    Root MSE      =  7.3868
---------------------------------------------------------------------------
          |               Robust
    spend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------+----------------------------------------------------------------
    trade |   .1277293   .0168331     7.59   0.000     .0945923    .1608663
  jobless |  -5.427786   1.377935    -3.94   0.000    -8.140341   -2.715231
 growthpc |   -1.13484   .2256215    -5.03   0.000    -1.578991   -.6906894
   europe |   6.744966    1.26138     5.35   0.000     4.261858    9.228074
    _cons |   35.44936   .9949323    35.63   0.000     33.49077    37.40795
---------------------------------------------------------------------------

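The weight I generated is, of course, nonsensical, since this is a panel
dataset of eighteen countries, but it illustrates my point. (The weighted
model also uses only 282 of the 571 observations, presumably because
-invnorm(uniform())- produces standard normal draws, roughly half of which
are negative.) The standard errors change dramatically, but note how
little the parameter estimates change. That includes the dummy variable
-jobless-, whose coefficient moves only from -5.38 to -5.43: it is still
significant after weighting, but its t-statistic falls from -7.53 to
-3.94, so its 'purchase' is weaker. Indeed, that is the story for all of
the variables in this example: their coefficients have simply been
adjusted slightly after weighting.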
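The same point can be checked numerically outside Stata. This is a
minimal sketch in Python rather than Stata, using simulated data and
made-up positive weights; the helpers -wls- and -group_mean- are
hypothetical, written here for illustration only. With a constant and one
0/1 regressor, the weighted coefficient on the dummy comes out exactly
equal to the difference between the weighted group means of y, just as
the unweighted coefficient equals the difference between the plain group
means:

```python
import random


def wls(X, y, w):
    """Weighted least squares: solve the normal equations (X'WX) b = X'Wy.

    X is a list of rows (one per observation, including 1.0 for the
    constant); y and w are lists of the same length. Uses Gaussian
    elimination with partial pivoting, so no third-party libraries are needed.
    """
    k = len(X[0])
    # Build X'WX (k x k) and X'Wy (length k)
    A = [[sum(wi * row[r] * row[c] for row, wi in zip(X, w)) for c in range(k)]
         for r in range(k)]
    rhs = [sum(wi * row[r] * yi for row, yi, wi in zip(X, y, w)) for r in range(k)]
    # Forward elimination
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (rhs[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta


def group_mean(y, d, w, g):
    """Weighted mean of y over the observations whose dummy d equals g."""
    num = sum(wi * yi for yi, di, wi in zip(y, d, w) if di == g)
    den = sum(wi for di, wi in zip(d, w) if di == g)
    return num / den


random.seed(1)
n = 200
d = [random.randint(0, 1) for _ in range(n)]         # simulated 0/1 dummy
y = [10 + 5 * di + random.gauss(0, 2) for di in d]   # simulated outcome
w = [random.uniform(0.5, 3.0) for _ in range(n)]     # made-up positive weights
ones = [1.0] * n                                     # equal weights = plain OLS

X = [[1.0, float(di)] for di in d]                   # constant + dummy

a_wls, b_wls = wls(X, y, w)
a_ols, b_ols = wls(X, y, ones)

print("weighted:   b =", round(b_wls, 4),
      " diff in weighted means =",
      round(group_mean(y, d, w, 1) - group_mean(y, d, w, 0), 4))
print("unweighted: b =", round(b_ols, 4),
      " diff in plain means    =",
      round(group_mean(y, d, ones, 1) - group_mean(y, d, ones, 0), 4))
```

Adding further regressors turns this into a comparison that holds those
regressors constant, but the dummy's coefficient is still read as the
intercept shift for the group coded 1, weighted or not.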

If you fitted an OLS 'between-effects' model (-xtreg, be-) both with and
without the -wls- option, you would find exactly the same thing.

CLIVE NICHOLAS        |t: 0(044)7903 397793
Politics              |e: clive.nicholas@ncl.ac.uk
Newcastle University  |http://www.ncl.ac.uk/geps

Reference:

Winship C and Radbill L (1994) "Sampling Weights and Regression Analysis",
SOCIOLOGICAL METHODS AND RESEARCH 23(2): 230-57.
