st: Question on aweights

From	Diego Anzoategui <[email protected]>
To	[email protected]
Subject	st: Question on aweights
Date	Mon, 1 Aug 2011 12:53:46 -0300

Dear all,

I have a question regarding aweights.
I am using a dataset that has data at firm level for a number of countries.
I would like to run some regressions using firm-level data but
assigning more weight to those firms that come from countries where
fewer firms were surveyed so that all the countries are equally
represented in the regression. In my sample I have African countries
with only 70 firms surveyed and countries like China with over 1500
firms surveyed. I do not want the results to represent only those
countries with a large number of observations.
I found that aweights could be useful for this. More specifically, I
can use the inverse of the number of firms surveyed in each country as
aweights and the results are going to be similar to those obtained
taking the average by country and then running the regression without
aweights. That is, running:
reg y x [aweight=1/nr_firms]

shows the same coefficients as doing this:
collapse (mean) y x nr_firms, by(country)

and then,
reg y x

where: y is the dependent variable, x is the independent variable,
nr_firms is the number of firms surveyed in each country, country is a
string with the name of each country.
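
For anyone wanting to reproduce the comparison, the two steps can be sketched in a single do-file. This is a minimal sketch, assuming the firm-level data are in memory with the variable names given above. The bysort line is only needed if nr_firms does not already exist in the data, and preserve/restore is added here (it is not in the commands above) so that collapse does not overwrite the firm-level data.

```stata
* Sketch only: assumes firm-level data in memory with y, x and country.
* Skip the next line if nr_firms is already in the dataset.
bysort country: generate nr_firms = _N    // firms surveyed per country

* Firm-level regression; each country's firms together carry equal weight
regress y x [aweight = 1/nr_firms]

* Country-average regression on one row per country
preserve                                  // keep the firm-level data safe
collapse (mean) y x nr_firms, by(country)
regress y x
restore                                   // back to the firm-level data
```

The first regression reports its degrees of freedom from the number of firms, the second from the number of countries, which is visible in the two outputs below.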

I have run these two regressions and the coefficients are exactly the
same, but the standard errors differ. In fact, in the regression using
the firm-level data and aweights the variable x is significant at the 1
percent level, while in the regression on the country averages the same
variable is not significant.
Obviously, the difference lies in the mean squared errors. Why are
they so different? I understand that they can differ, but why
is there such a big difference in the significance levels? Is anything
wrong? Should I adjust the SEs when I am working with aweights?
These are the results of both regressions:


Using firm level data and aweights
(sum of wgt is   6.1000e+01)

      Source |       SS       df       MS              Number of obs =   44244
-------------+------------------------------           F(  1, 44242) =  118.27
       Model |  26.1492442     1  26.1492442           Prob > F      =  0.0000
    Residual |  9781.57782 44242  .221092578           R-squared     =  0.0027
-------------+------------------------------           Adj R-squared =  0.0026
       Total |  9807.72706 44243  .221678617           Root MSE      =   .4702

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1231604   .0113248    10.88   0.000     .1009637    .1453572
       _cons |   .5841591   .0080537    72.53   0.000     .5683737    .5999445
------------------------------------------------------------------------------

Using country averages and no weights

      Source |       SS       df       MS              Number of obs =      61
-------------+------------------------------           F(  1,    59) =    0.88
       Model |  .036052432     1  .036052432           Prob > F      =  0.3524
    Residual |  2.42089584    59  .041032133           R-squared     =  0.0147
-------------+------------------------------           Adj R-squared = -0.0020
       Total |  2.45694827    60  .040949138           Root MSE      =  .20256

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1231604   .1313911     0.94   0.352    -.1397526    .3860734
       _cons |   .5841591   .0934402     6.25   0.000     .3971856    .7711326
------------------------------------------------------------------------------

Diego