
From: Diego Anzoategui <diegoanzoa@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: Question on aweights
Date: Mon, 1 Aug 2011 12:53:46 -0300

Dear all,

I have a question regarding aweights. I am using a dataset that has firm-level data for a number of countries. I would like to run some regressions on the firm-level data while assigning more weight to firms from countries where fewer firms were surveyed, so that all countries are equally represented in the regression. My sample includes African countries with only 70 firms surveyed and countries like China with over 1500 firms surveyed, and I do not want the results to reflect only the countries with a large number of observations.

I found that aweights could be useful for this. More specifically, I can use the inverse of the number of firms surveyed in each country as the aweight, and the results should be similar to those obtained by taking country averages and then running the regression without aweights. That is, doing:

    reg y x [aweight=1/nr_firms]

shows the same coefficients as doing this:

    collapse (mean) y x nr_firms, by(country)

and then:

    reg y x

where:
y is the dependent variable,
x is the independent variable,
nr_firms is the number of firms surveyed in each country,
country is a string with the name of each country.

I have run these two regressions and the coefficients are exactly the same, but the standard errors differ. In fact, in the regression using the firm-level data and aweights the variable x is significant at the 1 percent level, while in the regression on the country averages the same variable is not significant. Obviously, the difference is in the mean squared errors. Why are they so different? I understand that they can differ, but why is there such a big difference in the significance levels? Is anything wrong? Should I change the SEs when I am working with aweights?
These are the results of both regressions:

Using firm level data and aweights

(sum of wgt is   6.1000e+01)

      Source |       SS       df       MS              Number of obs =   44244
-------------+------------------------------           F(  1, 44242) =  118.27
       Model |  26.1492442     1  26.1492442           Prob > F      =  0.0000
    Residual |  9781.57782 44242  .221092578           R-squared     =  0.0027
-------------+------------------------------           Adj R-squared =  0.0026
       Total |  9807.72706 44243  .221678617           Root MSE      =   .4702

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1231604   .0113248    10.88   0.000     .1009637    .1453572
       _cons |   .5841591   .0080537    72.53   0.000     .5683737    .5999445
------------------------------------------------------------------------------

Using country averages and no weights

      Source |       SS       df       MS              Number of obs =      61
-------------+------------------------------           F(  1,    59) =    0.88
       Model |  .036052432     1  .036052432           Prob > F      =  0.3524
    Residual |  2.42089584    59  .041032133           R-squared     =  0.0147
-------------+------------------------------           Adj R-squared = -0.0020
       Total |  2.45694827    60  .040949138           Root MSE      =  .20256

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1231604   .1313911     0.94   0.352    -.1397526    .3860734
       _cons |   .5841591   .0934402     6.25   0.000     .3971856    .7711326
------------------------------------------------------------------------------

Diego
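
The comparison described in this post can be reproduced on simulated data. What follows is only a minimal sketch, not the poster's actual data or code: the 20 countries, the sample sizes, and the data-generating process are all made up, and x is assumed to be constant within each country, which is the case in which the two sets of coefficients coincide exactly.

```stata
* Simulated data; every name and number here is hypothetical.
clear
set seed 12345
set obs 20                               // 20 made-up countries
generate country  = _n
generate nr_firms = 50 * country         // unequal sample sizes: 50, 100, ..., 1000
generate double x = runiform()           // assumption: x is constant within country
expand nr_firms                          // one row per firm
generate double y = 0.5 + 0.1*x + rnormal()

* (1) Firm-level regression; each country's weights sum to 1
regress y x [aweight = 1/nr_firms]
scalar b_firm  = _b[x]
scalar df_firm = e(df_r)

* (2) Country means, no weights
collapse (mean) y x, by(country)
regress y x
scalar b_country  = _b[x]
scalar df_country = e(df_r)

display "slope, firm level:    " b_firm    "  (residual df = " df_firm ")"
display "slope, country means: " b_country "  (residual df = " df_country ")"
```

Under the stated assumption the two slopes agree, while the residual degrees of freedom differ by orders of magnitude, just as in the output above (44242 versus 59).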
