st: Questions on weights on regressions

From     Tak-wai Chau
To       statalist@hsphsun2.harvard.edu
Subject  st: Questions on weights on regressions
Date     Sun, 13 Nov 2005 15:58:05 -0500

Hi, statalist users,

I have two questions about the use of weights in regression.

First, I have a question about using aweight in regression. As I understand from this page:
http://www.stata.com/support/faqs/stat/crc36.html
the slope coefficients and standard errors from a regression using aweights (=n) should be the same as those from a regression on variables multiplied by sqrt(n). However, my two sets of results do not match. Maybe I have misunderstood the above page. If so, how and why?

I have attached the code and results at the end of this message.

My second question is as follows. I am working with US census data, pooling the years 1960 to 2000. Because the data set is huge, and following other researchers who have worked with it, I plan to run certain regressions on group means, with aweight=cellsize (the number of original observations each mean is averaged over).

First, I should expect some loss of efficiency compared with using the individual observations. Is that correct?

Second, a problem is that the individual observations carry a person weight due to the survey design, especially after 1990. One suggestion is to use this person weight (as pweight) to calculate the cell means, and then run the regression on the cell means with aweight=cellsize, where cellsize is the number of observations each mean is derived from, disregarding the person weight.

I would like to ask whether this is a sound approach, and whether there is a better way to deal with the situation. For example, should the person weight be taken into account when constructing the weight used at the regression stage?
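
The two-stage procedure described above might be sketched as follows. This is only an illustration, not a recommendation of either weighting: it builds the cell means with the person weight and keeps both the raw cell count and the sum of person weights, so the two candidate aweights can be compared side by side. The data are simulated, and the names perwt, cell, one, cellsize and sumwt are hypothetical.

```stata
* Simulated individual-level data (hypothetical; needs Stata 10.1+ for
* runiform() and rnormal())
clear
set seed 2005
set obs 1000
gen cell  = 1 + mod(_n, 20)          // 20 cells of 50 observations each
gen perwt = 1 + int(3*runiform())    // person weight taking values 1, 2, 3
gen double x1 = rnormal()
gen double x2 = rnormal()
gen double y  = 1 + 2*x1 + 1.5*x2 + rnormal()
gen one = 1

* Stage 1: person-weighted cell means.
*   (rawsum) ignores the weight, so cellsize is the raw observation count;
*   (sum) with pweights gives the sum of person weights in the cell.
collapse (mean) y x1 x2 (rawsum) cellsize = one (sum) sumwt = one ///
    [pw = perwt], by(cell)

* Stage 2a: weight cell means by the raw cell size, as suggested above
reg y x1 x2 [aw = cellsize]

* Stage 2b: alternative that carries the person weights into the regression
reg y x1 x2 [aw = sumwt]
```

Comparing the two stage 2 outputs on the real data would show how much the choice of weight matters in practice.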

Thank you very much for your assistance and opinion!

Regards,
Tak Wai

The code I used and the results are:
. reg y x1 x2 [aw=celsize]
(sum of wgt is 3.0000e+02)
Number of obs = 20
(something omitted...)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   2.130015   .5890904     3.62   0.002     .8871432    3.372887
          x2 |   1.364704   .7330079     1.86   0.080     -.181808    2.911215
       _cons |   1.127888   .1742532     6.47   0.000     .7602464    1.495531
------------------------------------------------------------------------------

. gen yw=sqrt(celsize)*y

. gen x1w=sqrt(celsize)*x1

. gen x2w=sqrt(celsize)*x2

.
. reg yw x1w x2w
Number of obs = 20
[also something omitted...]
------------------------------------------------------------------------------
          yw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x1w |   2.024784   .5655494     3.58   0.002      .831579    3.217989
         x2w |   1.476072   .7049143     2.09   0.052    -.0111669    2.963311
       _cons |   4.436417   .6484838     6.84   0.000     3.068236    5.804598
------------------------------------------------------------------------------
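
One thing that may explain the mismatch: the second regression transforms y, x1 and x2 but leaves the constant as a plain column of ones, whereas weighting by n effectively scales the intercept column by sqrt(n) as well. A minimal sketch, assuming that is the cause, adds sqrt(celsize) as a regressor and suppresses the automatic constant. The toy data, the seed, and the names sw, b1_aw, se1_aw and b0_aw are made up for illustration; only y, x1, x2 and celsize follow the listing above.

```stata
* Hypothetical toy data standing in for the 20 cell means
* (runiform() and rnormal() need Stata 10.1 or later)
clear
set seed 12345
set obs 20
gen double celsize = 5 + int(20*runiform())
gen double x1 = rnormal()
gen double x2 = rnormal()
gen double y  = 1 + 2*x1 + 1.5*x2 + rnormal()/sqrt(celsize)

* Regression with analytic weights; keep a few results for comparison
reg y x1 x2 [aw=celsize]
scalar b1_aw  = _b[x1]
scalar se1_aw = _se[x1]
scalar b0_aw  = _b[_cons]

* Transform every column, including the column of ones
gen double sw  = sqrt(celsize)
gen double yw  = sw*y
gen double x1w = sw*x1
gen double x2w = sw*x2

* sw plays the role of the constant, so the automatic one is dropped
reg yw x1w x2w sw, noconstant

* Differences should be zero up to rounding
display _b[x1w] - b1_aw
display _se[x1w] - se1_aw
display _b[sw] - b0_aw
```

If this reasoning holds, the coefficients and standard errors on x1w and x2w match those on x1 and x2, and the coefficient on sw matches _cons. Root MSE would still differ, because aweights are rescaled to sum to the number of observations.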
