
From: Tak-wai Chau <tchau@troi.cc.rochester.edu>
To: statalist@hsphsun2.harvard.edu
Subject: st: Questions on weights on regressions
Date: Sun, 13 Nov 2005 15:58:05 -0500

Hi, statalist users,

I have two questions about the use of weights in regression.

First, I have a question about using aweights in regression. As I understand from this FAQ:

http://www.stata.com/support/faqs/stat/crc36.html

the (slope) coefficients and standard error estimates from a regression using aweights (=n) are the same as those from a regression on variables transformed by multiplying by sqrt(n). However, the results I get are not the same. Maybe I have misunderstood the page above. If so, how and why?

I have attached the code and results at the end of this mail.

My second question is as follows. I am working with US census data, pooling the years 1960 to 2000. Because the data set is huge, and following other researchers who have worked with it, I am going to run certain regressions on group mean data, with aweight=cellsize (the number of original observations each mean is averaged from).

First, I should expect a loss of efficiency, am I correct?

Second, a problem is that individual observations carry a person weight due to the survey design, especially after 1990. One suggestion is to use this person weight (as a pweight) to calculate the cell means, and then use aweight=cellsize in the regression on cell means, where cellsize is the number of observations each mean is derived from, without regard to the person weight.

I would like to ask whether this is a good approach, and whether there is a better way to deal with this situation. For example, should the person weight be taken into account when constructing the weight used at the regression stage?
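To make the two candidate cell weights concrete, here is a minimal sketch in Python (not Stata) of the collapse step described above. The records, the cell labels, and the function name collapse_cells are all made up for illustration; nothing here comes from the census data. It computes person-weighted cell means together with both the unweighted cell size and the sum of person weights, without taking a position on which of the two should be the regression weight.

```python
from collections import defaultdict

# Hypothetical microdata: (cell id, person weight, y). Values are illustrative only.
records = [
    ("A", 1.0, 10.0),
    ("A", 3.0, 20.0),
    ("B", 2.0, 5.0),
    ("B", 2.0, 7.0),
    ("B", 4.0, 9.0),
]

def collapse_cells(rows):
    """Return, per cell, the person-weighted mean of y plus two candidate weights."""
    acc = defaultdict(lambda: [0, 0.0, 0.0])  # [count, sum of weights, weighted sum of y]
    for cell, pw, y in rows:
        a = acc[cell]
        a[0] += 1
        a[1] += pw
        a[2] += pw * y
    return {
        cell: {
            "mean_y": swy / sw,   # cell mean computed with person weights
            "cellsize": count,    # number of observations, ignoring person weights
            "sum_pw": sw,         # alternative: total person weight in the cell
        }
        for cell, (count, sw, swy) in acc.items()
    }

cells = collapse_cells(records)
for cell, stats in sorted(cells.items()):
    print(cell, stats)
```

The two weights coincide (up to scale) only when person weights are constant across observations, which is why the choice matters more for the later census years.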

Thank you very much for your assistance and opinion!

Regards,

Tak Wai

The code I used is:

. reg y x1 x2 [aw=celsize]
(sum of wgt is 3.0000e+02)

Number of obs = 20
(something omitted...)

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   2.130015   .5890904     3.62   0.002     .8871432    3.372887
          x2 |   1.364704   .7330079     1.86   0.080     -.181808    2.911215
       _cons |   1.127888   .1742532     6.47   0.000     .7602464    1.495531
------------------------------------------------------------------------------

. gen yw=sqrt(celsize)*y
. gen x1w=sqrt(celsize)*x1
. gen x2w=sqrt(celsize)*x2
.
. reg yw x1w x2w

Number of obs = 20
[also something omitted...]

------------------------------------------------------------------------------
          yw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x1w |   2.024784   .5655494     3.58   0.002      .831579    3.217989
         x2w |   1.476072   .7049143     2.09   0.052    -.0111669    2.963311
       _cons |   4.436417   .6484838     6.84   0.000     3.068236    5.804598
------------------------------------------------------------------------------
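One possible reading of the equivalence in the first question is that the intercept column is multiplied by sqrt(n) along with y, x1 and x2, and that the transformed regression is then fit without a separate constant. The second regression in the log above instead keeps an untransformed _cons. The sketch below, in Python rather than Stata, checks that reading on simulated data. All names and numbers are hypothetical, and the rescaling of weights to sum to the number of observations is an assumption about how aweights are normalized.

```python
import math
import random

def invert(a):
    """Gauss-Jordan inverse with partial pivoting, for a small square matrix."""
    k = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(k):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[k:] for row in m]

def wls(X, y, w):
    """Weighted least squares; returns (coefficients, standard errors).
    Weights are rescaled to sum to the number of rows (assumed aweight convention)."""
    n, k = len(X), len(X[0])
    scale = n / sum(w)
    w = [wi * scale for wi in w]
    xtwx = [[sum(w[r] * X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w[r] * X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtwx)
    b = [sum(inv[i][j] * xtwy[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * b[j] for j in range(k)) for r in range(n)]
    s2 = sum(w[r] * resid[r] ** 2 for r in range(n)) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return b, se

def ols(X, y):
    """Ordinary least squares as the unit-weight special case of wls."""
    return wls(X, y, [1.0] * len(X))

# Simulated cell-level data: 20 cells with made-up sizes and regressors.
random.seed(0)
n_cells = 20
size = [random.randint(5, 25) for _ in range(n_cells)]
x1 = [random.random() for _ in range(n_cells)]
x2 = [random.random() for _ in range(n_cells)]
y = [1.0 + 2.0 * u + 1.5 * v + random.gauss(0, 1) / math.sqrt(c)
     for u, v, c in zip(x1, x2, size)]
root = [math.sqrt(c) for c in size]
yw = [r * v for r, v in zip(root, y)]

# (a) weighted regression of y on a constant, x1, x2 with weights = cell size
b_aw, se_aw = wls([[1.0, u, v] for u, v in zip(x1, x2)], y, size)

# (b) unweighted regression with every column, including the constant, times sqrt(size)
b_tr, se_tr = ols([[r, r * u, r * v] for r, u, v in zip(root, x1, x2)], yw)

# (c) the specification in the log: transformed y, x1, x2 plus an untransformed constant
b_log, se_log = ols([[1.0, r * u, r * v] for r, u, v in zip(root, x1, x2)], yw)

print("aweight-style     :", b_aw, se_aw)
print("all columns scaled:", b_tr, se_tr)
print("constant unscaled :", b_log, se_log)
```

In this sketch (a) and (b) agree up to floating-point error, while (c) fits a different model and its slopes generally differ. In Stata terms, (b) would correspond to generating a variable equal to sqrt(celsize), adding it as a regressor, and specifying the noconstant option.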
