
From: "Sayer, Bryan" <BSayer@s-3.com>
To: "'HJW '" <hjw48823@yahoo.com>, "'statalist@hsphsun2.harvard.edu '" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: resampling with weights
Date: Tue, 10 Jun 2003 09:10:17 -0400

Expanding a dataset by weight is only appropriate with frequency weights, when we KNOW we have (weight) observations with the associated characteristics. With probability weights, we only THINK we have (weight) observations with the characteristics, but we are not certain. So you certainly should not do that.

Bootstrapping with survey data is not very well developed, but I believe J.N.K. Rao has one or two papers on the topic.

Bryan Sayer
Statistician, SSS Inc.

-----Original Message-----
From: HJW
To: statalist@hsphsun2.harvard.edu
Sent: 6/9/03 11:02 PM
Subject: st: resampling with weights

Dear listers,

I have a survey dataset in which every observation (i.e., every survey participant) has a sampling weight corresponding to the number of represented subjects in the population. My question is whether and how the sampling weight information should be used in the resampling process of a bootstrap exercise.

Because the sampling weight = 1/(selection probability), the survey participants are obviously not selected with equal probabilities. So I think something must be done differently in the resampling process. This is what I have tried:

(1) Expand the dataset by the weight variable (wgt) via -expand wgt-, so that an observation with, say, wgt=2 appears twice in the expanded dataset. Now every observation should be represented with equal probability in the expanded dataset.

(2) Do random draws with replacement to select N observations (N: population size) from the expanded dataset. Once done, reduce the size of the sampled dataset via -contract _all, freq(wgt2)-, which also creates a new weight variable wgt2.

(3) Estimate my multinomial logit model with the pweight=wgt2 option. (Using fweight=wgt2 produces the same coefficients.)

The results, however, are very tight confidence intervals for almost all of the bootstrapped statistics. They are too good to be true.
On the other hand, if I use the more "naive" approach of not expanding and contracting the original data before and after the resampling (but still using the weight option pweight=wgt in the estimation), I obtain much more reasonable (both economically and statistically) results.

Why is this so? What is the appropriate way to handle sampling weights in a bootstrapping exercise? The only thing I can think of is that the above -expand- and -contract- strategy amounts to resampling the "population", rather than a representative sample of the population. Could this be the reason?

Any insight will be greatly appreciated!

HJW

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
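The gap HJW reports between the two resampling schemes can be reproduced in a small simulation. What follows is only a rough sketch under stated assumptions, not the procedure used in the thread: it is written in Python rather than Stata, it uses a weighted mean in place of the multinomial logit model, and the data, the weight range, and every function name are made up for illustration. It compares the bootstrap standard error from the expand-then-draw-N scheme with the one from resampling the n sampled rows and keeping their weights. It does not show that the "naive" scheme is correct for a complex survey design, only that expanding makes the bootstrap behave as if N observations had been collected.

```python
import random
import statistics


def weighted_mean(values, weights):
    """Weighted mean, standing in for the weighted estimator."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def naive_bootstrap_se(values, weights, reps, rng):
    """Resample the n sampled rows with replacement, keeping each row's weight."""
    n = len(values)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            weighted_mean([values[i] for i in idx], [weights[i] for i in idx])
        )
    return statistics.stdev(estimates)


def expand_bootstrap_se(values, weights, reps, rng):
    """Expand rows by integer weight, then draw N = sum(weights) rows."""
    expanded = [v for v, w in zip(values, weights) for _ in range(w)]
    big_n = len(expanded)
    estimates = []
    for _ in range(reps):
        draw = [expanded[rng.randrange(big_n)] for _ in range(big_n)]
        # Contracting the draw and weighting by the new frequencies gives
        # the same number as the plain mean of the draw.
        estimates.append(sum(draw) / big_n)
    return statistics.stdev(estimates)


def simulate(seed=1, n=50, reps=200):
    """Return (naive_se, expand_se) for one simulated sample of n rows."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 40) for _ in range(n)]  # hypothetical integer weights
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical outcome
    naive_se = naive_bootstrap_se(values, weights, reps, rng)
    expand_se = expand_bootstrap_se(values, weights, reps, rng)
    return naive_se, expand_se


if __name__ == "__main__":
    naive_se, expand_se = simulate()
    print("naive bootstrap SE: ", naive_se)
    print("expand bootstrap SE:", expand_se)
```

In this sketch the expanded scheme draws roughly a thousand rows per replicate from only 50 distinct survey responses, so its replicate-to-replicate variation reflects a sample of size N rather than n, which is consistent with the "too good to be true" intervals described above.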


© Copyright 1996–2014 StataCorp LP