
From: HJW <hjw48823@yahoo.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: resampling with weights
Date: Mon, 9 Jun 2003 20:02:14 -0700 (PDT)

Dear listers,

I have a survey dataset in which every observation (i.e., every survey participant) has a sampling weight equal to the number of subjects in the population that the participant represents. My question is whether, and how, the sampling weight information should be used in the resampling step of a bootstrap exercise. Because the sampling weight = 1/(selection probability), the survey participants were clearly not selected with equal probabilities, so I think something must be done differently in the resampling step.

This is what I have tried:

(1) Expand the dataset by the weight variable (wgt) via -expand wgt-, so that an observation with, say, wgt=2 appears twice in the expanded dataset. Every observation should then be represented with equal probability in the expanded dataset.

(2) Draw N observations (N: population size) at random, with replacement, from the expanded dataset. Afterwards, reduce the size of the sampled dataset via -contract _all, freq(wgt2)-, which also creates a new weight variable wgt2.

(3) Estimate my multinomial logit model with the pweight=wgt2 option. (Using fweight=wgt2 produces the same coefficients.)

The result, however, is very tight confidence intervals for almost all of the bootstrapped statistics. They are too good to be true. On the other hand, when I used the more "naive" approach of not expanding and contracting the original data before and after the resampling (but still using the weight option pweight=wgt in the estimation), I obtained much more reasonable results, both economically and statistically.

Why is this so? What is the appropriate way to handle sampling weights in a bootstrapping exercise? The only thing I can think of is that the above -expand- and -contract- strategy amounts to resampling the "population" rather than a representative sample of the population. Could this be the reason?

Any insight will be greatly appreciated!

HJW
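The two resampling schemes described in the post can be compared on simulated data. The following is only a minimal sketch, not the poster's actual analysis: it is written in Python rather than Stata, it swaps the multinomial logit for a simple weighted mean, and the data, the sample size of 200, and all function names are made up for illustration. The contract step is skipped because the plain mean of the expanded draws equals the mean weighted by the contracted frequencies. What the sketch shows is that scheme (1)-(2) draws N (population-sized) observations per replicate, while the naive scheme draws only as many as were actually surveyed, and the spread of the bootstrap estimates shrinks with the number of draws.

```python
import random
import statistics


def weighted_mean(values, weights):
    """Weighted mean; stands in for the weighted model estimate."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def naive_bootstrap_se(y, wgt, reps, rng):
    """Resample the n surveyed observations, keeping each one's weight."""
    n = len(y)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            weighted_mean([y[i] for i in idx], [wgt[i] for i in idx])
        )
    return statistics.stdev(estimates)


def expanded_bootstrap_se(y, wgt, reps, rng):
    """Expand by weight, then draw N = sum(wgt) observations per replicate."""
    expanded = [v for v, w in zip(y, wgt) for _ in range(w)]
    big_n = len(expanded)
    estimates = []
    for _ in range(reps):
        draw = rng.choices(expanded, k=big_n)
        # Same value as contracting and then weighting by the frequencies.
        estimates.append(statistics.fmean(draw))
    return statistics.stdev(estimates)


# Hypothetical survey: 200 respondents with integer weights from 1 to 50.
rng = random.Random(12345)
n = 200
y = [rng.gauss(0, 1) for _ in range(n)]
wgt = [rng.randint(1, 50) for _ in range(n)]

se_naive = naive_bootstrap_se(y, wgt, 200, rng)
se_expanded = expanded_bootstrap_se(y, wgt, 200, rng)
print("naive bootstrap SE:   ", se_naive)
print("expanded bootstrap SE:", se_expanded)
```

In this toy setting the expanded scheme treats each replicate as if roughly sum(wgt) independent observations had been collected, which is one plausible reading of why the intervals in the post come out so tight.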

