Stata The Stata listserver

st: resampling with weights


From   HJW <hjw48823@yahoo.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: resampling with weights
Date   Mon, 9 Jun 2003 20:02:14 -0700 (PDT)

Dear listers,

I have a survey dataset in which every observation (i.e., every
survey participant) has a sampling weight equal to the number of
subjects that participant represents in the population. My question
is whether, and how, the sampling weight information should be used
in the resampling step of a bootstrap exercise.

Because the sampling weight = 1/(selection probability), obviously
the survey participants are not selected with equal probabilities.
So I think something must be done differently in the resampling
process.

This is what I have tried:
 (1) Expand the dataset by the weight variable (wgt) via
  -expand wgt-, so that an observation with, say, wgt=2 appears
  twice in the expanded dataset. Now every observation should be
  represented with equal probability in the expanded dataset.
 (2) Draw N observations at random with replacement (N: population
  size) from the expanded dataset. Then reduce the size of the
  resampled dataset via -contract _all, freq(wgt2)-, which also
  creates a new weight variable wgt2.
 (3) Estimate my multinomial logit model with the pweight=wgt2 option.
    (Using fweight=wgt2 produces the same coefficients.)
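
For readers who want to see the mechanics of steps (1) to (3) outside Stata, here is a minimal Python sketch of one bootstrap replicate. It is only an illustration under assumptions: the three (value, wgt) pairs are made up, the weights are taken to be integers, and a weighted mean stands in for the multinomial logit. The function names are hypothetical and are not Stata commands.

```python
import random
from collections import Counter

def expand(rows):
    """Step (1): repeat each (value, weight) row `weight` times (integer weights assumed)."""
    return [value for value, weight in rows for _ in range(weight)]

def resample_and_contract(expanded, rng):
    """Step (2): draw len(expanded) values with replacement, then
    collapse identical draws into (value, frequency) pairs."""
    draws = rng.choices(expanded, k=len(expanded))
    return sorted(Counter(draws).items())

def weighted_mean(rows):
    """Step (3) stand-in: a weighted mean instead of the multinomial logit."""
    total = sum(weight for _, weight in rows)
    return sum(value * weight for value, weight in rows) / total

rng = random.Random(2003)
sample = [(1.0, 2), (4.0, 1), (6.0, 3)]   # hypothetical (value, wgt) pairs
expanded = expand(sample)                  # one row per represented subject
replicate = resample_and_contract(expanded, rng)  # (value, wgt2) pairs
estimate = weighted_mean(replicate)
```

Note that the new frequencies always sum to N (here 6), so each replicate behaves like a fresh sample of N independent subjects.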

The results, however, are very tight confidence intervals for almost
all of the bootstrapped statistics. They are too good to be true. On
the other hand, if I used the more "naive" approach of not expanding
and contracting the original data before and after the resampling
(but still having the weight option of pweight=wgt in the
estimation), I obtained much more reasonable (both economically and
statistically) results.

Why is this so? What is the appropriate way to handle sampling
weights in a bootstrapping exercise? The only thing I can think of is
that the above -expand- and -contract- strategy amounts to resampling
the "population", rather than a representative sample of the
population. Could this be the reason? Any insight will be greatly
appreciated!
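
To make the conjecture concrete, the following Python sketch compares the bootstrap standard error of a weighted mean under the two schemes. Everything in it is an assumption for illustration: the data are simulated (n = 50 respondents, integer weights between 10 and 30), the statistic is a weighted mean rather than a multinomial logit, and the function names are hypothetical. It does not settle what the appropriate survey bootstrap is; it only shows how the spread depends on the number of rows drawn.

```python
import random
import statistics

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def bootstrap_se_sample(values, weights, reps, rng):
    """Resample the n respondents, carrying each one's weight along."""
    n = len(values)
    stats = []
    for _ in range(reps):
        idx = rng.choices(range(n), k=n)
        stats.append(weighted_mean([values[i] for i in idx],
                                   [weights[i] for i in idx]))
    return statistics.stdev(stats)

def bootstrap_se_expanded(values, weights, reps, rng):
    """Expand by weight, then draw N = sum(weights) rows per replicate."""
    expanded = [v for v, w in zip(values, weights) for _ in range(w)]
    big_n = len(expanded)
    stats = []
    for _ in range(reps):
        stats.append(statistics.fmean(rng.choices(expanded, k=big_n)))
    return statistics.stdev(stats)

rng = random.Random(0)
n = 50                                            # simulated respondents
values = [rng.gauss(0.0, 1.0) for _ in range(n)]
weights = [rng.randint(10, 30) for _ in range(n)]  # simulated integer weights

se_sample = bootstrap_se_sample(values, weights, 300, rng)
se_expanded = bootstrap_se_expanded(values, weights, 300, rng)
ratio = se_sample / se_expanded
```

In this setup the expanded scheme draws about 1000 rows per replicate instead of 50, so its standard error comes out several times smaller, roughly in line with the square root of N/n. That matches the pattern of intervals that look too tight.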

HJW
