

From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: pweight question
Date: Thu, 29 Apr 2010 22:23:32 -0500

On Thu, Apr 29, 2010 at 7:31 PM, Randall Lewis <randallalewis@gmail.com> wrote:
> Should I really think about pweights as stratified sampling (i.e., is
> this how people generally think about them)? Or should I think of them
> as 1/f(i) where f(i) is the probability of drawing that individual from
> the population? You seem to have described it in the former way, rather
> than the latter. Or does it matter? (I'm pretty sure it should matter
> for some estimators--like if you were trying to compute the median of
> an RV, x, and had stratified sampled 100 observations according to each
> percentile, your estimate of the median would have a much different
> S.E. than if you had just sampled individuals randomly, via simple
> random sampling.)

I always think of probability weights as the inverse selection probabilities, as in the Horvitz-Thompson estimator. Stratified sampling is one special case that might generate differential weights, but

(a) stratified sampling does not necessarily produce differential weights (example: proportional allocation),
(b) different probabilities of selection might come from other sources, typically in multistage sampling (e.g., when your measure of size needs corrections in PPS sampling),
(c) the weights given in publicly released files will have post-stratification and non-response adjustments on top of the inverse probability of selection weights.

In your hypothetical example of deeply stratified sampling, the variance is not estimable if you take one unit from each stratum. Think of 100 independent samples of size 1; you cannot estimate any variances from any of these samples. But if you took, say, 10 units from each decile, then yes, that would give you a more accurate estimate of pretty much anything related to the distribution of that variable.

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
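
To make the two points above concrete, here is a minimal Python sketch (not Stata, and not taken from the original message). It assumes a toy population of the integers 1 to 1000 split into ten deciles, with simple random sampling without replacement inside each stratum; the function names `horvitz_thompson_total` and `stratified_mean` are hypothetical, chosen only for illustration. It shows the inverse-probability weighting of a Horvitz-Thompson total, and why the usual stratified variance estimator is undefined when a stratum contributes a single unit.

```python
import random
from statistics import variance


def horvitz_thompson_total(values, probs):
    """Estimate a population total as sum(y_i / pi_i),
    where pi_i is unit i's probability of selection."""
    return sum(y / p for y, p in zip(values, probs))


def stratified_mean(strata):
    """strata: list of (N_h, sampled_values) pairs, one per stratum.

    Returns (mean_estimate, variance_estimate). The variance estimate
    is None when any stratum has fewer than two sampled units, because
    the within-stratum sample variance s_h^2 divides by n_h - 1.
    """
    N = sum(N_h for N_h, _ in strata)
    mean = sum((N_h / N) * (sum(ys) / len(ys)) for N_h, ys in strata)
    if any(len(ys) < 2 for _, ys in strata):
        return mean, None
    # Standard estimator: sum of W_h^2 * (1 - n_h/N_h) * s_h^2 / n_h
    var = sum(
        (N_h / N) ** 2 * (1 - len(ys) / N_h) * variance(ys) / len(ys)
        for N_h, ys in strata
    )
    return mean, var


# Toy population (an assumption for illustration): 1..1000 in ten deciles.
population = list(range(1, 1001))
deciles = [population[i:i + 100] for i in range(0, 1000, 100)]
rng = random.Random(0)

one_each = [(100, rng.sample(d, 1)) for d in deciles]   # 1 unit per stratum
ten_each = [(100, rng.sample(d, 10)) for d in deciles]  # 10 units per stratum

print(stratified_mean(one_each))  # variance estimate is None
print(stratified_mean(ten_each))  # variance estimate is a positive number
```

With equal stratum sizes and equal sample sizes, as here, the allocation is proportional, so every sampled unit carries the same weight N_h/n_h even though the design is stratified; this is the situation in point (a).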
