Statalist The Stata Listserver


Re: st: random subsample - sample weights

From   "Stas Kolenikov" <>
Subject   Re: st: random subsample - sample weights
Date   Thu, 15 Feb 2007 10:54:12 -0600

There are a bunch of different implementations of probability
proportional to size (PPS) sampling floating around -- Stephen
Jenkins' -samplepps- on SSC and my -ppssample-. From Mata, you can
call them with the -stata()- function. Or you can try to dig in and write
your own implementation in Mata, although there are certain subtle
issues that are easy to overlook (and I think I might have done that
three or so years ago when I wrote my code, as I'm only figuring this stuff
out reasonably well right now as I teach the complex data analysis
class :)).
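
For anyone who does want to write their own version in Mata, here is a minimal sketch of one textbook scheme: systematic PPS sampling from a randomly ordered list. It is not the algorithm used by -samplepps- or -ppssample-. The function name pps_systematic() and its arguments are made up for illustration. It guards against one pitfall, a unit so large that its inclusion probability n*w[i]/sum(w) would exceed 1. That is only one example of an easily overlooked issue, and not necessarily the one alluded to above.

```mata
mata:
// Hypothetical helper (not part of any package): systematic PPS draw of
// n distinct units with nonnegative sizes w. Returns the sorted row
// indices of the selected units.
real colvector pps_systematic(real colvector w, real scalar n)
{
    real colvector perm, cum, pick
    real scalar    step, start, i, j

    // a unit with n*w[i] > sum(w) cannot have inclusion probability <= 1
    if (max(w) * n > sum(w)) {
        _error("some unit has an inclusion probability above 1")
    }

    perm  = unorder(rows(w))        // random order of the units
    cum   = runningsum(w[perm])     // cumulated sizes in that order
    step  = sum(w) / n              // sampling interval
    start = runiform(1, 1) * step   // random start within the first interval

    pick = J(n, 1, .)
    i = 1
    for (j = 1; j <= n; j++) {
        // advance to the unit whose cumulated size covers the j-th point
        while (i < rows(w) & cum[i] < start + (j - 1) * step) i++
        pick[j] = perm[i]
    }
    return(sort(pick, 1))
}
end
```

With a weight vector w aligned row by row with the data (an assumption here), the result can be used directly as a subscript, for example p0i = p0[pps_systematic(w, n)].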

You can still go with -expand-, but then you would have to discard any
sample that contains repeated observations and draw again. That is a
valid algorithm for PPS sampling, but it requires all selection
probabilities to have a reasonably small common denominator, since
otherwise the expanded data set becomes very large.
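
The draw-and-discard loop just described can be sketched in Mata as follows. This is only an illustration under stated assumptions: the weights are taken to be nonnegative integers (that is, already scaled to a common denominator), and the name pps_reject() and its arguments are hypothetical. The sketch makes no attempt to verify the resulting inclusion probabilities.

```mata
mata:
// Hypothetical helper (not part of any package): expand units by integer
// weights w, draw n expanded rows without replacement, and start over
// whenever the same original unit shows up more than once.
real colvector pps_reject(real colvector w, real scalar n)
{
    real colvector id, perm, draw
    real scalar    i, k

    if (w != floor(w) | min(w) < 0) {
        _error("weights must be nonnegative integers")
    }
    if (n > sum(w :> 0)) {
        _error("n exceeds the number of units with positive weight")
    }

    // the -expand- step: unit i appears w[i] times in id
    id = J(sum(w), 1, .)
    k  = 0
    for (i = 1; i <= rows(w); i++) {
        if (w[i] > 0) {
            id[|k + 1 \ k + w[i]|] = J(w[i], 1, i)
            k = k + w[i]
        }
    }

    // simple random sample of n expanded rows; reject samples with repeats
    while (1) {
        perm = unorder(rows(id))
        draw = sort(id[perm[|1 \ n|]], 1)
        if (rows(uniqrows(draw)) == n) return(draw)
    }
}
end
```

The rejection rate grows quickly with n and with the size of the largest weights, so the loop can be slow when a few units dominate.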

I've no idea whatsoever what the properties of any matching procedure
might be with non-unit weights. If you are doing the bootstrap with
matching, then I would have a lot of extra doubts on top of the
regular ones regarding the bootstrap and matching separately. Matching
is a non-smooth procedure (unless you are using nice quartic weights,
or something of the kind), and resampling estimators generally work
well with smooth functionals of the data, but may need some extra
effort in non-smooth problems such as, say, distribution quantiles.

On 2/15/07, Ellen Van de Poel <> wrote:
Dear all,

I want to draw a random subsample from my data, but taking into account my
sample weights.
I thought of inflating my data (with the command "expand") to get rid of the
weights and then draw a random subsample from the expanded data. But I'm not
sure whether this is correct, since then I can have the same observation
multiple times in the random subsample?

The extra difficulty is that I am working with matrix language. The code
where the random subsample is drawn looks something like this:
p0i = p0[sort(unorder(rows(p0))[|1 \ n|],1)]

So I take the first n observations, that are drawn randomly from p0, and
then sort them again.

Thereafter I match this random subsample with another sample (I'm doing a
Fairlie decomposition). So I guess it is necessary to account for the sample
weights when I match the random subsample with another sample?

Could anyone help me with this?

Thanks a lot in advance,
Ellen Van de Poel

Stas Kolenikov