# Re: st: random subsample - sample weights

From: "Stas Kolenikov"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: random subsample - sample weights
Date: Thu, 15 Feb 2007 10:54:12 -0600

```There are a number of different implementations of probability
proportional to size (PPS) sampling floating around -- Stephen
Jenkins' -samplepps- on SSC and my -ppssample-. From Mata, you can
call them with the -stata()- function. Or you can try to dig in and write
your own implementation in Mata, although there are certain subtle
issues that are easy to overlook (and I think I might have overlooked
some three years or so ago when I wrote my code, as I'm only figuring
this stuff out reasonably well right now as I teach the complex data
analysis class :)).

You can still go with -expand-, but then you would have to discard any
sample that contains repeated observations, and draw anew. That is a
valid algorithm for PPS sampling, but it requires all selection
probabilities to have a reasonably small common denominator.

I've no idea whatsoever what the properties of any matching procedure
might be with non-unit weights. If you are doing the bootstrap with
matching, then I would have a lot of extra doubts on top of the
regular ones regarding the bootstrap and matching separately, as
matching is a non-smooth procedure (unless you are using nice quartic
weights, or something of the kind), and resampling estimators generally
work well with smooth functionals of the data, but may need some extra
effort in non-smooth problems like, say, distribution quantiles.

On 2/15/07, Ellen Van de Poel <vandepoel@few.eur.nl> wrote:
```
```Dear all,

I want to draw a random subsample from my data, but taking into account my
sample weights.
I thought of inflating my data (with the command "expand") to get rid of the
weights and then draw a random subsample from the expanded data. But I'm not
sure whether this is correct, since then I can have the same observation
multiple times in the random subsample?

The extra difficulty is that I am working with matrix language. The code
where the random subsample is drawn looks something like this:
p0i = p0[sort(unorder(rows(p0))[|1 \ n|],1)]

So I take the first n observations, that are drawn randomly from p0, and
then sort them again.

Thereafter I match this random subsample with another sample (I'm doing a
Fairlie decomposition). So I guess it is necessary to account for the sample
weights when I match the random subsample with another sample?

Could anyone help me with this?

Ellen Van de Poel

```
```
--
Stas Kolenikov
http://stas.kolenikov.name
```
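The -expand- and reject scheme described in the reply could be sketched in Mata roughly as follows. This is only an illustration under stated assumptions, not the code behind -samplepps- or -ppssample-: the function name `pps_reject` is made up, and the weights are assumed to have already been rescaled to smallish non-negative integers (the "small common denominator" condition).

```mata
mata:
// Hypothetical helper (not part of Stata or any SSC package).
// w: integer replication counts, one per observation (assumed already rescaled)
// n: number of distinct observations wanted
// Returns the sorted row indices of the selected observations.
real colvector pps_reject(real colvector w, real scalar n)
{
    real colvector id, pick
    real scalar    i

    // guard against a loop that could never terminate
    if (n > sum(w :> 0)) _error("n exceeds the number of observations with positive weight")

    // expanded index: observation i appears w[i] times, as -expand- would do
    id = J(0, 1, .)
    for (i = 1; i <= rows(w); i++) id = id \ J(w[i], 1, i)

    while (1) {
        // n draws without replacement from the expanded index
        pick = id[unorder(rows(id))[|1 \ n|]]
        // keep the sample only if no original observation is repeated
        if (rows(uniqrows(pick)) == n) return(sort(pick, 1))
        // otherwise discard the whole sample and draw anew
    }
}

// example call: four observations with weights 1, 2, 3, 4; select two of them
pps_reject((1 \ 2 \ 3 \ 4), 2)
end
```

The returned indices can be used the same way as in the original question, e.g. `p0i = p0[pps_reject(w, n)]`, where `w` is an assumed column vector of integer weights aligned with the rows of `p0`. With large or widely varying weights the expanded index grows quickly and rejections become frequent, which is where a dedicated PPS routine is likely the better choice.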