There are several implementations of probability proportional to
size (PPS) sampling floating around, including Stephen Jenkins'
-samplepps- on SSC and my -ppssample-. From Mata, you can call them
with the -stata()- function. Or you can dig in and write your own
implementation in Mata, although there are some subtle issues that
are easy to overlook (and I think I might have overlooked some of
them three or so years ago when I wrote my code, as I'm only figuring
this stuff out reasonably well right now as I teach the complex data
analysis class :)).
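
To give a rough idea of what a home-grown implementation involves, here
is a minimal sketch of systematic PPS selection, one common textbook
scheme. It is written in Python rather than Mata, and it is not meant to
describe how -samplepps- or -ppssample- work internally. The name
systematic_pps and its arguments are made up for this illustration. One
issue that is easy to miss is that the target inclusion probability
n * size / total must not exceed 1 for any unit. The sketch simply
refuses such input rather than setting those units aside as certainty
selections.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Select n distinct units with inclusion probability n * size / total."""
    total = float(sum(sizes))
    probs = [n * s / total for s in sizes]
    if min(probs) <= 0 or max(probs) > 1:
        raise ValueError("need 0 < n * size / total <= 1 for every unit")

    # Randomize the list order, then lay the probabilities end to end on [0, n).
    order = list(range(len(sizes)))
    rng.shuffle(order)
    cum = list(accumulate(probs[i] for i in order))

    # One random start in [0, 1), then n points spaced exactly 1 apart.
    # No segment is longer than 1, so no unit can be hit twice.
    start = rng.random()
    picks = []
    for k in range(n):
        j = bisect_right(cum, start + k)  # first unit whose segment ends beyond the point
        picks.append(order[min(j, len(order) - 1)])  # min() guards against float round-off
    return sorted(picks)
```

For example, systematic_pps([10, 20, 30, 40], 2, random.Random(1)) returns
two distinct positions out of 0..3. Over many draws each unit should turn
up with a frequency close to 2 * size / 100.
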
You can still go with -expand-, but then you would have to discard any
sample that contains repeated observations and draw again. That is a
valid algorithm for PPS sampling, but it requires all selection
probabilities to have a reasonably small common denominator (otherwise
the expanded data set becomes huge).
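
The expand-and-reject idea can be sketched as follows, again in Python
rather than Stata or Mata. The name pps_by_expansion, the max_tries
safeguard and the requirement that weights already be positive integers
are assumptions of this sketch. The integer requirement is where the
common denominator comes in: each unit is copied weight-many times.

```python
import random


def pps_by_expansion(weights, n, rng, max_tries=10000):
    """Draw n distinct units by expanding on integer weights and rejecting repeats."""
    if any(w <= 0 or int(w) != w for w in weights):
        raise ValueError("weights must be positive integers")
    if n > len(weights):
        raise ValueError("cannot draw more distinct units than there are")

    # Analogue of -expand-: unit i appears weights[i] times in the list.
    expanded = [i for i, w in enumerate(weights) for _ in range(int(w))]

    for _ in range(max_tries):
        # Simple random sample without replacement of rows of the expanded list.
        draw = rng.sample(expanded, n)
        if len(set(draw)) == n:  # keep only samples with no repeated unit
            return sorted(draw)
        # otherwise discard the whole sample and start over
    raise RuntimeError("no sample without repeats found in max_tries attempts")
```

With weights [1, 1, 1, 1, 6] and n = 2, the last unit is selected far more
often than any of the others. The rejection step gets slow when a few
units carry most of the weight or when n is large relative to the number
of units.
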
I have no idea what the properties of any matching procedure might be
with non-unit weights. If you are doing the bootstrap with matching,
then I would have a lot of extra doubts on top of the regular ones
regarding the bootstrap and matching separately: matching is a
non-smooth procedure (unless you are using nice quartic weights, or
something of the kind), and resampling estimators generally work well
with smooth functionals of the data, but may need some extra effort in
non-smooth problems such as distribution quantiles.
On 2/15/07, Ellen Van de Poel <vandepoel@few.eur.nl> wrote:
Dear all,
I want to draw a random subsample from my data, but taking into account my
sample weights.
I thought of inflating my data (with the command "expand") to get rid of the
weights and then draw a random subsample from the expanded data. But I'm not
sure whether this is correct, since then I can have the same observation
multiple times in the random subsample?
The extra difficulty is that I am working with matrix language. The code
where the random subsample is drawn looks something like this:
p0i = p0[sort(unorder(rows(p0))[|1 \ n|],1)]
So I take the first n observations, that are drawn randomly from p0, and
then sort them again.
Thereafter I match this random subsample with another sample (I'm doing a
Fairlie decomposition). So I guess it is necessary to account for the sample
weights when I match the random subsample with another sample?
Could anyone help me with this?
Thanks a lot in advance,
Ellen Van de Poel
--
Stas Kolenikov
http://stas.kolenikov.name