# RE: st: re: Sample with weights

From: Nick Winter
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: re: Sample with weights
Date: Mon, 03 Oct 2005 10:26:19 -0400

All I can think of is some sort of iterated thing that plays with the requested sample size, which could be automated.
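Here is one way to automate that iteration. This Stata sketch assumes the uniform draws are held fixed while the requested size changes. With fixed u, raising the multiplier c in the rule `u < c * size/total` admits firms one at a time, in order of increasing u/size. The value of c that yields exactly n firms therefore always picks the n firms with the smallest u/size, so the whole search collapses to a single sort. This scheme is known in the survey literature as sequential Poisson sampling, and its inclusion probabilities are only approximately proportional to size. The fake data, seed, and target `n` below are hypothetical.

```stata
* Hypothetical fake data: 1,000 firms with sizes 1..100
clear
set obs 1000
set seed 20051003
gen size = int(uniform()*100) + 1
* target sample size
local n 100
* Keep the uniform draws fixed. Raising c in "u < c*size/total"
* admits firms in order of increasing u/size, so stopping at exactly
* n firms is the same as taking the n smallest values of u/size.
gen double u = uniform()
gen double key = u / size
sort key
gen byte insamp = _n <= `n'
* reports exactly n firms (as long as there are at least n firms)
count if insamp
```

Note that the ratio u/size is sorted in ascending order. Multiplying uniform() by size, as in the code quoted further down, is a different rule and does not give probabilities proportional to size.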

--Nick

At 10:18 AM 10/3/2005, you wrote:

Hm, interesting!

This is indeed not the kind of selection I wanted. I went for your
suggestion, Nick, which was (for a sample size of 100):

(1) Calculate for each company the probability of inclusion.  This is
(sample size) * (size of company / total of company sizes).  So
assuming a sample size of 100:

. sum size
. gen prob = 100 * ( size / r(sum) )

(2) Then select the sample based on these probabilities

. gen u = uniform()

. gen insamp = u < prob
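In step (2) each firm enters independently (Poisson sampling), so the number selected is itself random. It varies from draw to draw around the sum of prob. The self-contained sketch below runs steps (1) and (2) on hypothetical fake data and counts any firms whose prob reaches 1, since those would be certain picks. It then repeats the draw a few times and reports the standard deviation of the realized size, which is the square root of the sum of prob*(1-prob). The data and seed are illustrative assumptions.

```stata
* Hypothetical fake data: 1,000 firms with sizes 1..100
clear
set obs 1000
set seed 20051003
gen size = int(uniform()*100) + 1
* Step (1): inclusion probabilities for an expected sample size of 100
quietly summarize size
gen double prob = 100 * size / r(sum)
* firms with prob >= 1 would be drawn with certainty
count if prob >= 1
* Step (2), repeated: the realized size varies from draw to draw
forvalues i = 1/5 {
    quietly count if uniform() < prob
    display "draw `i': " r(N) " firms selected"
}
* Standard deviation of the sample size: sqrt(sum of prob*(1-prob))
gen double pq = prob * (1 - prob)
quietly summarize pq
display "SD of sample size: " sqrt(r(sum))
```

If some firms do have prob of 1 or more, a common remedy is to take them with certainty. You then recompute prob for the remaining firms, with the sample size reduced by the number of certainty firms.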

Since the sample size did not have to be exact, but did have to be
substantially lower than 100, it was enough for me to tweak the number
100 a little until I had about the right sample size. I remain
interested in a solution that yields an exact sample size.
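A standard technique that gives an exact sample size, with inclusion probabilities equal to the prob of step (1), is systematic PPS sampling. You put the firms in random order, lay their sizes end to end, and take every firm hit by n equally spaced points with a random start. The Stata sketch below assumes every firm's size is at most total/n, i.e. every prob is at most 1. A larger firm could be hit twice, and the sample would then come out smaller than n. The fake data, seed, and helper variables (shuffle, cumlo, cumhi) are hypothetical.

```stata
* Hypothetical fake data: 1,000 firms with sizes 1..100
clear
set obs 1000
set seed 20051003
gen size = int(uniform()*100) + 1
local n 100
* random order of firms before the systematic pass
gen double shuffle = uniform()
sort shuffle
* sampling interval and a random start in [0, step)
quietly summarize size
local step = r(sum)/`n'
local start = uniform()*`step'
* each firm occupies the stretch [cumlo, cumhi) of the size line
gen double cumhi = sum(size)
gen double cumlo = cumhi - size
* selected if some point start + j*step (j = 0..n-1) falls in its stretch
gen byte insamp = floor((cumhi - `start')/`step') > floor((cumlo - `start')/`step')
* exactly n firms, given the size assumption above
count if insamp
drop shuffle cumhi cumlo
```

Each firm's chance of being hit is size/step = n*size/total, which is exactly the prob of step (1). The random sort beforehand keeps the result from depending on the order of the data file.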

Thanks for the very helpful suggestions thus far.

Willard

-----Original Message-----
From: Nick Winter [mailto:nw53@cornell.edu]
Sent: Monday, 3 October 2005 15:54
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: re: Sample with weights

Doesn't work.

First, you need to sort the other direction.

But more seriously, this does not generate selection probabilities
proportional to size.  Consider this code, which creates fake data,
then draws 500 samples of 200 using this method.  The graph at the
end makes clear that the selection probabilities are not proportional to
size:

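clear
set obs 1000
gen firm = _n
set seed 12345678
* fake firm sizes, 1 to 100
gen size = int(uniform()*100) + 1
* number of times each firm is drawn
gen sampled = 0

* 500 samples of 200 firms, using the proposed method
forval i=1/500 {
    gen ppsorder = uniform() * size
    * largest values first
    gsort -ppsorder
    qui replace sampled = sampled+1 if _n <= 200
    drop ppsorder
}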

graph twoway scatter sampled size
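For comparison, here is a sketch that reruns the same fake-data experiment with the Poisson method of steps (1) and (2) from earlier in the thread. The expected sample size is 200. Under proportional selection, each firm's count over 500 draws should scatter around 500 * prob, which the second plot layer draws as a straight line. The variable names prob and expected are illustrative.

```stata
* Same fake-data setup; Poisson PPS with expected sample size 200
clear
set obs 1000
set seed 12345678
gen size = int(uniform()*100) + 1
quietly summarize size
* inclusion probabilities; all well below 1 with these sizes
gen double prob = 200 * size / r(sum)
gen sampled = 0
* 500 independent Poisson samples; each firm enters with chance prob
forval i = 1/500 {
    quietly replace sampled = sampled + (uniform() < prob)
}
* expected number of times each firm is picked
gen double expected = 500 * prob
graph twoway (scatter sampled size) (line expected size, sort)
```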

--Nick Winter

At 11:04 AM 10/1/2005, you wrote:
>This is simple, produces a sample of exactly the desired size, and, I
>believe, fulfills the condition that the probability of selection be
>proportional to size.
>
>* Assume "Size" is the company size variable, and
>* M is the desired sample size
>gen ppsorder = uniform() * Size
>sort ppsorder
>keep if _n <= M
>drop ppsorder
>
>Yes, sorting the file is a bit clumsy, but this is presumably a one-time
>thing, not something appearing inside a loop.
>
>Regards,
>
>
>=-=-=-=-=-=-=-=-=-=-=-=-=
>Mike Lacy
>Fort Collins CO USA
>(970) 491-6721 office

________________________________________________________
Nicholas J. G. Winter                     607.255.8819 t
Assistant Professor                       607.255.4530 f
Department of Government              nw53@cornell.edu e
Cornell University        falcon.arts.cornell.edu/nw53 w
308 White Hall
Ithaca, NY 14853-4601

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
