
From: "Willard van Ooij" <w.van.ooij@marktmonitor.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: re: Sample with weights
Date: Tue, 4 Oct 2005 10:35:11 +0200

Wonderful! As far as I can see, this seems to do the job. I'll be sure to give it a try.

Thanks a lot, Nick!

Willard

-----Original Message-----
From: Nick Cox [mailto:n.j.cox@durham.ac.uk]
Sent: Monday, 3 October 2005 21:07
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: re: Sample with weights

This implements a rather pedestrian look-up technique. Proofs of incorrectness are solicited.

    *! NJC 1.0.0 3 Oct 2005
    * sampleproptosize #, size(size_variable) generate(in_sample)
    program sampleproptosize, sort
        version 8
        gettoken n 0 : 0, parse(" ,")
        confirm integer num `n'
        if `n' <= 0 {
            di as err "`n' must be positive"
            exit 198
        }

        syntax [if] [in] , size(varname) Generate(str)

        marksample touse
        markout `touse' `size'
        su `size' if `touse', meanonly

        if r(min) < 0 {
            di as err "negative values in `size'"
            exit 411
        }

        if r(N) < `n' {
            di as err ///
            "sample `n' requested, but only " r(N) " observations"
            exit 198
        }

        quietly {
            tempvar target id
            tempname rnd
            replace `touse' = -`touse'
            bysort `touse': gen `target' = sum(`size' / r(sum))
            local g "`generate'"
            gen byte `g' = 0
            gen long `id' = _n
            count if `g'

            while r(N) < `n' {
                scalar `rnd' = uniform()
                su `id' if `touse' ///
                    & inrange(`rnd',`target'[_n-1],`target') ///
                    & !`g', meanonly
                if r(max) < . replace `g' = 1 in `r(max)'
                count if `g'
            }
        }
    end

Nick
n.j.cox@durham.ac.uk

Willard van Ooij

> Hm, interesting!
>
> This is indeed not the kind of selection I wanted. I went for your
> suggestion, Nick, which was (for a sample size of 100):
>
> (1) Calculate for each company the probability of inclusion. This is
> (sample size) * (size of company / total of company sizes). So,
> assuming a sample size of 100:
>
> . sum size
> . gen prob = 100 * ( size / r(sum) )
>
> (2) Then select the sample based on these probabilities:
>
> . gen u = uniform()
> . gen insamp = u < prob
>
> Since the sample size didn't have to be that precise, but had to be
> substantially lower than 100, it sufficed for me to tweak the number
> 100 a little until I had about the right sample size. I remain
> interested in a solution which leads to a precise sample size.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
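
The sampleproptosize program posted above might be tried along the following lines. This is only a minimal usage sketch: it assumes the program has been saved as sampleproptosize.ado somewhere on the ado-path (or run in the current session), and it borrows Stata's bundled auto dataset with weight standing in for company size. The dataset, the seed, and the variable name insample are illustrative choices, not part of the thread.

```stata
* Usage sketch: dataset, seed, and variable names are illustrative assumptions
sysuse auto, clear
set seed 12345

* request exactly 10 of the 74 cars, with heavier cars more likely to be drawn
sampleproptosize 10, size(weight) generate(insample)

* the indicator should flag exactly the requested number of observations
count if insample

* compare the size variable between selected and unselected cars
tabulate insample, summarize(weight)
```

Because the program keeps drawing until the count of flagged observations reaches the requested number, the sample size is exact, which is what the quoted u < prob approach could not guarantee.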

