The Stata listserver

RE: st: re: Sample with weights


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: re: Sample with weights
Date   Mon, 3 Oct 2005 20:06:39 +0100

This implements a rather pedestrian look-up 
technique. Proofs of incorrectness are solicited. 

*! NJC 1.0.0 3 Oct 2005 
* sampleproptosize #, size(size_variable) generate(in_sample) 
program sampleproptosize, sortpreserve
	version 8        
	gettoken n 0 : 0, parse(" ,")   
	confirm integer num `n' 
	if `n' <= 0 { 
		di as err "sample size must be positive" 
		exit 198 
	}
	
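	syntax [if] [in] , size(varname numeric) Generate(str) 
	* stop early if the name for the indicator is already in use 
	confirm new variable `generate' 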
	marksample touse 
	markout `touse' `size'
	
	su `size' if `touse', meanonly 
	if r(min) < 0 { 
		di as err "negative values in `size'" 
		exit 411 
	}

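	* observations of size 0 can never be drawn, so count only positive 
	* sizes; r(sum) is unchanged because zeros add nothing to the total 
	su `size' if `touse' & `size' > 0, meanonly 
	if r(N) < `n' { 
		di as err ///
		"sample of `n' requested, but only " r(N) " observations with positive size"
		exit 198 
	}	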

	quietly { 
		tempvar target id 
		tempname rnd 
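		* flip the marker to -1 so that sorting puts eligible observations first 
		replace `touse' = -`touse' 
		* cumulative share of total size: upper edge of the interval for each observation 
		bysort `touse': gen `target' = sum(`size' / r(sum))  
		local g "`generate'" 
		gen byte `g' = 0 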
		gen long `id' = _n 
		count if `g' 
		
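		* draw until n distinct observations are selected; a draw that lands 
		* on an observation already selected changes nothing and is repeated 
		while r(N) < `n' { 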
			scalar `rnd' = uniform()  
			su `id' if `touse' ///
				& inrange(`rnd',`target'[_n-1],`target') ///
				& !`g', meanonly 
			if r(max) < . replace `g' = 1 in `r(max)' 
			count if `g' 
		}
	} 	
end 
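
A minimal usage sketch, assuming the program above has already been defined in the current session and that the auto dataset shipped with Stata is available. The variable name insample and the seed are arbitrary choices made for illustration only:

```stata
sysuse auto, clear
set seed 2005
* draw 10 cars with selection probability proportional to weight
sampleproptosize 10, size(weight) generate(insample)
* exactly 10 observations should now be flagged
count if insample
list make weight if insample
```

Because selection is without replacement, the loop may need more than 10 draws: a draw that falls on a car already selected is simply repeated.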


Nick 
n.j.cox@durham.ac.uk 

Willard van Ooij
 
> Hm, interesting!
> 
> This is indeed not the kind of selection I wanted. I went for your
> suggestion, Nick. Which was (for a sample size of 100):
> 
> (1) Calculate for each company the probability of inclusion.  This is 
> (sample size) * (size of company / total of company sizes).  So 
> assuming a sample size of 100:
> 
>    . sum size
>    . gen prob = 100 * ( size / r(sum) )
> 
> (2) Then select the sample based on these probabilities
> 
>    . gen u = uniform()
> 
>    . gen insamp = u < prob
> 
> Since the sample size didn't have to be that precise, but had to be
> substantially lower than 100, it sufficed for me to tweak the number
> 100 a little until I had about the right sample size. I remain
> interested in a solution which leads to a precise sample size.
> 
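
The two steps quoted above make an independent draw for each company, so the realised sample size is random and only its expectation is fixed. A hedged sketch of that approach, using the auto data with weight standing in for company size and a target of 20 (both assumptions made purely for illustration), caps the probabilities at 1 and reports the expected sample size next to the realised one:

```stata
sysuse auto, clear
set seed 2005
su weight, meanonly
* inclusion probability for a target of 20; cap at 1 for very large units
gen prob = min(1, 20 * weight / r(sum))
gen u = uniform()
gen byte insamp = u < prob
* the expected sample size is the sum of the inclusion probabilities
su prob, meanonly
di "expected sample size: " r(sum)
* the realised sample size varies from run to run
count if insamp
```

If any probability had to be capped at 1, the expected sample size falls below the target, which is one reason the multiplier needs tweaking.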
