Jeff Pitblado, Stata Corp.

Re: st: random sampling more than _n or _N

Thu, 12 Jun 2003 10:17:44 -0500

P.Chakkrit <ed034001@srv.cc.hit-u.ac.jp> asks how to generate a random
sample, with replacement, of observations in a dataset:

> I would like to ask that how can we random sampling our dataset more than _n
> or _N. I think we have to write a program to repeat more times instead. I
> never write any program before, I also have tried it but it did not work
> out(it only sampling 1 data and finish) so I would be very appreciate if you
> could help me. Below is my program;
> program define sam1 /* e region Nk: to sampling e(1-841) by region(1-19)
> no.Nk in each region (more than no. of e in each region) */
> local t=1
> while `t'<=19{
> local i=1
> while `i'<=`3'{
> keep if region==`t'
> sample 1, count
> local i=`i'+1
> }
> local t =`t'+1
> }
> end

The Stata command for sampling observations with replacement is -bsample-.
With the appropriate options, -bsample- can also perform stratified, cluster,
and stratified-cluster sampling with replacement. (The stratified sampling
features were added in Stata 8.)

Based on P. Chakkrit's question and code, I think we are talking about
stratified sampling. If we were sampling up to, but not more than, the number
of observations within -region- (the strata variable), I would suggest one of
the following:

To sample as many observations as there are in each -region-:

        . bsample , strata(region)

To sample 10 observations within each -region-:

        . bsample 10, strata(region)

To sample roughly half the observations within each -region-:

        . bysort region: gen half = int(_N/2)
        . bsample half, strata(region)

Unfortunately for P. Chakkrit, the algorithm implemented in -bsample- requires
that the sample size(s) be less than or equal to the number of observations
(within strata). A way around this is to first -expand- the data, then use
-bsample-. Because every observation is duplicated the same number of times,
each original observation remains equally likely to be drawn. How much you
expand the data depends upon your situation. Here are a few examples:

To sample twice as many observations, per stratum, as there are in the data:

        . expand 2
        . bsample , strata(region)

To sample an extra 10 observations from within each stratum, assuming there
are more than 10 observations within each stratum:

        . bysort region: gen Nplus10 = _N+10
        . expand 2
        . bsample Nplus10, strata(region)

Note that Nplus10 is generated before -expand-, so it holds the original
stratum size plus 10; typing _N+10 directly in -bsample- would refer to the
total number of observations in the expanded data instead.

To sample 100 observations, per stratum, when there are 30 observations within
the smallest stratum (expand by 4, since there will then be at least 4*30=120
observations in each stratum):

        . expand 4
        . bsample 100, strata(region)

--Jeff
jpitblado@stata.com
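
P.S. The expand-then-sample recipe above can be put together as a small
do-file. This is only a sketch under stated assumptions: it uses the auto.dta
dataset shipped with Stata, with -foreign- standing in for -region-, and the
local macro names (target, k) and the variable n_stratum are made up for
illustration. Rather than hard-coding the expansion factor, it computes it
from the smallest stratum.

```stata
* Sketch only: auto.dta and -foreign- are stand-ins for the real data and -region-.
sysuse auto, clear
set seed 12345

* Hypothetical target: 100 draws per stratum, more than either stratum holds.
local target 100

* Record each stratum's size, then find the smallest one.
bysort foreign: gen long n_stratum = _N
summarize n_stratum, meanonly

* Smallest expansion factor that gives every stratum at least the target size.
local k = ceil(`target' / r(min))

* Replicate every observation k times.
expand `k'

* Draw the target number of observations, with replacement, within each stratum.
bsample `target', strata(foreign)

* Check: every stratum now holds exactly the target number of observations.
bysort foreign: assert _N == `target'
```

Computing the factor with ceil() keeps the expansion as small as possible,
which matters when the dataset is large, since -expand- multiplies the memory
needed by the same factor.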

