# Re: st: ramdom sampling more than _n or_N

- From: jpitblado@stata.com (Jeff Pitblado, Stata Corp.)
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: ramdom sampling more than _n or_N
- Date: Thu, 12 Jun 2003 10:17:44 -0500

```
P.Chakkrit <ed034001@srv.cc.hit-u.ac.jp> asks how to generate a random sample,
with replacement, of observations in a dataset:

> I would like to ask that how can we random sampling our dataset more than _n
> or _N. I think we have to write a program to repeat more times instead. I
> never write any program before, I also have tried it but it did not work
> out(it only sampling 1 data and finish) so I would be very appreciate if you
> could help me. Below is my program;

> program define sam1 /* e region Nk: to sampling e(1-841) by region(1-19)
> no.Nk in each region (more than no. of e in each region) */
> local t=1
> while `t'<=19{
>               local i=1
>               while `i'<=`3'{
>                              keep if region==`t'
>                              sample 1, count
> 			 local i=`i'+1
>                             }
>               local t =`t'+1
>              }
> end

The Stata command to perform sampling of observations with replacement is
-bsample-.  With the appropriate options, -bsample- can also perform
stratified, cluster, and stratified-cluster sampling with replacement.  (The
stratified sampling features were added in Stata 8.)

Based on P. Chakkrit's question and code, I think we are talking about
stratified sampling.  In this case, if we were sampling no more than the
number of observations within -region- (the strata variable), then I would
suggest one of the following:

To sample as many observations as there are in each -region-:

. bsample , strata(region)

To sample 10 observations within each -region-:

. bsample 10, strata(region)

To sample roughly half the observations within each region:

. bysort region: gen half = int(_N/2)
. bsample half, strata(region)

Unfortunately for P. Chakkrit, the algorithm implemented in -bsample- requires
that the sample size(s) be less than or equal to the number of observations
(within strata).

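A way around this is to first expand the data, then use -bsample-.  Because
-expand- duplicates every observation the same number of times, each original
observation is still equally likely to be drawn.  How much you expand the
data depends upon your situation.  Here are a few examples: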

To sample twice as many observations, per stratum, as there are in the data:

. expand 2
. bsample , strata(region)

To sample an extra 10 observations from within each stratum, assuming there
are more than 10 observations within each stratum:

. bysort region: gen Nplus10 = _N+10
. expand 2
. bsample Nplus10, strata(region)

To sample 100 observations, per stratum, when there are 30 observations within
the smallest stratum (expand by 4, since there will be at least 4*30=120
observations in each stratum):

. expand 4
. bsample 100, strata(region)
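
The same idea can be sketched for sample sizes that differ by region.  This
is only an illustration: it assumes a hypothetical variable -Nk-, constant
within -region-, that holds the desired sample size for that region.  The
names -n_k- and -copies- are likewise made up for the example.

. bysort region: gen long n_k = _N
. gen long copies = ceil(Nk/n_k)
. expand copies
. bsample Nk, strata(region)

Here -copies- is the smallest whole number of copies that leaves at least
-Nk- observations in each region; a region that already has enough
observations gets copies = 1 and is left unchanged.  Afterwards,
-tabulate region- should show -Nk- observations in each region.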

--Jeff