
Re: st: ramdom sampling more than _n or_N

From   Jeff Pitblado (Stata Corp.)
Subject   Re: st: ramdom sampling more than _n or_N
Date   Thu, 12 Jun 2003 10:17:44 -0500

P. Chakkrit asks how to generate a random sample,
with replacement, of observations in a dataset:

> I would like to ask that how can we random sampling our dataset more than _n
> or _N. I think we have to write a program to repeat more times instead. I
> never write any program before, I also have tried it but it did not work
> out(it only sampling 1 data and finish) so I would be very appreciate if you
> could help me. Below is my program;

> program define sam1 /* e region Nk: to sampling e(1-841) by region(1-19)
> no.Nk in each region (more than no. of e in each region) */
> local t=1
> while `t'<=19{
>               local i=1
>               while `i'<=`3'{				     
>                              keep if region==`t'
>                              sample 1, count
> 			 local i=`i'+1
>                             }
>               local t =`t'+1
>              }       
> end

The Stata command to perform sampling of observations with replacement is
-bsample-.  With the appropriate options, -bsample- can also perform
stratified, cluster, and stratified-cluster sampling with replacement.  (The
stratified sampling features were added in Stata 8).
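
As a rough sketch of those options, assuming a hypothetical cluster identifier
variable named -id- (not part of P. Chakkrit's data), cluster sampling and
stratified-cluster sampling with replacement might look like this:

	. bsample , cluster(id)
	. bsample , strata(region) cluster(id)

In both cases whole clusters, rather than individual observations, are drawn
with replacement.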

Based on P. Chakkrit's question, and code, I think we are talking about
stratified sampling.  In this case, if we were sampling up to but not more
than the number of observations within -region- (the strata variable), then I
would suggest one of the following:

To sample as many observations as there are in each -region-:

	. bsample , strata(region)

To sample 10 observations within each -region-:

	. bsample 10, strata(region)

To sample roughly half the observations within each region:

	. bysort region: gen half = int(_N/2)
	. bsample half, strata(region)

Unfortunately for P. Chakkrit, the algorithm implemented in -bsample- requires
that the sample size(s) be less than or equal to the number of observations
(within strata).
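
Before choosing a sample size, it may help to check how many observations
each stratum holds.  A minimal sketch, where -nobs- is a variable name
invented here for illustration:

	. bysort region: gen nobs = _N
	. summarize nobs
	. display r(min)

The value displayed is the size of the smallest stratum, which is the largest
common sample size -bsample- will accept without expanding the data first.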

A way around this is to first expand the data, then use -bsample-.  How much
you expand the data depends upon your situation.  Here are a few examples:

To sample twice as many observations, per stratum, as there are in the data:

	. expand 2
	. bsample , strata(region)

To sample an extra 10 observations from within each stratum, and assuming there
are more than 10 observations within each stratum:

	. bysort region: gen Nplus10 = _N+10
	. expand 2
	. bsample Nplus10, strata(region)

To sample 100 observations, per stratum, when there are 30 observations within
the smallest stratum (expand by 4, since there will be at least 4*30=120
observations in each stratum):

	. expand 4
	. bsample 100, strata(region)
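
The same idea can be sketched for the general case, where the desired sample
size differs by stratum.  This assumes a hypothetical variable -want- that
holds the desired sample size and is constant within each -region-; -k- is
likewise a name invented here.  Setting the seed first makes the draw
reproducible:

	. set seed 12345
	. bysort region: gen k = ceil(want/_N)
	. expand k
	. bsample want, strata(region)

Here -k- is the smallest whole number of copies of each stratum that gives at
least -want- observations to draw from.  Every original observation is copied
the same number of times within its stratum, so each one keeps an equal chance
of selection on every draw.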
