st: RE: RE: Taking random samples from data

Date   Thu, 31 Jul 2008 17:03:29 +0100

Alternatively, sample from a reduced dataset with one observation per ID
and then merge the sampled IDs back onto the full dataset.
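A minimal sketch of that route, assuming the step after sampling is to -merge- the picked IDs back. The toy data (12,500 IDs with 4 to 21 observations each, a variable x, the helper nobs, the seed) are hypothetical stand-ins for the real dataset; runiform() and rnormal() need Stata 10.1 or later, and the -merge m:1- syntax needs Stata 11 or later.

```stata
* Hypothetical toy data standing in for the real dataset:
* 12,500 IDs, each with 4 to 21 observations of a variable x.
clear
set seed 2008
set obs 12500
gen long id = _n
gen int nobs = 4 + floor(18 * runiform())   // 4 to 21 observations per ID
expand nobs
drop nobs
gen double x = rnormal()

* Step 1: reduce to one observation per ID and draw 600 IDs
* without replacement (-sample- never picks the same row twice).
preserve
keep id
bysort id: keep if _n == 1
sample 600, count
tempfile picked
save `picked'
restore

* Step 2: bring back every observation of each sampled ID.
merge m:1 id using `picked', keep(match) nogenerate
```

Because the draw happens on a file with one row per ID, each sampled ID appears once, so no duplicates need dropping afterwards.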


Peter Adamson

You could try -reshape- on your data first.  Then bsample.
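A sketch of the -reshape- route, under stated assumptions: the variable names x and y, the within-ID index j and the marker present are hypothetical. It swaps in -sample- for -bsample-, because -bsample- draws with replacement and could still pick the same ID twice.

```stata
* Hypothetical toy data: x and y stand in for your own variables.
clear
set seed 2008
set obs 12500
gen long id = _n
gen int nobs = 4 + floor(18 * runiform())
expand nobs
drop nobs
gen double x = rnormal()
gen double y = rnormal()

* -reshape- needs a within-ID index; the marker lets us drop the
* padded rows created later for IDs with fewer than 21 observations.
bysort id: gen int j = _n
gen byte present = 1

* One row per ID, so each row is a whole cluster.
reshape wide x y present, i(id) j(j)

* Draw 600 IDs without replacement.
sample 600, count

* Back to one row per observation, discarding the padding.
reshape long x y present, i(id) j(j)
drop if missing(present)
drop present
```

With up to 21 observations per ID the wide file has 21 copies of each variable, which is manageable here but grows quickly with more variables.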


I have a question about taking random samples from my data. My dataset
has around 12,500 user IDs with 200,000 observations in total, and I want
to draw a random sample of around 500-600 users. The problem is that each
member has multiple observations, and I want to take all the observations
for each sampled member. Each ID has 4 to 21 observations. For example, if
ID number 5 has 10 observations, I want to take all 10 of them whenever
ID number 5 is included in the sample.
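One way to do this without leaving the full dataset is to give each ID a single random number and keep the IDs with the 600 smallest draws, which is a simple random sample of IDs without replacement. A minimal sketch; the toy data and the helper variables u and rank are hypothetical.

```stata
* Hypothetical toy data.
clear
set seed 2008
set obs 12500
gen long id = _n
gen int nobs = 4 + floor(18 * runiform())
expand nobs
drop nobs
gen double x = rnormal()

* One uniform draw per ID, copied to all of that ID's observations
* (missing values sort last, so u[1] is the draw).
bysort id: gen double u = runiform() if _n == 1
bysort id (u): replace u = u[1]

* Number the IDs in order of their draw and keep the first 600;
* every observation of a kept ID is retained.
egen long rank = group(u id)
keep if rank <= 600
drop u rank
```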

I tried the following and ended up with 580 users and around
8,800 observations. This method works, but I wonder whether there is a
better way to do this, because I have to drop the duplicated samples afterwards:

bysort id: gen idcnt=_N       /* observations per ID before sampling */
bsample 600, cluster(id)      /* samples clusters with replacement: I do not
                                 know how to take cluster samples without
                                 replacement */
bysort id: egen idcount=count(id)
compare idcount idcnt         /* idcount > idcnt for IDs drawn more than once */
duplicates drop               /* keep one copy of each duplicated observation */
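If you prefer to keep -bsample-, a sketch that avoids the -duplicates- step uses its idcluster() option, which gives every drawn cluster its own identifier; repeat draws of an ID can then be discarded directly. The toy data and the variable name draw are hypothetical. Because repeats are dropped, the result has at most 600 distinct IDs, not exactly 600.

```stata
* Hypothetical toy data.
clear
set seed 2008
set obs 12500
gen long id = _n
gen int nobs = 4 + floor(18 * runiform())
expand nobs
drop nobs
gen double x = rnormal()

bysort id: gen int idcnt = _N          // observations per ID before sampling

* An ID drawn twice appears as two clusters with different values of draw.
bsample 600, cluster(id) idcluster(draw)

* Keep only the first draw of each ID; repeat draws are discarded.
bysort id (draw): keep if draw == draw[1]
drop draw
```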

