Statalist



st: Taking random samples from data


From   "Song" <raravise@gmail.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Taking random samples from data
Date   Thu, 31 Jul 2008 10:38:17 -0500

Hi All,

I have a question about taking random samples from my data. My dataset has about 12,500 user IDs and 200,000 observations in total, and I want to draw a random sample of about 500 to 600 users. The problem is that each user has multiple observations, and I want to keep all of the observations for every user in the sample. Each ID has 4 to 21 observations. For example, if ID 5 has 10 observations and ID 5 is selected, I want all 10 of its observations in the sample.
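The sampling described above (draw whole IDs, never draw the same ID twice, keep every row of each drawn ID) can be sketched in Stata as follows. This is a minimal sketch, not a definitive method: it assumes the ID variable is called `id` as in the post, the simulated data and the variable names `nobs`, `first`, `u` and `insample` are hypothetical, and it assumes a Stata release that has `runiform()` (older releases provide `uniform()` instead). The idea is to give one row per ID a random key, take the 600 IDs with the smallest keys, and then spread that selection to all rows of each chosen ID.

```stata
* Hypothetical example data: 12,500 IDs with 4 to 21 observations each
clear
set seed 20080731
set obs 12500
gen long id = _n
gen byte nobs = 4 + floor(18*runiform())   // integer from 4 to 21; kept for checking
expand nobs                                // nobs rows per ID

* Draw 600 IDs without replacement and keep all of their observations
egen byte first = tag(id)                  // 1 on exactly one row per ID, 0 elsewhere
gen double u = runiform() if first         // random key on the tagged rows only
sort u                                     // missing u (untagged rows) sorts last
gen byte insample = (_n <= 600)            // the 600 tagged rows with the smallest keys
bysort id (insample): replace insample = insample[_N]   // copy the draw to every row of the ID
keep if insample
drop first u insample
```

With real data, skip the simulation lines and start at the `egen` line; change 600 to the number of users wanted. The `set seed` line makes the draw reproducible.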

I tried the following and ended up with 580 users and about 8,800 observations. This method works, but I wonder whether there is a better way to do this, because with this method I have to drop duplicated samples.

bysort id: gen idcnt=_N   /* number of observations per ID in the full data */
bsample 600, cluster(id)   /* sampling with replacement: I do not know how to take cluster samples without replacement */
bysort id: egen idcount=count(id)   /* number of observations per ID after resampling */
compare idcount idcnt
duplicates tag, gen(dup)
drop if dup==1   /* to drop duplicated samples */
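A note on the last line of the attempt: when an ID is drawn exactly twice, every one of its rows gets `dup==1`, so `drop if dup==1` removes both copies rather than one, while IDs drawn three or more times (`dup>=2`) are kept in full. The sketch below is one hedged alternative that keeps exactly one copy of each drawn ID, using `bsample`'s `idcluster()` option to number each drawn copy. The simulated data and the names `nobs` and `draw` are hypothetical. Because the draw is still with replacement, the number of distinct IDs will usually be somewhat below 600.

```stata
* Hypothetical example data: 12,500 IDs with 4 to 21 observations each
clear
set seed 20080731
set obs 12500
gen long id = _n
gen byte nobs = 4 + floor(18*runiform())   // integer from 4 to 21; kept for checking
expand nobs

* Cluster draw with replacement; idcluster() gives each drawn copy its own number
bsample 600, cluster(id) idcluster(draw)

* Keep only the first copy of any ID that was drawn more than once
bysort id (draw): keep if draw == draw[1]
drop draw
```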

I would greatly appreciate your help.
Reo.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP