Statalist



Re: st: RE: RE: Taking random samples from data


From   Maarten buis <maartenbuis@yahoo.co.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: RE: Taking random samples from data
Date   Thu, 31 Jul 2008 17:24:21 +0100 (BST)

Below is an example of how you can draw clusters (in this case
represented by the variable rep78) without replacement. You asked for
sampling without replacement, so that is what the example does, but in
many cases sampling with replacement is more appropriate.
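
As a sketch of the with-replacement case only (it is separate from the
without-replacement example further down): -bsample- with the cluster()
option draws whole clusters with replacement, and idcluster() creates a
new identifier so that a cluster drawn twice counts as two separate
clusters. The seed value and the variable name newid are arbitrary
choices made for this illustration.

*----------------- begin example ----------------
sysuse auto, clear

// arbitrary seed, only so the draw can be reproduced
set seed 12345

// leave out cars with no repair record
drop if missing(rep78)

// draw 3 clusters with replacement; newid tells repeated draws apart
bsample 3, cluster(rep78) idcluster(newid)
tab newid rep78
*-------------------- end example ------------------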

-- Maarten

*----------------- begin example ----------------
sysuse auto, clear

// create a reduced dataset
keep rep78
bys rep78: keep if _n == 1 & rep78 < .

// draw 3 clusters (distinct values of rep78) without replacement
gen u = uniform()
sort u
gen byte samp = _n <= 3
drop u
sort rep78
tempfile a
save `a'

// merge with complete dataset
sysuse auto, clear
sort rep78
merge rep78 using `a'
*-------------------- end example ------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
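
In newer versions of Stata the same idea can be written a little more
compactly. The following is only a sketch, assuming Stata 11 or later
(needed for -merge m:1-). The seed value and the tempfile name picked
are arbitrary choices, and -sample, count- takes the place of sorting
on a random number.

*----------------- begin example ----------------
sysuse auto, clear

// arbitrary seed, only so the draw can be reproduced
set seed 12345

// build a list with one row per non-missing cluster
preserve
keep rep78
drop if missing(rep78)
duplicates drop

// keep 3 randomly chosen clusters, drawn without replacement
sample 3, count
gen byte samp = 1
tempfile picked
save `picked'
restore

// attach the flag to every observation of the sampled clusters
merge m:1 rep78 using `picked'

// unmatched observations (including missing rep78) are not sampled
replace samp = 0 if _merge == 1
drop _merge
*-------------------- end example ------------------

After this, -keep if samp == 1- leaves all observations of the three
sampled clusters.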

--- Nick Cox <n.j.cox@durham.ac.uk> wrote:

> Alternatively, sample from a reduced dataset with one observation per
> ID and then -merge-.
> 
> Nick
> n.j.cox@durham.ac.uk 
> 
> Peter Adamson
> 
> You could try -reshape- on your data first.  Then bsample.
> 
> Song
> 
> I have a question about taking random samples from my data. My dataset
> has around 12,500 user IDs with 200,000 observations in total, and I
> want to draw a random sample of around 500-600 users. The problem is
> that each user has multiple observations, and I want to keep all of a
> user's observations whenever that user is sampled. Each ID has 4 to 21
> observations. For example, if ID number 5 has 10 observations, I want
> all 10 of them if ID number 5 is included in the sample.
> 
> I tried the following and ended up with 580 users and around 8,800
> observations. This method works, but I wonder whether there is a
> better way to do this, because with this method I have to drop the
> duplicated samples.
> 
> gen idcnt=_N
> /* sampling with replacement: I do not know how to take cluster
>    samples without replacement. */
> bsample 600, cluster(id)
> bysort id: egen idcount=count(id)
> compare idcount idcnt
> duplicates tag, gen(dup)
> drop if dup==1                /* To drop duplicated samples */
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------




