Statalist



Re: st: RE: RE: Taking random samples from data


From   "Song" <[email protected]>
To   <[email protected]>
Subject   Re: st: RE: RE: Taking random samples from data
Date   Thu, 31 Jul 2008 22:45:14 -0500

The final step seems to be dropping observations if samp == 0. Thank you. It helps a lot.

Reo.

----- Original Message -----
From: "Maarten buis" <[email protected]>
To: <[email protected]>
Sent: Thursday, July 31, 2008 11:24 AM
Subject: Re: st: RE: RE: Taking random samples from data



Below is an example of how you can draw clusters (in this case
represented by the variable rep78) without replacement. You asked for
sampling without replacement, so that is what the example does, but in
many cases sampling with replacement is more appropriate.

-- Maarten

*----------------- begin example ----------------
sysuse auto, clear

// create a reduced dataset
keep rep78
bys rep78: keep if _n == 1 & rep78 < .

// draw 3 observations without replacement
gen u = uniform()
sort u
gen byte samp = _n <= 3
drop u
sort rep78
tempfile a
save `a'

// merge with complete dataset
sysuse auto, clear
sort rep78
merge rep78 using `a'
*-------------------- end example ------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
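
The example stops after the merge; the remaining step is to keep only the observations that belong to the sampled clusters. A minimal sketch of the whole sequence follows, assuming Stata 11 or later (for the -merge m:1- syntax and runiform(), neither of which appears in the original example). The seed value is arbitrary and is there only for reproducibility.

```stata
sysuse auto, clear
set seed 12345                 // arbitrary seed (assumption), for reproducibility

// build a reduced dataset with one row per non-missing cluster value
preserve
keep rep78
drop if missing(rep78)
duplicates drop

// flag 3 clusters at random, without replacement
gen double u = runiform()
sort u
gen byte samp = _n <= 3
keep rep78 samp
tempfile a
save `a'
restore

// attach the flag to the full data; unmatched rows (missing rep78) are dropped
merge m:1 rep78 using `a', keep(match) nogenerate

// final step: retain every observation of the sampled clusters only
keep if samp == 1

// sanity check: exactly 3 distinct rep78 values remain
quietly tabulate rep78
assert r(r) == 3
```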

--- Nick Cox <[email protected]> wrote:

Alternatively, sample from a reduced dataset with one observation per
ID and then -merge-.

Nick
[email protected]
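
Nick's suggestion can be sketched with -sample, count-, which draws a fixed number of observations without replacement. The sketch below builds a toy panel because the original data are not available. The variable name id follows the question; the toy sizes (50 ids, 10 sampled), the seed, and the tempfile name are assumptions made for illustration. It assumes Stata 11 or later for -merge m:1-.

```stata
// toy data (assumption): 50 ids, each with 4 to 21 observations
clear
set seed 2008
set obs 50
gen id = _n
expand 4 + floor(18*runiform())

// reduce to one observation per id and sample 10 ids without replacement
preserve
keep id
duplicates drop
sample 10, count
tempfile picked
save `picked'
restore

// keep all observations of the sampled ids
merge m:1 id using `picked', keep(match) nogenerate

// sanity check: exactly 10 distinct ids remain
egen byte first = tag(id)
count if first
assert r(N) == 10
```

With the data from the question, the toy-data lines would be dropped and 10 replaced by the desired number of users, for example 600.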

Peter Adamson

You could try -reshape- on your data first.  Then bsample.

Song

I have a question about taking random samples from my data. My dataset
has around 12,500 user IDs with 200,000 observations in total, and I
want to take a random sample of around 500-600 users. The problem is
that each member has multiple observations and I want to take all
sub-observations for each member. Each ID has 4 to 21 observations. For
example, if ID number 5 has 10 observations, I want to take all 10
observations given that ID number 5 is included in the sample.

I tried the following and ended up with 580 users and around 8,800
observations. This method works, but I wonder if there is a better way
to do this job, because I have to drop duplicated samples with this
method.

gen idcnt=_N
bsample 600, cluster(id)     /* sampling with replacement: I do not know how to take cluster samples without replacement. */
bysort id: egen idcount=count(id)
compare idcount idcnt
duplicates tag, gen(dup)
drop if dup==1                /* To drop duplicated samples */


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------




