
Re: st: RE: RE: Taking random samples from data

From   "Song" <>
To   <>
Subject   Re: st: RE: RE: Taking random samples from data
Date   Thu, 31 Jul 2008 22:45:14 -0500

The final step seems to be dropping observations if samp == 0. Thank you. It helps a lot.


----- Original Message -----
From: "Maarten Buis" <>
To: <>
Sent: Thursday, July 31, 2008 11:24 AM
Subject: Re: st: RE: RE: Taking random samples from data

Below is an example of how you can draw clusters (in this case
represented by the variable rep78) without replacement. You have asked
for without replacement so I have given it to you, but in many cases
with replacement is more appropriate.

-- Maarten

*----------------- begin example ----------------
sysuse auto, clear

// create a reduced dataset
keep rep78
bys rep78: keep if _n == 1 & rep78 < .

// draw 3 observations without replacement
gen u = uniform()
sort u
gen byte samp = _n <= 3
drop u
sort rep78
tempfile a
save `a'

// merge with complete dataset
sysuse auto, clear
sort rep78
merge rep78 using `a'
*-------------------- end example ------------------
(For more on how to use examples I sent to the Statalist, see )
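
A variation on the example above, sketched under some assumptions: it
tags one observation per cluster with -egen, tag()- instead of saving
and merging a reduced dataset, and it spells out the last step of
keeping only the sampled clusters. The seed value is arbitrary and only
there for reproducibility, and -runiform()- assumes Stata 10.1 or later
(use -uniform()- in older versions).

*----------------- begin example ----------------
sysuse auto, clear

// tag one observation per cluster; missing rep78 is never tagged
egen byte first = tag(rep78)

// random order among the tagged observations
set seed 12345                // arbitrary seed, assumption
gen double u = runiform() if first
sort u                        // missing u sorts last

// mark the first 3 clusters, then spread the mark within each cluster
gen byte samp = _n <= 3 & first
bysort rep78 (samp): replace samp = samp[_N]

// final step: keep all observations of the sampled clusters
keep if samp == 1
drop first u
*-------------------- end example ------------------

Because nothing is dropped before the last line, every observation of a
sampled cluster is still in memory when -keep- runs.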

--- Nick Cox <> wrote:

Alternatively, sample from a reduced dataset with one observation per
and then


Peter Adamson

You could try -reshape- on your data first.  Then bsample.


I have a question about taking random samples from my data. My data has
around 12,500 user IDs with 200,000 observations in total, and I want to
take a random sample of around 500-600 users. The problem is that each
member has multiple observations and I want to take all observations for
each member. Each ID has 4 to 21 observations. For example, if ID 5
has 10 observations, I want to take all 10 observations given that ID 5
is included in the sample.

I tried the following and ended up with 580 users and 8,800
observations. This method works, but I wonder if there is a better way
to do this job, because I have to drop duplicated samples.

gen idcnt=_N
bsample 600, cluster(id)     /* sampling with replacement: I do not
know how to take cluster samples without replacement. */
bysort id: egen idcount=count(id)
compare idcount idcnt
duplicates tag, gen(dup)
drop if dup==1                /* To drop duplicated samples */
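
For data shaped like the one described above (many ids, several
observations each), one possible sketch uses -sample- with the count
option, which draws without replacement, on a one-row-per-id copy of the
data and then merges back. The data below are simulated and all sizes
(100 ids, 10 sampled ids) and the variable y are placeholders, not taken
from the original post; -merge m:1- assumes Stata 11 or later.

*----------------- begin example ----------------
clear
set seed 2008                  // arbitrary seed, assumption
set obs 100                    // 100 hypothetical ids
gen long id = _n
expand 4 + mod(id, 3)          // 4 to 6 observations per id
gen double y = rnormal()       // placeholder outcome

// reduced copy: one row per id, then draw 10 ids without replacement
preserve
keep id
bysort id: keep if _n == 1
sample 10, count
tempfile ids
save `ids'
restore

// keep every observation of the sampled ids
merge m:1 id using `ids', keep(match) nogenerate
*-------------------- end example ------------------

With the real data, the count passed to -sample- would be the number of
users wanted (for instance 600), and no duplicated observations need to
be dropped afterwards because each id can be drawn only once.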

*   For searches and help try:

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

