



Re: st: How to (almost) randomly reduce the number of observations?


From   Maarten Buis <[email protected]>
To   [email protected]
Subject   Re: st: How to (almost) randomly reduce the number of observations?
Date   Tue, 20 Apr 2010 08:13:44 +0000 (GMT)

 --- On Mon, 19/4/10, Dimitrije Tišma wrote:
> > I would like to ask how to reduce the number of observations
> > randomly, BUT in a way that keeps all observations
> > related to each person who is still in the dataset.

--- On Tue, 20/4/10, Maarten Buis wrote:
> *---------- begin example -------------
> // create some example data
> clear
> set obs 100
> gen id = _n
> expand 10
> bys id : gen t = _n
> sort id t
> list in 1/22, sepby(id)
> 
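> // randomly keep about 50% of the ids, with all their observations
> bys id: gen u = runiform() if _n == 1   // one random draw per id
> bys id: egen uu = total(u)              // copy that draw to all rows of the id
> keep if uu < .5
> drop u uu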
> *----------- end example ----------------
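
The example above keeps each id with probability .5, so the number of ids that remain will vary around 50%. If exactly half of the ids should be kept, still without replacement, a sketch along the following lines might work. It assumes the example data created above is in memory; the variable names first, u and pick are only illustrative.

*---------- begin example -------------
// keep exactly half of the ids, without replacement
// (first, u and pick are illustrative names)
bys id: gen byte first = _n==1          // tag one row per id
count if first                          // number of distinct ids
local n = round(r(N)/2)
gen u = runiform() if first             // one random draw per id
sort u                                  // missings sort last, so tagged rows come first
gen byte pick = _n <= `n' if first      // the first `n' tagged rows are selected
bys id (pick): replace pick = pick[1]   // copy the decision to all rows of the id
keep if pick
drop first u pick
sort id t
*----------- end example ----------------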

An alternative approach that will sample _with_ replacement:

*---------- begin example -------------
// create some example data
clear
set obs 100
gen id = _n
expand 10
bys id : gen t = _n
sort id t
list in 1/22, sepby(id)

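// draw a sample of ids, half as many as in the data, with replacement
bys id: gen byte mark = _n==1     // tag one row per id
count if mark                     // number of distinct ids
local n = round(r(N)/2)
bsample `n', cluster(id)          // resample whole ids, keeping all their rows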
*----------- end example ----------------
(For more on examples I sent to the Statalist see: 
http://www.maartenbuis.nl/example_faq )
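
Because the sampling is with replacement, the same person can be drawn more than once, and the copies then share the same id. As a sketch, assuming the idcluster() option of -bsample- is available in your version of Stata, the copies can be told apart as shown below. The seed value and the variable name newid are arbitrary choices made for this illustration.

*---------- begin example -------------
// create some example data
clear
set seed 12345        // arbitrary seed, only so the draw can be reproduced
set obs 100
gen id = _n
expand 10
bys id : gen t = _n

// draw half as many ids as in the data, with replacement
bys id: gen byte mark = _n==1
count if mark
local n = round(r(N)/2)
bsample `n', cluster(id) idcluster(newid)

// newid identifies each drawn copy, id the original person
sort newid t
list in 1/22, sepby(newid)
*----------- end example ----------------

After this, newid rather than id would be the panel identifier to use, for example in -xtset newid t-.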

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------




      
