Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Maarten buis <maartenbuis@yahoo.co.uk>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to (almost) randomly reduce the number of observations?
Date: Tue, 20 Apr 2010 08:13:44 +0000 (GMT)

--- On Mon, 19/4/10, Dimitrije Tišma wrote:
> > I would like to ask how to reduce number of observations
> > randomly BUT in a way that all observations are kept that
> > are related to the person who still in the dataset.

--- On Tue, 20/4/10, Maarten buis wrote:
> *---------- begin example -------------
> // create some example data
> clear
> set obs 100
> gen id = _n
> expand 10
> bys id : gen t = _n
> sort id t
> list in 1/22, sepby(id)
>
> // randomly drop 50%
> bys id: gen u = runiform() if _n == 1
> bys id: egen uu = total(u)
> keep if uu < .5
> drop u uu
> *----------- end example ----------------

An alternative approach that will sample _with_ replacement:

*---------- begin example -------------
// create some example data
clear
set obs 100
gen id = _n
expand 10
bys id : gen t = _n
sort id t
list in 1/22, sepby(id)

// randomly drop 50% with replacement
bys id: gen byte mark = _n==1
count if mark
local n = round(r(N)/2)
bsample `n', cluster(id)
*----------- end example ----------------

(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
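
A note on the first example: it keeps each person with probability .5, so the number of persons kept varies from run to run. If exactly half of the persons should be kept, still without replacement, something along the following lines might work. This is only a sketch: the seed value and the variable names tag, pick, and keepid are choices made here for illustration and are not part of the original examples.

```stata
*---------- begin example -------------
// create the same kind of example data: 100 persons, 10 rows each
clear
set seed 12345                  // arbitrary seed, only for reproducibility
set obs 100
gen id = _n
expand 10
bys id : gen t = _n

// tag one row per person and give only that row a random number
egen byte tag = tag(id)
gen double u = runiform() if tag

// half the number of persons, rounded
count if tag
local n = round(r(N)/2)

// missing values sort last, so the tagged rows come first in random order
sort u
gen byte pick = tag & _n <= `n'

// spread the pick to every row of the person and keep those persons
bys id : egen byte keepid = max(pick)
keep if keepid
drop tag u pick keepid
sort id t
*----------- end example ----------------
```

With the example data this leaves 50 persons, each with all 10 of their rows. Because every person is picked at most once, id still identifies persons uniquely afterwards, which is not guaranteed after sampling with replacement.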