Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: Random Sample Selection in Panel Data
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: Random Sample Selection in Panel Data
Date: Fri, 13 May 2011 14:57:54 +0100
One way to tackle this is to perform the sample selection on a dataset with just one observation per identifier, and then -merge- the result with the main dataset.
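A minimal sketch of the -merge- route, not the code from the FAQ. The toy panel and the variable names id, year, rannum and grp are assumptions made for illustration; with real data you would drop the first four commands and start from the panel already in memory.

```stata
* Toy panel (assumption): 12 identifiers, 4 years each
clear
set obs 48
generate id = ceil(_n / 4)
bysort id: generate year = _n

set seed 20110513                        // arbitrary seed, for reproducibility

* Reduce to one observation per identifier and sample there
preserve
keep id
duplicates drop
generate double rannum = runiform()
sort rannum id
generate byte grp = ceil(4 * _n / _N)    // four groups of near-equal size
keep id grp
tempfile groups
save `groups'
restore

* Attach each identifier's group to all of its observations
merge m:1 id using `groups', nogenerate
```

Because the groups are assigned before the -merge-, every year of a given id necessarily ends up in the same group.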
Equivalently, tag just one observation per identifier, sample within that subset, and then expand to include all observations for each identifier. -egen, max()- is one way to do the expansion. 
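A minimal sketch of the tagging route, again not the code from the FAQ. The toy panel and the variable names id, year, tag, rannum, grp1 and grp are assumptions made for illustration; with real data you would drop the first four commands.

```stata
* Toy panel (assumption): 12 identifiers, 4 years each
clear
set obs 48
generate id = ceil(_n / 4)
bysort id: generate year = _n

set seed 20110513                        // arbitrary seed, for reproducibility

* Tag one observation per identifier and sample within that subset
egen byte tag = tag(id)                  // 1 once per id, 0 otherwise
generate double rannum = runiform() if tag
sort tag rannum id
by tag: generate byte grp1 = ceil(4 * _n / _N) if tag

* Spread the group from the tagged observation to all observations of each id
egen byte grp = max(grp1), by(id)
drop tag rannum grp1
```

Here -egen, max()- works because grp1 is missing except on the tagged observation, so the maximum within each id is just that observation's group.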
In fact 
. search sample, faq  
shows that this is an FAQ, and that you could have identified relevant material directly within Stata, e.g. 
FAQ     . . . . . . . . . . . . . . . . . . Sampling clusters, not individuals
        . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox and S. Merryman
        5/06    How can I sample clusters, not individuals?
                http://www.stata.com/support/faqs/data/sampleby.html
Nick 
[email protected] 
Dennis Kramer
I have a large panel data set (4 years, 250,000+ records per year)
and I want to generate four random sample groups to test the stability
of the estimates. However, I want to ensure that if an ID is selected
in Year 1, then it is also selected into the same random
sample for Years 2, 3, and 4.
I know that for cross-sectional random sampling the code is as follows:
generate rannum = uniform()
egen grp2 = cut(rannum), group(4)
Does anyone have any insight into modifying the above syntax to
automatically include the Year 2, 3, and 4 records for an ID in the
same sample as the selected Year 1 ID?