From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Random Sample Selection in Panel Data
Date: Fri, 13 May 2011 09:31:00 -0500

On Fri, May 13, 2011 at 8:29 AM, Dennis Kramer <dkramerii@gmail.com> wrote:
> I have a large panel data set (4 years, 250,000+ records per year)
> and I want to generate four random sample groups to test the stability
> of the estimates. However, I want to ensure that if an ID is selected
> in Year 1, then it is subsequently selected into the same random
> sample for Years 2, 3, and 4.
>
> I know for a cross-sectional random sampling the code is as follows:
>
> generate rannum = uniform()
> egen grp2 = cut(rannum), group(4)

bysort id (year) : replace grp2 = grp2[1]

I wouldn't even bother with -egen-, which takes a while with your 1M
observations, and would just

generate byte grp2 = ceil( 4*uniform() )

The groups will be slightly unbalanced, but with 250K observations,
that's barely an issue.

You might have problems if you have a complex survey structure
(PSU/stratum). In that case, it is not quite clear to me whether you'd
want to sample individuals, or PSUs, or individuals within PSUs, or
what, to test your stability assumption; and besides, you would need to
modify the sampling weights to account for the extra stage of sampling
you introduced.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
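
Putting the pieces of the reply above together, a minimal sketch might look like the following. It assumes the variables are named id and year as in the thread; the seed value is arbitrary and only there for reproducibility. It also uses 1 + floor(4*uniform()) rather than the ceil() form in the reply, a small variation that avoids a group value of 0 in the very unlikely event that the generator returns exactly 0.

```stata
* Sketch only: assumes a long-format panel with variables id and year.
* The seed is an arbitrary choice so the assignment can be reproduced.
set seed 20110513

* Draw a group number 1..4 for every record
* (variation on ceil(4*uniform()) that cannot produce 0).
generate byte grp2 = 1 + floor(4*uniform())

* Overwrite each id's records with the draw from its first year,
* so an id stays in the same group in every year.
bysort id (year): replace grp2 = grp2[1]

* Check: group membership is constant within id, and group sizes by year.
bysort id: assert grp2 == grp2[1]
tabulate grp2 year
```

Each estimation can then be run on one group at a time, for example with an "if grp2 == 1" qualifier. In newer Stata releases, runiform() is the documented name for the uniform generator.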