
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Random Sample Selection in Panel Data


From   Stas Kolenikov <[email protected]>
To   [email protected]
Subject   Re: st: Random Sample Selection in Panel Data
Date   Fri, 13 May 2011 09:31:00 -0500

On Fri, May 13, 2011 at 8:29 AM, Dennis Kramer <[email protected]> wrote:
> I have a large panel data set (4 years, 250,000+ records per year)
> and I want to generate four random sample groups to test the stability
> of the estimates. However, I want to ensure that if an ID is selected
> in Year 1, then it is also selected into the same random
> sample for Years 2, 3, and 4.
>
> I know for a cross-sectional random sampling the code is as follows:
>
> generate rannum = uniform()
> egen grp2 = cut(rannum), group(4)

bysort id (year) : replace grp2 = grp2[1]

I wouldn't even bother with -egen-, which takes a while with your 1M
observations, and would just

generate byte grp2 = ceil( 4*uniform() )

The groups will be slightly unbalanced, but with 250K observations,
that's barely an issue. You might have problems if you have a complex
survey structure (PSU/stratum). In that case, it is not quite clear to
me whether you'd want to sample individuals or PSUs or individuals
within PSUs, or what, to test your stability assumption; and besides,
you would need to modify the sampling weights to account for the extra
stage of sampling you introduced.
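
For what it's worth, here is a minimal sketch of how the random draw and the -bysort- line fit together. It builds a small toy panel so that it runs on its own; the toy data, the seed and the final checks are my assumptions, not part of the original question. It uses runiform(), the newer name for uniform(), and is meant to be run from a do-file. It ignores any survey design.

```stata
* Sketch only: toy long-format panel with variables id and year
* (1,000 ids x 4 years; sizes are arbitrary and purely illustrative).
clear
set obs 1000
generate long id = _n
expand 4
bysort id: generate int year = _n

set seed 20110513                         // arbitrary seed, for reproducibility
generate byte grp2 = ceil(4*runiform())   // one draw per row, values 1 to 4
bysort id (year): replace grp2 = grp2[1]  // carry the first year's draw to all years

* Checks: every row has a group, and each id sits in exactly one group
assert inrange(grp2, 1, 4)
bysort id (grp2): assert grp2[1] == grp2[_N]
tabulate grp2 year
```

Because the draw kept for each id is the one from its first year, the group sizes are only approximately equal, as noted above.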

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.