
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Random Sample Selection in Panel Data

From   Stas Kolenikov <>
Subject   Re: st: Random Sample Selection in Panel Data
Date   Fri, 13 May 2011 09:31:00 -0500

On Fri, May 13, 2011 at 8:29 AM, Dennis Kramer <> wrote:
> I have a large panel data set (4 years, 250,000+ records per year)
> and I want to generate four random sample groups to test the stability
> of the estimates. However, I want to ensure that if an ID is selected
> in Year 1, it is also selected into the same random sample
> for Years 2, 3, and 4.
> I know that for cross-sectional random sampling the code is as follows:
> generate rannum = uniform()
> egen grp2 = cut(rannum), group(4)

bysort id (year) : replace grp2 = grp2[1]
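For reference, here is the whole two-step recipe (the question's -egen, cut()- grouping followed by the -bysort- line above) run on a simulated panel so that it is self-contained. The panel size, the variable names id and year, and the seed are made up for illustration, and runiform() stands in for the older uniform(). Note that -egen, cut() group(4)- numbers the groups 0 to 3.

```stata
* Simulate a hypothetical balanced panel: 1,000 IDs observed in years 1-4
clear
set seed 20110513
set obs 1000
generate long id = _n
expand 4
bysort id: generate int year = _n

* Step 1: cross-sectional draw and quartile groups (as in the question)
generate double rannum = runiform()
egen grp2 = cut(rannum), group(4)

* Step 2: give every year of an ID the group drawn for its first year
bysort id (year): replace grp2 = grp2[1]

* Check: each ID now sits in exactly one group
bysort id: assert grp2 == grp2[1]
```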

I wouldn't even bother with -egen-, which takes a while with your 1M
observations, and would just

generate byte grp2 = ceil( 4*runiform() )
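A sketch of this faster route, assuming the same kind of hypothetical id/year panel: draw the group once per ID on its first observation and copy it to that ID's other years. It uses floor(4*runiform()) + 1 instead of ceil(4*uniform()), a swap on my part that still returns 1 to 4 even if a draw is exactly 0. The -egen, tag()- step is only there so the balance check counts IDs rather than person-years.

```stata
* Simulate a hypothetical panel (IDs 1-1,000, years 1-4)
clear
set seed 20110513
set obs 1000
generate long id = _n
expand 4
bysort id: generate int year = _n

* Draw one group (1-4) per ID, on its first observation only
bysort id (year): generate byte grp2 = floor(4*runiform()) + 1 if _n == 1
* Carry that draw to the ID's remaining years
by id: replace grp2 = grp2[1]

* Count IDs (not observations) per group to see the imbalance
egen byte first = tag(id)
tabulate grp2 if first
```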

The groups will be slightly unbalanced (each group's size varies
randomly around a quarter of the IDs), but with 250K observations,
that's barely an issue. You might have problems if you have a complex
survey structure (PSU/stratum). In that case, it is not quite clear to
me whether you'd want to sample individuals or PSUs or individuals
within PSUs, or what, to test your stability assumption; and besides,
you would need to modify the sampling weights to account for an extra
stage of sampling you introduced.
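If the design has PSUs and you decide to split whole PSUs, one possible sketch follows. Everything in it is hypothetical: the psu, stratum and wt variables, the sizes, and the seed. Because each PSU lands in a given group with probability 1/4, its weight for that group's analysis is multiplied by 4. With few PSUs per stratum, some strata may end up with a single PSU in a group, which -svy- estimation cannot handle without further adjustment.

```stata
* Hypothetical survey panel: 200 PSUs in 4 strata, 25 people per PSU, 4 years
clear
set seed 20110513
set obs 200
generate int psu = _n
generate byte stratum = ceil(psu/50)
generate double wt = 100
expand 25
bysort psu: generate long id = (psu - 1)*25 + _n
expand 4
bysort id: generate int year = _n

* Draw one group (1-4) per PSU and give it to all its members and years
bysort psu (id year): generate byte grp2 = floor(4*runiform()) + 1 if _n == 1
by psu: replace grp2 = grp2[1]

* Extra sampling stage: P(PSU in a given group) = 1/4, so scale weights by 4
generate double wt_sub = 4*wt

* Treat group 1 as its own sample with the adjusted weights
preserve
keep if grp2 == 1
svyset psu [pweight = wt_sub], strata(stratum)
restore
```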

Stas Kolenikov, also found at
Small print: I use this email account for mailing lists only.

© Copyright 1996–2017 StataCorp LLC