


Re: st: Longitudinal sampling (many waves)

From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Longitudinal sampling (many waves)
Date   Mon, 19 Mar 2012 16:33:42 -0400

Oops! 1 in 200 is likely not to pick anything in the auto data set.  Try 1 in 5 instead:
samplesys 5

If there are few late entrants to the panel, a simple random sample in each time period after initial enrollment might not be feasible. If so, order new entrants by strata (optionally) and entry date, then take a systematic sample.

***************CODE BEGINS*******************
capture program drop _all

program define samplesys, rclass
/* Draw 1 in k systematic sample: syntax "samplesys k" */
syntax anything [if] [in]
args k
marksample touse
confirm integer number `k'
/* Random start between 1 and k */
tempname start
scalar `start' = ceil(`k'*runiform())
/* Keep every k-th observation, counting from the random start */
keep if mod(_n-`start',`k')==0 & `touse'
return scalar start = `start'
end

sysuse auto, clear
gen entry_date = price

sort foreign entry_date
set seed 833332
samplesys 200

***************CODE ENDS*******************
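
As a quick check of the program above, here is a sketch that reruns it at the 1 in 5 rate and inspects the result. The auto data set has 74 cars, so a 1 in 5 systematic sample should keep 14 or 15 of them, depending on the random start. Sorting by foreign first means both groups are represented roughly in proportion. It assumes samplesys has already been defined as above.

***************CODE BEGINS*******************
sysuse auto, clear
gen entry_date = price        /* stand-in for a real entry date */

sort foreign entry_date
set seed 833332
samplesys 5

display "random start = " r(start)   /* read before another r-class command runs */
count                                /* expect 14 or 15 */
tab foreign                          /* sample split across the sort groups */
***************CODE ENDS*******************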


Laurie Molina:

You asked if there is a command for longitudinal sampling. Brendan's suggestion is valid. Randomly sample units who enter in the same time period and follow them for as long as they remain in the database. There will be no distinction between the sample in T+10 and the one in T+11.

Sample with a fixed probability of 1 in k to ensure equal weights for the cross-sectional analyses. Start with a dataset containing, for each individual, the study ID and entry month. To take, e.g., a 1 in 200 sample:

set seed [YOUR CHOICE]
sample 0.5 , by(entry_period) 

You can add important strata to the by() clause.
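
To make the steps concrete, here is a minimal sketch of the whole workflow on simulated data. Everything in it is an assumption for illustration: the variable names (id, entry_period, region, period), the 20,000 units, and the single extra stratum. Because -sample- has to draw a whole number of units in each by-group, the realized fraction in a cell only approximates 1 in 200 when the cell size is not a multiple of 200.

***************CODE BEGINS*******************
clear
set seed 12345

/* Hypothetical panel: 20,000 units, each observed for 1 to 6 periods */
set obs 20000
gen id = _n
gen entry_period = ceil(10*runiform())
gen region = ceil(4*runiform())       /* hypothetical stratum */
gen nper = ceil(6*runiform())
expand nper
bysort id: gen period = entry_period + _n - 1
tempfile panel
save `panel'

/* One record per unit: study ID, entry period, stratum */
bysort id (period): keep if _n == 1
keep id entry_period region

/* 0.5 percent = 1 in 200, drawn within entry period by region */
sample 0.5, by(entry_period region)
gen double wt = 200                   /* nominal weight, equal for all units */

/* Follow the sampled units for as long as they remain in the panel */
merge 1:m id using `panel', keep(match) nogenerate
***************CODE ENDS*******************

When a new period arrives, the same -sample- step can be run on that period's entrants alone and the result appended, since earlier draws are not affected.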

For longitudinal analyses, no simple sampling plan will compensate for attrition.  You would have to take in each period a new sample of continuing units who "resemble" those lost in the period with respect to entry date and other characteristics.  Re-weighting the continuing sample members is preferable, I think; propensity score and weighting class approaches are both popular.
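
A rough sketch of the weighting class approach, on simulated data: within each class, the weights of units still in the panel are inflated so that they also stand in for the units lost from that class. The variable names (wclass, stayed, basewt), the five classes, and the 80 percent retention rate are all assumptions for illustration, and in practice the classes would be formed from entry date and other characteristics related to attrition.

***************CODE BEGINS*******************
clear
set seed 2012
set obs 1000
gen id = _n
gen wclass = ceil(5*runiform())        /* hypothetical weighting class */
gen byte stayed = runiform() < 0.8     /* hypothetical: 1 = still in the panel */
gen double basewt = 200                /* weight from a 1 in 200 sample */

/* Class sizes at entry and among those remaining */
bysort wclass: egen n_all  = count(id)
bysort wclass: egen n_stay = total(stayed)

/* Adjusted weight: base weight times the inverse of the class retention rate */
gen double adjwt = basewt * n_all / n_stay if stayed

/* Check: adjusted weights of stayers sum to the original weighted total */
egen double tot_adj = total(adjwt)
display tot_adj[1]                     /* 200 * 1000 = 200000 */
***************CODE ENDS*******************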

[email protected]


This is essentially the same question that you posted on February 22. You've either missed or ignored Brendan Halpin's excellent response at

[email protected]

On Mar 15, 2012, at 1:12 PM, Laurie Molina wrote:

Hi guys,
I was wondering if Stata has any command for longitudinal sampling.

I have a database which allows me to follow the same observations over
time. However, some of these observations disappear over time, and
some new observations appear as well, as time goes by.
I would like to take a sample that is representative at every period
of time, that captures the attrition rate of the population, as well
as the rate of new observations entry.
Finally, I would like to be able to update this sample over time, that
is: if I have a sample that satisfies the above requirements from time
T to time T+10, in time T+11 I want to be able to take a new sample,
only on time T+11 observations, and add this new sample to the T to
T+10 sample database, and make sure that it still satisfies (taking
into account the new T+11 observations) all the above requirements.

Do you think that it is possible to do such kind of sampling in Stata?

Thank you all very much!