... I can't help with the code for this, but I believe this is called calliper matching, and this form of age matching can induce some bias in a study. The age distribution in the population is not uniform. This means that, for a particular age, the proportion of the population within 5 years above that age will differ from the proportion within 5 years below it. For example, suppose a case is 45 years old and the proportion of the population aged 45 to 49 is greater than the proportion aged 40 to 44. If you calliper match a large number of cases aged in their 40s, the controls will then tend, on average, to be older than the cases. If you match on 5-year age groups, this doesn't happen. I haven't got the book with me at the moment, but I think that Rothman and Greenland discuss this in Modern Epidemiology.
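
For readers who want something to start from, here is a minimal, untested sketch of the calliper match under discussion, followed by a quick check for the age drift described above. The variable names (id, age, and a 0/1 indicator called case) are assumptions for illustration; the original post does not name its variables. It uses -cross- to pair every case with every control, which can produce a very large dataset when there are many cases and controls.

```stata
* Sketch only. Assumed variables (not named in the original post):
*   id    unique identifier
*   age   age in years
*   case  1 = case, 0 = control

* Save the controls to a temporary file under new names
preserve
keep if case == 0
keep id age
rename id control_id
rename age control_age
tempfile controls
save "`controls'"
restore

* Keep the cases, then form every case-control pair
keep if case == 1
keep id age
rename id case_id
rename age case_age
cross using "`controls'"

* Apply the 5-year calliper and list the controls eligible for each case
keep if abs(case_age - control_age) <= 5
sort case_id control_id
list case_id case_age control_id control_age, sepby(case_id)

* Check for drift: a mean clearly above zero suggests the matched
* controls are, on average, older than their cases
generate age_diff = control_age - case_age
summarize age_diff
```

In the resulting dataset, case_id serves as the variable identifying the match. A control can appear under more than one case, so some further rule would be needed if each control should be used only once.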

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael McCulloch
Sent: Wednesday, 30 September 2009 11:40 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Finding controls within 5 years of age of a case

Hello Statalist friends,

I have a list of cases and controls uniquely identified by an ID variable. I'd like to list all controls within 5 years of age of each case, and then mark those controls with a new variable identifying the match. This creates, in effect, a cluster that will be accounted for in analysis.

Is there an efficient way to create that matching variable?

My workaround was to use foreach to loop over the panels, saving and merging the results of each somewhat like this:

    // prep data
    tsset id date
    gen end=date // for later merging
    tempfile stats
    levelsof id, local(ids)
    foreach id of local ids {
        keep if id==`id'
        quietly: rolling, window(`window') saving(`stats', replace) ///
            nodots: regress y x
        merge id end using "`stats'", sort update replace nokeep
        drop _merge
    }

This took my 1+ hour runtime down to just a few minutes.

Regards,
Brian

Quoting Degas Wright <degasw@decaturcapital.com>:

I have a longitudinal dataset that has 2000 stocks as xticker (id) and dependent variable, return (t+1), with 20 independent variables (t) over 88 periods (months). I am trying to run a , xtreg, regression over three periods and then use the coefficients from the regression to forecast the t+1 return.

When I use the following command:

    . rolling _b _se, window (3) clear: xtreg return, var1, var2,.var20, vce(cluster xticker)
    (running regress on estimation sample)

    -> xticker = 1
    Rolling replications (86)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..........

    -> xticker = 2
    Rolling replications (86)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    .........

It starts going through each of the 2000 stocks, by listing xticker1, xticker 2, etc.. I have stopped it prior to the run being completed because it will take a long time to go through all 2000 stocks.

Is there another command that I should be using? For instance I use the forvalues command to run the regression, xtreg, one period at a time for all of the periods, Period 1, Period 2, etc.

Thank you for your assistance.

Degas A.