st: RE: Finding controls within 5 years of age of a case

From	"Kieran McCaul" <[email protected]>
To	<[email protected]>
Subject	st: RE: Finding controls within 5 years of age of a case
Date	Thu, 1 Oct 2009 07:00:18 +0800

...


I can't help with the code for this, but I believe that it's called
calliper matching, and this form of age matching can induce some bias in
a study.  The age distribution in the population is not uniform.  This
means that, for a particular age, the proportion of the population up to
5 years older than that age will generally differ from the proportion up
to 5 years younger.

For example, suppose a case is 45 years old and the proportion of the
population aged 46 to 50 is greater than the proportion aged 40 to 44.
If you calliper match a large number of cases in their 40s, the
controls will tend, on average, to be older than the cases.

If you match on 5-year age groups, this doesn't happen.  
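
As a rough illustration of the calliper effect, here is a minimal sketch
using a made-up population.  The age range (30 to 60), the linearly
rising age density, and the variable names age and eligible are all
assumptions chosen for the example, not anything from a real dataset.

```stata
* Sketch only: a made-up population, not data from any study.
clear
set seed 2009
set obs 100000

* Hypothetical ages between 30 and 60, with density rising linearly in age,
* so there are more people just above 45 than just below it.
generate age = 30 + 30*sqrt(runiform())

* Potential controls for a 45-year-old case under a 5-year calliper
generate byte eligible = abs(age - 45) <= 5

* Mean age of eligible controls; under this assumed density it should
* come out somewhat above 45 (about 45.6 in expectation).
summarize age if eligible
```

With a density that falls with age instead, the same calculation would
put the average eligible control below 45.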

I haven't got the book with me at the moment, but I think that Rothman
and Greenland discuss this in Modern Epidemiology.

______________________________________________
Kieran McCaul MPH PhD
WA Centre for Health & Ageing (M573)
University of Western Australia
Level 6, Ainslie House
48 Murray St
Perth 6000
Phone: (08) 9224-2701
Fax: (08) 9224 8009
email: [email protected]
http://myprofile.cos.com/mccaul 
http://www.researcherid.com/rid/B-8751-2008
______________________________________________
If you live to be one hundred, you've got it made.
Very few people die past that age - George Burns


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael
McCulloch
Sent: Wednesday, 30 September 2009 11:40 PM
To: [email protected]
Subject: st: Finding controls within 5 years of age of a case

Hello Statalist friends,
I have a list of cases and controls uniquely identified by an ID
variable. I'd like to list all controls within 5 years of age of each
case, and then mark those controls with a new variable identifying the
match. This creates, in effect, a cluster that will be accounted for in
analysis.

Is there an efficient way to create that matching variable?  

Michael McCulloch
124 Pine St
San Anselmo, CA 94960
Tel. 415-407-1357
Fax 206-338-2391
[email protected]

On Sep 30, 2009, at 8:27 AM, "Brian R. Landy" <[email protected]>
wrote:

Hi, I'm not really sure what your question is, but I'm guessing you
find -rolling- to be slow with a panel?  I observed this a while back
(and reported it to Stata, but have never seen notice that it was
fixed).  I found that -rolling- in conjunction with panels is far
slower than the time implied by (# panels)*(time for rolling
regression on just one panel).  In my case a regression was taking
over 1 hour on a 4-CPU box; this was for somewhere around 100 panels,
4 years of daily data, and a 2-year rolling regression.

My workaround was to use foreach to loop over the panels, saving and  
merging the results of each somewhat like this:

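    // prep data (assumes the local macro window was set earlier)
    tsset id date
    gen end = date              // rolling names each window's last date "end"
    tempfile full stats
    save "`full'"
    levelsof id, local(ids)
    foreach id of local ids {
        use "`full'" if id == `id', clear   // load one panel at a time
        quietly rolling, window(`window') saving(`stats', replace) ///
                 nodots: regress y x
        use "`full'", clear                 // back to the full data to merge
        merge id end using "`stats'", sort update replace nokeep
        drop _merge
        save "`full'", replace              // keep the merged results so far
    }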

This took my 1+ hour runtime down to just a few minutes.

Regards,
Brian

Quoting Degas Wright <[email protected]>:

I have a longitudinal dataset that has 2000 stocks as xticker (id) and
dependent variable, return (t+1), with 20 independent variables (t) over
88 periods (months).

I am trying to run an xtreg regression over three periods and then use
the coefficients from the regression to forecast the t+1 return.  When I
use the following command:

.. rolling _b _se, window (3) clear: xtreg return, var1, var2,.var20,
vce(cluster xticker)

(running regress on estimation sample)

-> xticker = 1

Rolling replications (86)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..........

-> xticker = 2

Rolling replications (86)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.........

It starts going through each of the 2000 stocks, listing xticker 1,
xticker 2, etc.  I have stopped it prior to the run being completed
because it will take a long time to go through all 2000 stocks.

Is there another command that I should be using?  For instance I use the
forvalues command to run the regression, xtreg, one period at a time for
all of the periods, Period 1, Period 2, etc.

Thank you for your assistance.




Degas A. Wright, CFA
Chief Investment Officer
Decatur Capital Management, Inc.
250 East Ponce De Leon Avenue, Suite 325
Decatur, Georgia  30030
Voice: 404.270.9838
Fax:404.270.9840
Website: www.decaturcapital.com





*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/





