Statalist



st: Finding controls within 5 years of age of a case


From   Michael McCulloch <gmmt@sbcglobal.net>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: Finding controls within 5 years of age of a case
Date   Wed, 30 Sep 2009 08:39:57 -0700 (PDT)

Hello Statalist friends,
I have a list of cases and controls uniquely identified by an ID variable. I'd like to list all controls within 5 years of age of each case, and then mark those controls with a new variable identifying the match. This creates, in effect, a cluster that will be accounted for in analysis.

Is there an efficient way to create that matching variable?  
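
One way to build such a matching variable, offered as a minimal sketch rather than a definitive answer, is to form every case-control pair with -cross- and keep only the pairs whose ages differ by at most 5 years. The toy data and the variable names id, age, and case (1 = case, 0 = control) are assumptions for illustration, and "within 5 years" is taken to include a difference of exactly 5.

```stata
* Toy data; the variable names id, age, and case are assumptions
clear
input id age case
1 60 1
2 45 1
3 58 0
4 63 0
5 47 0
6 30 0
end

* Save the cases under their own variable names
preserve
keep if case == 1
rename id case_id
rename age case_age
keep case_id case_age
tempfile cases
save `cases'
restore

* Pair every control with every case, then keep pairs within 5 years
keep if case == 0
cross using `cases'
keep if abs(age - case_age) <= 5

* case_id now identifies the matched set each control belongs to
sort case_id id
list case_id case_age id age, sepby(case_id)
```

The result has one row per matched pair, so a control close in age to several cases appears once per case, and case_id can serve as the cluster identifier. Because -cross- builds (# cases)*(# controls) rows before filtering, this may be slow or memory-hungry on large files.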

Michael McCulloch
124 Pine St
San Anselmo, CA 94960
Tel. 415-407-1357
Fax 206-338-2391
mm@pinestreetfoundation.org

On Sep 30, 2009, at 8:27 AM, "Brian R. Landy" <landy@alumni.caltech.edu> wrote:

Hi, I'm not really sure what your question is, but I'm guessing you
find -rolling- to be slow with a panel?  I observed this a while back
and reported it to Stata, but I have never seen notice that it was
fixed.  I found that -rolling- in conjunction with panels is far
slower than the time implied by (# panels)*(time for a rolling
regression on just one panel).  In my case a regression was taking
over 1 hour on a 4-CPU box; this was for roughly 100 panels,
4 years of daily data, and a 2-year rolling regression.

My workaround was to use foreach to loop over the panels, saving and  
merging the results of each somewhat like this:

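    // prep data (assumes local macro `window' was defined earlier)
    tsset id date
    gen end = date  // merge key: -rolling- saves each window's last date as "end"
    tempfile stats
    levelsof id, local(ids)
    foreach id of local ids {
        // run the rolling regression on one panel at a time
        preserve
        keep if id == `id'
        quietly: rolling, window(`window') saving(`stats', replace) ///
                 nodots: regress y x
        restore  // bring back the full dataset before merging
        // fold this panel's results into the full dataset
        merge id end using "`stats'", sort update replace nokeep
        drop _merge
    }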

This took my 1+ hour runtime down to just a few minutes.

Regards,
Brian

Quoting Degas Wright <degasw@decaturcapital.com>:

I have a longitudinal dataset that has 2000 stocks as xticker (id) and
dependent variable, return (t+1), with 20 independent variables (t) over 88
periods (months).

I am trying to run an xtreg regression over three periods and then use the
coefficients from the regression to forecast the t+1 return.  When I use the
following command:

. rolling _b _se, window(3) clear: xtreg return var1 var2 ... var20, vce(cluster xticker)

(running regress on estimation sample)

-> xticker = 1

Rolling replications (86)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..........

-> xticker = 2

Rolling replications (86)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.........

It starts going through each of the 2000 stocks, listing xticker = 1,
xticker = 2, etc.  I stopped it before the run completed because
it would take a long time to go through all 2000 stocks.

Is there another command that I should be using?  For instance I use the
forvalues command to run the regression, xtreg, one period at a time for all
of the periods, Period 1, Period 2, etc.
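
The period-by-period idea can be sketched with forvalues: estimate one pooled xtreg per three-period window across all stocks, then use predict to fill in the fitted value for the following period. This is only an illustration under assumptions: the simulated data, the two regressors standing in for var1 ... var20, the period variable, and the forecast variable are all hypothetical, and whether the fitted value lines up with the t+1 return depends on how return is dated in the real data.

```stata
* Simulated panel, for illustration only (50 stocks, 10 periods)
clear
set seed 12345
set obs 50
gen xticker = _n
expand 10
bysort xticker: gen period = _n
gen var1 = rnormal()
gen var2 = rnormal()
gen return = 0.5*var1 - 0.2*var2 + rnormal()
xtset xticker period

* One pooled regression per 3-period window, forecast for the next period
gen double forecast = .
forvalues t = 3/9 {
    local first = `t' - 2
    local next  = `t' + 1
    quietly xtreg return var1 var2 if inrange(period, `first', `t'), ///
        vce(cluster xticker)
    * linear prediction from the window's coefficients, next period only
    quietly predict double xb_tmp if period == `next', xb
    quietly replace forecast = xb_tmp if period == `next'
    drop xb_tmp
}
summarize forecast
```

With 88 periods this runs one regression per window instead of one set of rolling regressions per stock, which is what the by-xticker output above suggests -rolling- is doing.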

Thank you for your assistance.




Degas A. Wright, CFA
Chief Investment Officer
Decatur Capital Management, Inc.
250 East Ponce De Leon Avenue, Suite 325
Decatur, Georgia  30030
Voice: 404.270.9838
Fax:404.270.9840
Website: www.decaturcapital.com





*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/







