Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: More efficient processing of nested loops?


From   Robson Glasscock <glasscockrc@vcu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: More efficient processing of nested loops?
Date   Wed, 15 Aug 2012 11:26:03 -0400

Hello,
I am running Stata 12. I have written code that creates a variable
that is the sum of abnormal returns for each firm. The abnormal
returns are accumulated from three days after the firm makes its
earnings announcement in quarter t-4 to three days after the firm
makes its earnings announcement in quarter t. My problem is that it
takes around 2 minutes for my code to execute for each firm/quarter
announcement, and there are around 150,000 earnings announcements in
the dataset.

I modified a panel dataset with the earnings announcement dates for
each firm so that each observation contains both the quarter t
earnings announcement date and the quarter t-4 earnings announcement
date (with the three days added to each per above). I then merged this
dataset with a second panel dataset that contains the daily returns
for each firm. The merged dataset has around 10.2 million
observations.

Next, I created a count variable, cum_id_cum, which is a running total
of each earnings announcement. This variable is truly cumulative and
does not reset back to 1 when the next firm releases its first
earnings announcement. The loop contains a variable, runn, that starts
with a value equal to "1" and increases by 1 each time the loop is
processed. I'm using that to help identify the particular dates of the
quarter t and quarter t-4 earnings announcement so that the abnormal
returns are accumulated over the correct period. Datadate_2 is the
quarter t earnings announcement and  date_t_4 is the quarter t-4
earnings announcement. Cusip is the identifier for each firm. Date is
the date of the firm's return in the stock market for each trading
day.

The big picture of my approach was to create temporary variables that
will equal the cusip, datadate_2, and date_t_4 when runn equals
cum_id_cum. These first-step temporary variables (d2_temp, dt_temp,
and cusip_temp) have missing values except in the observation where
runn equals cum_id_cum so I created second-step temporary variables
(cusip_temp_2, d2_temp_2, and dt_temp_2) which place the cusip,
datadate_2, and date_t_4 values for each particular run through the
entire dataset. This allows me to mark the days for each firm that are
between the dates of the earnings announcements and then sum up the
abnormal returns in a temporary variable called cum_ab_temp. The final
variable with the sum of the abnormal returns for each firm is cum_ab
and is retained in the observation where _merge==3 (from the merge
mentioned in the second paragraph of this post).

My code is below. Note that I constrain it to the first 25,000
cum_id_cum values due to macro size constraints:

gen runn= 1
levelsof cum_id_cum if cum_id_cum !=. & cum_id_cum <25000, local(cum_id_cum)
foreach 1 of local cum_id_cum{
gen d2_temp= datadate_2 if runn== cum_id_cum
gen dt_temp= date_t_4 if runn== cum_id_cum
gen cusip_temp= cusip if runn== cum_id_cum
egen cusip_temp_2= mode(cusip_temp)
egen d2_temp_2= min(d2_temp)
egen dt_temp_2= min(dt_temp)
foreach x of varlist date{
replace mark= 1 if `x' <= d2_temp_2 & `x' >= dt_temp_2 & cusip_temp_2==cusip
egen cum_ab_temp= total(ab_ret) if mark==1
replace cum_ab= cum_ab_temp if datadate_2== d2_temp & dt_temp==
date_t_4 & cusip_temp==cusip & d2_temp !=. & dt_temp !=.
replace runn= runn+1
drop d2_temp
drop dt_temp
drop d2_temp_2
drop dt_temp_2
drop cusip_temp
drop cusip_temp_2
drop cum_ab_temp
replace mark=0

}
}

I'm wondering if there is a more efficient way to do the above which
will result in a significantly faster processing time. My fear is that
the above ignores the functionality of Stata and instead uses
inefficient brute force.

Thank you for your consideration,
Robson Glasscock
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index