Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Robson Glasscock <glasscockrc@vcu.edu> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
st: More efficient processing of nested loops? |

Date |
Wed, 15 Aug 2012 11:26:03 -0400 |

Hello, I am running Stata 12. I have written code that creates a variable that is the sum of abnormal returns for each firm. The abnormal returns are accumulated from three days after the firm makes its earnings announcement in quarter t-4 to three days after the firm makes its earnings announcement in quarter t. My problem is that it takes around 2 minutes for my code to execute for each firm/quarter announcement, and there are around 150,000 earnings announcements in the dataset. I modified a panel dataset with the earnings announcement dates for each firm so that each observation contains both the quarter t earnings announcement date and the quarter t-4 earnings announcement date (with the three days added to each per above). I then merged this dataset with a second panel dataset that contains the daily returns for each firm. The merged dataset has around 10.2 million observations. Next, I created a count variable, cum_id_cum, which is a running total of each earnings announcement. This variable is truly cumulative and does not reset back to 1 when the next firm releases its first earnings announcement. The loop contains a variable, runn, that starts with a value equal to "1" and increases by 1 each time the loop is processed. I'm using that to help identify the particular dates of the quarter t and quarter t-4 earnings announcement so that the abnormal returns are accumulated over the correct period. Datadate_2 is the quarter t earnings announcement and date_t_4 is the quarter t-4 earnings announcement. Cusip is the identifier for each firm. Date is the date of the firm's return in the stock market for each trading day. The big picture of my approach was to create temporary variables that will equal the cusip, datadate_2, and date_t_4 when runn equals cum_id_cum. These first-step temporary variables (d2_temp, dt_temp, and cusip_temp) have missing values except in the observation where runn equals cum_id_cum so I created second-step temporary variables (cusip_temp_2, d2_temp_2, and dt_temp_2) which place the cusip, datadate_2, and date_t_4 values for each particular run through the entire dataset. This allows me to mark the days for each firm that are between the dates of the earnings announcements and then sum up the abnormal returns in a temporary variable called cum_ab_temp. The final variable with the sum of the abnormal returns for each firm is cum_ab and is retained in the observation where _merge==3 (from the merge mentioned in the second paragraph of this post). My code is below. Note that I constrain it to the first 25,000 cum_id_cum values due to macro size constraints: gen runn= 1 levelsof cum_id_cum if cum_id_cum !=. & cum_id_cum <25000, local(cum_id_cum) foreach 1 of local cum_id_cum{ gen d2_temp= datadate_2 if runn== cum_id_cum gen dt_temp= date_t_4 if runn== cum_id_cum gen cusip_temp= cusip if runn== cum_id_cum egen cusip_temp_2= mode(cusip_temp) egen d2_temp_2= min(d2_temp) egen dt_temp_2= min(dt_temp) foreach x of varlist date{ replace mark= 1 if `x' <= d2_temp_2 & `x' >= dt_temp_2 & cusip_temp_2==cusip egen cum_ab_temp= total(ab_ret) if mark==1 replace cum_ab= cum_ab_temp if datadate_2== d2_temp & dt_temp== date_t_4 & cusip_temp==cusip & d2_temp !=. & dt_temp !=. replace runn= runn+1 drop d2_temp drop dt_temp drop d2_temp_2 drop dt_temp_2 drop cusip_temp drop cusip_temp_2 drop cum_ab_temp replace mark=0 } } I'm wondering if there is a more efficient way to do the above which will result in a significantly faster processing time. My fear is that the above ignores the functionality of Stata and instead uses inefficient brute force. Thank you for your consideration, Robson Glasscock * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: More efficient processing of nested loops?***From:*Nick Cox <njcoxstata@gmail.com>

- Prev by Date:
**Re: st: -gmm- and -outreg2-** - Next by Date:
**RE: st: Use Regular Expressions to replace words/strings of characters in a text file** - Previous by thread:
**st: Sensitivity analysis of relationship between study quality and effect size** - Next by thread:
**Re: st: More efficient processing of nested loops?** - Index(es):