Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Nuno Soares" <ndsoares@gmail.com> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | st: merging two datasets: companies with daily stock data |
Date | Wed, 10 Nov 2010 17:59:45 -0000 |
Hi everyone,

I'm trying to implement a bit of code that I normally use in SAS, but I'm having some trouble implementing it in Stata given the amount of data it requires.

Imagine you have two data files: one with a list of companies, each with an associated date (say, an event date), and the other with daily stock market data. The file with the companies can contain the same company with multiple events.

Problem: get the market data for a given period (say, the 260 days before the event date).

This problem seems to be easily solved by using an m:1 merge with the company ID as the merge variable. This would bring all the daily data, irrespective of the event date, into the merged file, and I could then delete the dates that are not needed.

Now, imagine the company and event date file has 1000 observations, all of which represent firms that have 10 years of daily market data, and I still just want event date - 260 days. This would mean that the merge process would lead to a file with 1000*10*260 = 2600000 observations, of which I only need 10%. As both files become bigger, the time needed to merge becomes longer and the memory requirements increase.

In SAS I would just use a proc sql with the date restrictions, and it would get only the data needed. In Stata, it seems that the whole daily data file (restricted to the company IDs) is loaded into memory, and then we need to delete what we don't need.

Is there a way of restricting the amount of daily data loaded into memory in Stata using the - merge - command, or is there another command that allows this? The 2600000 is not that bad, but I normally encounter two or three times this number of observations...

Best wishes,

Nuno
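To make the arithmetic and the desired behaviour concrete, here is a minimal sketch in Python (not Stata) of a join that applies the date restriction while the daily rows are streamed, which is roughly what a proc sql with a date condition achieves. Everything in it is an assumption made for illustration: the function name `windowed_join`, the tuple layouts, and the use of integer day counts for dates (Stata stores daily dates as integers as well). The window is measured in the same units as the dates, so a window of 260 trading days would need row positions per company rather than calendar dates.

```python
from collections import defaultdict


def windowed_join(events, daily_rows, window=260):
    """Yield (company_id, event_date, date, value) for every daily row that
    falls in [event_date - window, event_date) for an event of that company.

    events:     iterable of (company_id, event_date)   -- hypothetical layout
    daily_rows: iterable of (company_id, date, value)  -- hypothetical layout
    Dates are assumed to be integer day counts.
    """
    # Only the (small) event list is held in memory, grouped by company.
    events_by_company = defaultdict(list)
    for company_id, event_date in events:
        events_by_company[company_id].append(event_date)

    # The (large) daily data is read one row at a time and filtered at once,
    # so rows outside every window are never stored.
    for company_id, date, value in daily_rows:
        for event_date in events_by_company.get(company_id, ()):
            if event_date - window <= date < event_date:
                yield (company_id, event_date, date, value)


# Sizes from the post: 1000 events, 10 years of data, 260 days per year.
n_events, years, days_per_year = 1000, 10, 260
full_merge = n_events * years * days_per_year  # rows after an unrestricted merge
needed = n_events * days_per_year              # rows actually wanted
share_needed = needed / full_merge

# Tiny synthetic example: company A has two events, B has one, C has none.
events = [("A", 1000), ("A", 1500), ("B", 1200)]
daily = ((cid, day, float(day)) for cid in ("A", "B", "C") for day in range(2000))
result = list(windowed_join(events, daily))
print(full_merge, needed, share_needed, len(result))
```

The point of the design is that memory use scales with the rows kept rather than with the full daily file; a company with several events simply contributes one window per event.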