



Re: st: merging two datasets: companies with daily stock data


From   Dan Blanchette <[email protected]>
To   [email protected]
Subject   Re: st: merging two datasets: companies with daily stock data
Date   Thu, 11 Nov 2010 10:07:15 -0500 (EST)

Would this work for you?

 merge 1:1 company_id date using "daily market data.dta", ///
   keep(master match)

This will keep all the observations of the master data
and only the observations from the using dataset that
match on company_id and date.
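
In context, and assuming the companies/events file is the master in
memory (the filename "companies with events.dta" below is just a
placeholder for whatever your file is called), the full sequence would
be along these lines:

 use "companies with events.dta", clear
 merge 1:1 company_id date using "daily market data.dta", ///
   keep(master match)
 tabulate _merge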

HTH,
Dan Blanchette
Research Associate
Center for Entrepreneurship and Innovation
Duke University's Fuqua School of Business
[email protected]

Hi,

It is possible to load a subset of a dataset. The syntax is:

use [varlist] [if] [in] using filename [, clear nolabel]

See "help use" . In your case, I would think you will need the -if-
qualifier for the date conditions.
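
For example, something along these lines reads in only the rows you
need (company_id, date, and price are just placeholders for whatever is
in your daily file, and the date range is only illustrative):

 use company_id date price ///
   if inrange(date, td(01jan2000), td(31dec2000)) ///
   using "daily market data.dta", clear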

On Thu, Nov 11, 2010 at 1:59 AM, Nuno Soares <[email protected]> wrote:
Hi everyone,

I'm trying to implement a bit of code that I normally use in SAS, but I'm
having some trouble implementing it in Stata given the amount of data
involved.
Imagine you have two data files: one with a list of companies, each with
an associated date (say an event date), and the other with daily stock
market data. The companies file can have the same company with multiple
events. The problem: get the market data for a given period (say event
date - 260 days).

This problem seems easy to solve with an m:1 merge using the company ID
as the merge variable. This would bring all the daily data, irrespective
of the event date, into the merged file, and I could then delete the
dates that are not needed.
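
(For concreteness, what I have in mind is something like the sketch
below; event_date, the file names, and the calendar-day window are just
my illustration, and it assumes for the moment one event per company:)

 use "daily market data.dta", clear
 merge m:1 company_id using "companies with events.dta", keep(match)
 * keep only the window before each event
 keep if inrange(date, event_date - 260, event_date)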

Now, imagine the company and event-date file has 1000 observations, all
of which represent firms with 10 years of daily market data, and I still
just want event date - 260 days. The merge would then produce a file
with 1000*10*260 = 2,600,000 observations, of which I only need 10%. As
both files grow larger, the time needed to merge increases and so do the
memory requirements.

In SAS I would just use a proc sql with the date restrictions, and it
would pull only the data needed. In Stata, it seems that the whole daily
data file (restricted to the company IDs) is loaded into memory and then
we need to delete what we don't need. Is there a way of restricting the
amount of daily data loaded into memory in Stata using the -merge-
command, or a command that allows me to do that? The 2,600,000
observations are not that bad, but I normally encounter two or three
times that number...

Best wishes,

Nuno

