Statalist



Re: st: Merging to the CRSP header file


From   Mark Lunt <mark.lunt@manchester.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Merging to the CRSP header file
Date   Thu, 17 Apr 2008 16:32:53 +0100

Malcolm Wardlaw wrote:
I have a data manipulation question about 1-to-many merging based on tickers and date ranges. It's similar to a previous question, but a much simpler operation and a much more common operation for me. I thought I had solved this problem, but I can't for the life of me figure it out again.
Basically, I have lots of observations by date and ticker in Dataset(A). In the CRSP header file, a ticker identifies a single company only within a specific date range. The header file provides a completely unique company identifier, matching that ID to the ticker symbol and giving a <start> and <end> date for the period during which the ticker is a valid match for that ID. So I need to merge on the ticker, where the date in Dataset(A) falls between the <start> and <end> dates.

Someone had suggested -nearmrg-, which kind of works, but it seems a bit squirrely for what I'm doing. Plus, I'm unfortunately still on Stata 9. I think I read some comment on the archives somewhere about creating 'bins', but I couldn't tell what they were talking about.

This seems like such a common problem, I figured there must be a stock way to handle this.

I've written an ado-file called tvc_merge that could be helpful here. It was designed for merging files of covariates that change with time, which is not quite what you have here. However, I think that the commands

use <Dataset(A)>
tvc_merge ticker start stop using <CRSP Header File>

should produce a file containing all of the records from Dataset(A) matched to the correct entry from the CRSP header file, although the start and stop dates will have changed: the ado-file was designed for survival analysis, so it effectively splits the CRSP file at every time that an event occurs in Dataset(A). Depending on what data you need from the header file, that may not be a problem. If it is, you could always generate new variables

gen xstart = start
gen xstop = stop

before running tvc_merge: they will be unaffected by the merge.
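
For comparison, the date-range match described in the question can also be sketched with the official -joinby- command alone. This is not how tvc_merge works internally; it is a minimal sketch under stated assumptions. The file names dataset_a and crsp_header are hypothetical, as is the assumption that the variables are called ticker, date, start and stop and that all three dates are numeric Stata dates on the same scale.

```stata
* Sketch only: dataset_a, crsp_header, date, ticker, start and stop are
* assumed names, not taken from the actual files.
use dataset_a, clear

* Pair each observation with every header row that shares its ticker.
* unmatched(master) keeps observations whose ticker is not in the header
* file and creates _merge (1 = master only, 3 = matched).
joinby ticker using crsp_header, unmatched(master)

* Keep the pairing whose validity window contains the observation date,
* plus the observations that had no header row at all.
* Note: an observation whose ticker is in the header file but whose date
* falls outside every window is dropped by this line.
keep if (date >= start & date <= stop) | _merge == 1
```

Because -joinby- forms all pairwise combinations within each ticker before the -keep-, the intermediate dataset can be considerably larger than Dataset(A) when a ticker has many header rows.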

If you think tvc_merge would be useful, you can get it from

net from http://personalpages.manchester.ac.uk/staff/mark.lunt

and click on the blue "tvc_merge". If you try it, I would be grateful for feedback as to how it behaved in this context.

Thanks

Mark

