[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Merging to the CRSP header file

From   "Gabi Huiber" <>
Subject   Re: st: Merging to the CRSP header file
Date   Thu, 17 Apr 2008 11:55:45 -0400

In case Mark's command isn't exactly what you need (I think it is),
below is how I would go about it if I had to do it from scratch. It
won't be the most economical thing, but I like breaking up operations
in blocks that make things easy for people to follow. I would do two
separate merges.

Say I have a file of average daily stock prices, A, with three
variables: ticker symbol, stockprice, date. And I have a CRSP header
file, B, with four variables: company CRSP ID, ticker symbol, date
ticker symbol starts, date ticker symbol ends. That way the same
company name may be associated with several ticker symbols.

I would first collect a list of the ticker symbols in A:

keep ticker
duplicates drop
sort ticker
save Atickers, replace

Then I would merge it with the B file, and

keep if _merge==3
keep ticker crsp_id
sort ticker
save Atickers, replace

That would give me the list of crsp_id's in the A file, which is
shorter than or equal to the list of ticker symbols:

tab crsp_id ticker

At this point I would take each crsp_id separately and merge it again
to the A file by ticker. Then keep if _merge==3 would give me the
stock prices of only that company regardless of how its ticker symbol
may have changed over that time span.

I would then append these individual company files and drop the ticker
symbols because now I have company_id's instead, and that's all I

If you have a reasonable number of companies you care about, this
merge-keep-merge-keep-append succession won't take Stata very long to
complete. A do-file that would provide a general solution -- in case
the list of companies in file A changes frequently -- would be easy to


On 4/17/08, Malcolm Wardlaw <> wrote:
> I have a data manipulation question about 1-to-many merging based on tickers
> and date ranges.  It's similar to a previous question, but a much simpler
> operation and a much more common operation for me.  I thought I had solved
> this problem, but I can't for the life of me figure it out again.
>  Basically I have lots of observations by date and ticker in Dataset(A) .
> Tickers are only unique for companies in the CRSP header file for specific
> date ranges.  The CRSP header file provides for a completely unique company
> identifier, matching the ID up to the ticker symbol and providing a <start>
> and <end> date for the period that the ticker is a valid match for that ID.
> So, I need to merge using the ticker, where the date in Dataset(A) is in
> between the <start> and <end> dates.
>  Someone had suggested -nearmrg-, which kind of works, but it seems a bit
> squirrely for what I'm doing.  Plus, I'm unfortunately still on Stata 9.  I
> think I read some comment on the archives somewhere about creating 'bins',
> but I couldn't tell what they were talking about.
>  This seems like such a common problem, I figured there must be a stock way
> to handle this.
>  *
>  *   For searches and help try:
>  *
>  *
>  *
*   For searches and help try:

© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index