Malcolm,

As you are aware, the inefficiency comes from repeatedly loading and saving ("churning") datasets. You may be able to avoid this by putting the "events" data into a matrix and then doing the matching with the "compustat" data kept in memory throughout, something like this (a Stata 9 approach; people more fluent in Mata would probably use that instead):

use events, clear // or whatever it's called
loc num=_N
set matsize `num' // if you have lots of companies
mkmat sales eventquarter sic3 qassets code, mat(E)
drop _all // clears data but not macros or matrices
use compustat
gen code=.
forval i = 1/`num' {
    local sales=E[`i',1]
    local qtr=E[`i',2]
    local sic=E[`i',3]
    local assets=E[`i',4]
    local code=E[`i',5]
    replace code=`code' if ... // your match criteria
}
drop if code==.
save comparables

This does not yet do quite what your code does: it lets each compustat observation match only one of your companies (the last one found), whereas your code allows multiple matches (you could get around that using -expand-, though it may need two passes). It also requires all the matching variables to be numeric; if they aren't, you may need to -encode- them first. So there's work to do, but I think the basic idea is OK.
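A different route to multiple matches, swapped in here rather than taken from either piece of code in this thread, is to let -joinby- form every pairing of an event with a Compustat firm in the same SIC group and quarter, and then keep only the pairs inside the bands. The sketch below uses made-up toy data; the names gvkey, ev_sales and ev_assets are hypothetical, and it assumes the pairings within each SIC-quarter cell fit in memory.

```stata
* Sketch: all-pairs match within SIC x quarter via -joinby-, then filter.
* Toy data and the names gvkey, ev_sales, ev_assets are hypothetical.
clear
input code sic3 eventquarter sales qassets
1 281 192 100 500
2 281 192 110 550
end
rename eventquarter obsqtr   // use the compustat name for the quarter key
rename sales ev_sales        // clearer name for the event firm's sales
rename qassets ev_assets     // avoid clashing with compustat's qassets
tempfile ev
save `ev'

clear
input gvkey sic3 obsqtr qsales qassets
10 281 192  90 450
11 281 192 105 520
12 281 192 130 700
13 357 192 100 500
end

* every (firm, event) pair sharing sic3 and obsqtr; unmatched rows are dropped
joinby sic3 obsqtr using `ev'
* keep pairs inside the +/-15% sales and +/-20% assets bands
keep if inrange(qsales, .85*ev_sales, 1.15*ev_sales) & ///
        inrange(qassets, .80*ev_assets, 1.20*ev_assets)
sort code gvkey
list code gvkey qsales qassets
```

Here firm 11 ends up in both event groups, which the matrix loop cannot represent without extra work; the cost is that -joinby- briefly holds every pair within a SIC-quarter cell.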

Keith

At 07:13 AM 6/03/2008, you wrote:

I wanted to pose this question to Statalist about matching data to a range of values rather than to exact values. I asked something similar before, but the responses made me realize my question was ill-formed, so I'll try to be as explicit as possible and use an example to illustrate it.

Let's say I want to do a long-run event study on changes in the real growth of companies. To do this, I need to match each company in the event study to a group of comparable companies, so I need a matched dataset of all companies that fall within a given range on several accounting variables.

The matching works as follows. I have a dataset (1) containing all of the companies I wish to include in the event study. From a dataset covering the larger Compustat universe of all firms (3), I then need to create a dataset (2) of matching companies. To do this, I need to gather all firms that, in the quarter of the event, have the same SIC code as the event company, sales within ±15% of its sales, and assets within ±20% of its assets. The new dataset must also have a marker on each group of sample firms identifying the corresponding event firm.
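As a sketch of how that rule can be written as a single Stata condition for one event firm: the numbers are made up, the variable names are borrowed from the code further down, and note that -inrange()- includes both endpoints.

```stata
* Sketch of the stated rule for ONE event firm; all values are made up.
clear
input sic3 obsqtr qsales qassets
281 192  90 450
281 192 120 520
281 193 100 500
357 192 100 500
end
local sic    = 281   // event firm's SIC code
local qtr    = 192   // event quarter
local sales  = 100   // event firm's sales
local assets = 500   // event firm's assets
* 1 if same SIC and quarter, sales within +/-15%, assets within +/-20%
gen byte ismatch = sic3==`sic' & obsqtr==`qtr' ///
    & inrange(qsales, .85*`sales', 1.15*`sales') ///
    & inrange(qassets, .80*`assets', 1.20*`assets')
list
```

Only the first row passes: the second fails the sales band, the third is in the wrong quarter and the fourth has a different SIC code.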

Here is how I originally dealt with the problem. For each event firm, the program loads the matching part of the Compustat dataset into memory, appends the results accumulated so far from disk, saves them back to disk, and then reloads the events dataset from disk. It works, but it seems very inefficient.

Is there a best practice on how to do this, or is this basically as good as it's going to get?

---------------------------------------

/*comparables.dta must already exist (e.g. saved empty) before the first pass*/
local num = _N
forval i = 1/`num' {
    /*The sales of Event Company i*/
    local sales=sales[`i']
    /*The quarter of the observation*/
    local qtr=eventquarter[`i']
    /*SIC code*/
    local sic=sic3[`i']
    /*Assets of the event company*/
    local assets=qassets[`i']
    /*A code that uniquely tags the event*/
    local code=code[`i']
    /*Load only the matching compustat rows: same quarter and SIC, sales +/-15%, assets +/-20%*/
    quietly use if obsqtr==`qtr' & sic3==`sic' & qsales<=1.15*`sales' ///
        & qsales>=.85*`sales' & qassets<=1.2*`assets' & qassets>=.8*`assets' ///
        using compustat, clear
    gen code=`code'
    append using comparables
    quietly save comparables, replace
    use events, clear
}

---------------------------------------

