Re: st: Range Merging

From   Keith Dear
Subject   Re: st: Range Merging
Date   Thu, 06 Mar 2008 10:45:36 +1100

As you are aware, the inefficiency comes from churning datasets: every pass through the loop reads from and writes to disk. You may be able to avoid this by putting the "events" data into a matrix and then doing the matching with the "compustat" data kept in memory throughout, something like this (Stata 9 approach; better-endowed people would probably use Mata):

use events, clear // or whatever it's called
loc num=_N
set matsize `num' // if you have lots of companies
mkmat sales eventquarter sic3 qassets code, mat(E)
drop _all // clears data but not macros or matrices
use compustat
gen code=.
forval i = 1/`num' {
    local sales=E[`i',1]
    local qtr=E[`i',2]
    local sic=E[`i',3]
    local assets=E[`i',4]
    local code=E[`i',5]
    replace code=`code' if ... // your match criteria
}
drop if code==. // keep only the compustat rows that matched an event
save comparables

This does not yet achieve quite what your code does, because it allows each compustat entry to match only one of your companies (the last found) whereas your code allows multiple matches (you can get around that using -expand-, may need two passes). Also it requires all the matching variables to be numeric: if they aren't, you may need to -encode- them. So there's work to do, but I think the basic idea is ok.
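
For the multiple-match case, one alternative to the matrix loop (a different technique from the one above, and only a sketch) is -joinby-, which forms every pairing of event and Compustat observations within the same SIC code and quarter, so the range conditions can then be applied in one pass. The toy values, the variable firm, and the renamed variables evsales and evassets are made up for illustration; the +/-15% and +/-20% bands come from the original question.

```stata
* Toy event file (hypothetical values); one row per event firm
clear
input code eventquarter sic3 sales qassets
1 192 283 100 500
2 192 283 110 520
3 193 357  50 200
end
* Rename so the event values do not clash with Compustat's variables
rename sales evsales
rename qassets evassets
rename eventquarter obsqtr
tempfile ev
save `ev'

* Toy Compustat file (hypothetical values)
clear
input firm obsqtr sic3 qsales qassets
11 192 283 105 510
12 192 283 130 500
13 193 357  52 230
14 193 283 100 500
15 192 283  90 590
end

* Pair every Compustat row with every event in the same SIC code and quarter
joinby sic3 obsqtr using `ev'

* Keep only pairs inside the sales (+/-15%) and assets (+/-20%) bands
keep if inrange(qsales, 0.85*evsales, 1.15*evsales) & ///
        inrange(qassets, 0.80*evassets, 1.20*evassets)

sort code firm
list code firm qsales qassets
```

In this toy data, firm 11 ends up paired with both event 1 and event 2, which is the multiple-match behaviour that a single -replace- per row cannot give. The joined dataset can get large before the -keep- if an industry-quarter contains many firms and many events.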

At 07:13 AM 6/03/2008, you wrote:

I wanted to pose this question to Statalist regarding matching data to a range of values instead of exact values. I asked a version of this question before, but I realized from the response that my question was somewhat ill-formed, so I'll try to be as explicit as possible. I will use an example to illustrate the question.

Let's say I want to do a long-run event study on the changes in real growth of companies. In order to do this, I need to appropriately match the company I am running the event study on to a group of comparable companies. For this, I need a matched dataset of all companies that match in a range of accounting variables.

The match occurs as follows. I have a dataset (1) containing all of the companies I wish to perform the event study on. I then need to create a dataset (2) that contains matching companies drawn from a dataset of the larger Compustat universe of all firms (3). To do this, I need to gather all firms that have the same SIC code, sales within +/-15% of the event company's sales, and assets within +/-20% of the event company's assets, all in the quarter of the event. The new dataset must also have a marker for each of these groups of sample firms that identifies the corresponding event firm.
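
To make the criteria concrete, the rule for a single event firm could be written as one logical expression. This is only a sketch: the numbers and the variable firm are invented, and the Compustat variable names (obsqtr, sic3, qsales, qassets) are the ones used in the code later in this message.

```stata
* Toy Compustat rows (hypothetical values)
clear
input firm obsqtr sic3 qsales qassets
1 192 283 110 450
2 192 283 120 450
3 192 283 100 390
4 191 283 100 500
end

* Hypothetical event firm: sales 100, assets 500, SIC 283, quarter 192
local sales  100
local assets 500
local sic    283
local qtr    192

* match is 1 if the row is a comparable for this event firm, 0 otherwise
gen byte match = obsqtr == `qtr' & sic3 == `sic' ///
    & inrange(qsales,  0.85*`sales',  1.15*`sales') ///
    & inrange(qassets, 0.80*`assets', 1.20*`assets')
list
```

Only firm 1 qualifies here: firm 2 has sales above the band, firm 3 has assets below it, and firm 4 is from a different quarter.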

Here is how I originally dealt with the problem. In the program, Stata is continually cycling through the data, loading part of another dataset into memory, appending it to another dataset from disk, saving that dataset to disk, and then reloading the original dataset from disk each time. It works, but it seems very inefficient.

Is there a best practice on how to do this, or is this basically as good as it's going to get?

local num = _N
forval i = 1/`num' {
    /*The sales of Event Company i*/
    local sales=sales[`i']
    /*The quarter of the observation*/
    local qtr=eventquarter[`i']
    /*SIC code*/
    local sic=sic3[`i']
    /*Assets of the event company*/
    local assets=qassets[`i']
    /*A code that uniquely tags the event*/
    local code=code[`i']
    quietly use if obsqtr==`qtr' & sic3==`sic' & qsales<=1.15*`sales' /*
    */ & qsales>=.85*`sales' & qassets<=1.2*`assets' & qassets>=.8*`assets' /*
    */ using compustat, clear
    gen code=`code'
    /*comparables.dta must already exist on disk before the first pass*/
    append using comparables
    quietly save comparables, replace
    use events, clear
}
* For searches and help try:

Dr Keith B.G. Dear
Senior Fellow in Biostatistics
National Centre for Epidemiology and Population Health
Australian National University
Canberra, ACT 0200, Australia
Tel: 02 612 54865, Fax: 02 612 50740
CRICOS provider #00120C
