Statalist



Re: st: Range Merging


From   "Vladimir Vakhitov" <vvakhitov@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Range Merging
Date   Wed, 5 Mar 2008 20:36:42 -0500

Malcolm,

This is just a wild idea, but if you pursue it you will have to load
all the data into memory only once. On the other hand, you will still
have to go over each firm/observation in your Event Study data.

As far as I understand, you want to assign each Event Study (ES)
company a range of corresponding Compustat (CS) companies. In other
words, it is possible that the same CS company will have more than one
"match" within the ES set. Why don't you then create a set of dummies
(or a long macro list, if the set of dummies is going to be too large)
where each variable (or macro list element) will be 0 or 1 depending on
whether the CS company has a match with the corresponding ES company?
To do that, you may -append- your ES dataset to the corresponding CS
dataset, ID the ES companies with -_n- and go over all N ES firms in a
loop i=1/N. At every pass of the loop, create a temporary binary
variable ("match") which determines whether each observation in the
entire dataset falls into the given range for company i (this should
work fast enough since it is just a line of -if- conditions). Then
store this "match" variable in a macro list for each CS company, or in
a matrix, or as dummy variables. After the loop ends, drop the ES
companies. At the end, you will have a big sheet of zeros and ones, and
position `i' in this sheet will tell you whether your CS company had a
match with the corresponding ES company `i'.
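
Here is a minimal sketch of the loop described above, on toy data. The
variable names (firm, sic3, qtr, sales, assets, es) and all values are
assumptions made up for illustration, not your actual data; the 15% and
20% bands follow your description. It uses the "dummy variables"
variant, one variable per ES company:

```stata
* Toy ES data (hypothetical names and values)
clear
input str2 firm sic3 qtr sales assets
"A" 100 1 100 200
"B" 100 1 500 900
end
gen byte es = 1                 // marks Event Study rows
tempfile events
save `events'

* Toy CS data (hypothetical names and values)
clear
input str2 firm sic3 qtr sales assets
"C1" 100 1 110 210
"C2" 100 1  90 150
"C3" 100 1 480 1000
"C4" 200 1 100 200
end
gen byte es = 0                 // marks Compustat rows

* Stack both sets and put the ES rows first, so that _n = i is ES company i
append using `events'
gsort -es firm
count if es == 1
local N = r(N)

forvalues i = 1/`N' {
    * match`i' is 1 for CS rows inside the ranges of ES company i, else 0
    gen byte match`i' = es == 0                                 ///
        & sic3 == sic3[`i'] & qtr == qtr[`i']                   ///
        & inrange(sales,  0.85*sales[`i'],  1.15*sales[`i'])    ///
        & inrange(assets, 0.80*assets[`i'], 1.20*assets[`i'])
}

* Drop the ES rows; what is left is the sheet of zeros and ones
drop if es == 1
list firm match*
```

With many ES companies this creates one variable per company, so the
limit on the number of variables in your flavor of Stata becomes the
binding constraint.
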
Before running the loop you may want to establish limits for the CS
dataset and drop all the companies for which a match will never be
possible. This check can be based on the extremes (minimum and
maximum) of your ES variables.
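
A rough sketch of such a pre-filter, again on made-up data. The names
sales, assets, qsales and qassets are assumptions, and the bounds are
only valid if sales and assets are positive:

```stata
* Toy ES data (hypothetical values)
clear
input sales assets
100 200
500 900
end

* Widest possible ranges over all ES companies
summarize sales, meanonly
local slo = 0.85*r(min)
local shi = 1.15*r(max)
summarize assets, meanonly
local alo = 0.80*r(min)
local ahi = 1.20*r(max)

* Toy CS data (hypothetical values)
clear
input str2 firm qsales qassets
"C1"  110   210
"C2"   50   150
"C3"  480  1000
"C5" 9000 20000
end

* Keep only CS rows that could match at least one ES company
keep if inrange(qsales, `slo', `shi') & inrange(qassets, `alo', `ahi')
list
```

This is a coarse filter: a row that survives may still match no ES
company, but a row that is dropped cannot match any.
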
If you have more than about 32K companies (observations, actually) in
your ES sample, then you will need to use more than one macro list.
However, in this case the program will probably run very slowly and
require a lot of memory.
Another way is to store those "match" variables in text files using
-outsheet-. It also works fast and does not require reloading the
data.
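
For the -outsheet- route, something along these lines could go inside
the loop. This is only a sketch: the file name pattern match_`i'.txt,
the variable names and the toy data are my assumptions, and the file is
written to the current working directory:

```stata
* Toy data standing in for the stacked dataset at one pass of the loop
clear
input str2 firm byte match
"C1" 1
"C2" 0
"C3" 1
end

local i = 1     // index of the current ES company (hypothetical)

* Write the matched CS firms for ES company i; the data stay in memory
outsheet firm using "match_`i'.txt" if match == 1, replace
```

Each file then lists the comparables for one event, and the files can be
read back and appended in a separate step.
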

2008/3/5, Malcolm Wardlaw <malcolm@mail.utexas.edu>:
> I wanted to pose this question to Statalist regarding matching data to a
>  range of values instead of exact values.  I kind of asked this question
>  before, but I realized from the response that my question was somewhat
>  ill formed, so I'll try to be as explicit as possible.  I will use an
>  example to illustrate the question.
>
>  Let's say I want to do a long-run event study on the changes in real
>  growth of companies.  In order to do this, I need to appropriately match
>  the company I am running the event study on to a group of comparable
>  companies.  For this, I need a matched dataset of all companies that
>  match in a range of accounting variables.
>
>  The match occurs as follows.  I have a data set (1) containing all of
>  the companies I wish to perform the event study on.  I need to then
>  create a dataset (2) that contains matching companies from a dataset of
>  the larger Compustat universe of all firms (3).  To do this, I need to
>  gather all firms that have the same SIC code, sales within 15% (above
>  or below) of the event company's, and assets within 20% (above or
>  below) of the event company's, in the quarter of the event.  The new
>  dataset must also have a marker for each of these groups of sample
>  firms that corresponds to the event firm.
>
>  Here is how I originally dealt with the problem. In the program, Stata
>  is continually cycling through the data, loading part of another dataset
>  into memory, appending it to another dataset from disk, saving that
>  dataset to disk, and then reloading the original dataset from disk each
>  time.  It works, but it seems very inefficient.
>
>  Is there a best practice on how to do this, or is this basically as good
>  as it's going to get?
>
>  ---------------------------------------
>  local num = _N
>  forval i = 1/`num' {
>     /*The sales of Event Company i*/
>     local sales=sales[`i']
>     /*The quarter of the observation*/
>     local qtr=eventquarter[`i']
>     /*SIC code*/
>     local sic=sic3[`i']
>     /*Assets of the event company*/
>     local assets=qassets[`i']
>     /*A code that uniquely tags the event*/
>     local code=code[`i']
>     quietly: use if obsqtr==`qtr' & sic3==`sic' ///
>         & qsales<=1.15*`sales' & qsales>=.85*`sales' ///
>         & qassets<=1.2*`assets' & qassets>=.8*`assets' ///
>         using compustat, clear
>     gen code=`code'
>     append using comparables
>     quietly:save comparables,replace
>     use events
>  }
>  ---------------------------------------
>  *
>  *   For searches and help try:
>  *   http://www.stata.com/support/faqs/res/findit.html
>  *   http://www.stata.com/support/statalist/faq
>  *   http://www.ats.ucla.edu/stat/stata/
>


-- 
__________________
Volodymyr Vakhitov
vvakhitov@gmail.com


