
From: "Vladimir Vakhitov" <vvakhitov@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Range Merging
Date: Wed, 5 Mar 2008 20:36:42 -0500

Malcolm,

This is just a wild idea, but if you pursue it you will have to load all the data into memory only once. On the other hand, you will still have to go over each firm/observation in your Event Study data.

As far as I understand it, you want to assign each Event Study (ES) company a range of corresponding Compustat (CS) companies. In other words, it is possible that the same CS company will have more than one "match" in the ES set. Why don't you then create a set of dummies (or a long macro list, if the set of dummies is going to be too large), where each variable (or macro list element) is 0 or 1 depending on whether the CS company matches the corresponding ES company?

To do that, you may -append- your ES dataset to the corresponding CS dataset, ID the ES companies with -_n-, and go over all N ES firms in a loop i=1/N. At every pass of the loop, create a temporary binary variable ("match") which determines whether each observation in the entire dataset falls into the given range for company i (this should work fast enough, since it is just a line of -if- conditions). Then store this "match" variable in a macro list for each CS company, or make a matrix, or a set of dummy variables. After the loop ends, drop the ES companies. At the end, you will have a big sheet of zeros and ones, and position `i' in this sheet will tell you whether your CS company had a match with the corresponding ES company `i'.

Before running the loop you may want to establish limits for the CS dataset and drop all the companies for which a match will never be possible. This check can be based on the extremes of your ES variables.

If you have 32K+ companies (observations, actually) in your ES sample, then you will need to use more than one macro list. However, in this case the program will probably work very slowly and require a lot of memory. Another way is to store those "match" variables in some text files using -outsheet-.
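The append-and-loop idea described above could be sketched roughly as follows, for use in a do-file. This is only a sketch under assumptions: the toy values are made up, the indicator -es- (1 for event firms, 0 for Compustat firms) is hypothetical, and the quarter variable is assumed to have been given one common name (-qtr-) in both datasets before appending. The 15% and 20% bands follow the description in the original question.

```stata
* Toy stand-in for the appended ES + CS data (all values are made up)
clear
input es code sic3 qtr sales assets
1 101 200 1 100 500
1 102 300 1  50 200
0   . 200 1 110 450
0   . 200 1 130 500
0   . 300 1  45 230
0   . 300 2  50 200
end

* Put the ES firms on top so that observation i is ES firm i
gsort -es code
count if es
local N = r(N)

* Optional pre-filter: drop CS firms whose sales no ES firm could match
summarize sales if es, meanonly
drop if !es & !inrange(sales, 0.85*r(min), 1.15*r(max))

* One 0/1 variable per ES firm, filled in a single pass over the data
forvalues i = 1/`N' {
    generate byte match`i' = !es                               ///
        & sic3 == sic3[`i'] & qtr == qtr[`i']                  ///
        & inrange(sales,  0.85*sales[`i'],  1.15*sales[`i'])   ///
        & inrange(assets, 0.80*assets[`i'], 1.20*assets[`i'])
}

* Keep only the CS firms together with their match indicators
drop if es
list sic3 qtr sales assets match*
```

Each match`i' column then marks the CS firms comparable to ES firm i, and nothing is reloaded from disk inside the loop. With many ES firms the number of variables grows quickly, which is where the limits mentioned above come in.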
The -outsheet- route also works fast and does not require reloading the data.

2008/3/5, Malcolm Wardlaw <malcolm@mail.utexas.edu>:
> I wanted to pose this question to Statalist regarding matching data to a
> range of values instead of exact values. I kind of asked this question
> before, but I realized from the response that my question was somewhat
> ill-formed, so I'll try to be as explicit as possible. I will use an
> example to illustrate the question.
>
> Let's say I want to do a long-run event study on the changes in real
> growth of companies. In order to do this, I need to appropriately match
> the company I am running the event study on to a group of comparable
> companies. For this, I need a matched dataset of all companies that
> match in a range of accounting variables.
>
> The match occurs as follows. I have a data set (1) containing all of
> the companies I wish to perform the event study on. I need to then
> create a dataset (2) that contains matching companies from a dataset of
> the larger Compustat universe of all firms (3). To do this, I need to
> gather all firms that have the same SIC code, sales that are between 15%
> and -15% of the event company, and assets that are between 20% and -20%
> of the event company in the quarter of the event. The new dataset must
> also have a marker for each of these groups of sample firms that
> corresponds to the event firm.
>
> Here is how I originally dealt with the problem. In the program, Stata
> is continually cycling through the data, loading part of another dataset
> into memory, appending it to another dataset from disk, saving that
> dataset to disk, and then reloading the original dataset from disk each
> time. It works, but it seems very inefficient.
>
> Is there a best practice on how to do this, or is this basically as good
> as it's going to get?
>
> ---------------------------------------
> local num = _N
> forval i = 1/`num' {
>     /*The sales of Event Company i*/
>     local sales=sales[`i']
>     /*The quarter of the observation*/
>     local qtr=eventquarter[`i']
>     /*SIC code*/
>     local sic=sic3[`i']
>     /*Assets of the event company*/
>     local assets=qassets[`i']
>     /*A code that uniquely tags the event*/
>     local code=code[`i']
>     quietly:use compustat if `qtr'==obsqtr & `sic'==sic3 & qsales<=1.15*`sales' /*
>     */ & qsales>=.85*`sales' & qassets<=1.2*`assets' & qassets>=.8*`assets', clear
>     gen code=`code'
>     append using comparables
>     quietly:save comparables,replace
>     use events
> }
> ---------------------------------------

--
__________________
Volodymyr Vakhitov
vvakhitov@gmail.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
    st: Range Merging
    From: Malcolm Wardlaw <malcolm@mail.utexas.edu>

