Statalist



st: differences between merge and joinby


From   "Sebastian Kruk" <[email protected]>
To   [email protected]
Subject   st: differences between merge and joinby
Date   Tue, 11 Mar 2008 19:13:15 -0300

Dear statalist users,


I have two datasets, A and B.

A) number, sex, age, citizen

B) number, sex, civil status, children

I have to form a new dataset, but number and sex do not uniquely
identify observations.
The number of observations in A can be greater than, less than, or
equal to the number of observations in B.

What's better, merge or joinby?
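
To make the question concrete, here is a minimal sketch with made-up data.
Everything in it is an assumption for illustration: the values, the
variable name civilstatus (standing in for civil status), and the tempfile
names A_toy and B_toy. It uses the merge m:m syntax of Stata 11 or later;
older releases use a different merge syntax. It should be run as one
do-file so that the tempfile macros stay defined.

```stata
* Hypothetical toy data: (number, sex) does not uniquely identify rows.
clear
tempfile A_toy B_toy

input number sex age citizen
1 1 30 1
1 1 45 0
2 2 28 1
end
save "`A_toy'"

clear
input number sex civilstatus children
1 1 1 0
1 1 2 3
1 1 2 1
3 2 1 2
end
save "`B_toy'"

* joinby forms every pairwise combination within each (number, sex) group.
* Group (1,1) has 2 rows in A and 3 in B, so it yields 2 x 3 = 6 rows;
* unmatched(both) also keeps the unmatched row from A and the one from B.
use "`A_toy'", clear
joinby number sex using "`B_toy'", unmatched(both)
sort number sex
list, sepby(number sex)
count    // 8 observations

* merge m:m pairs rows in order within each group (first with first,
* second with second), repeating the last row of the shorter side.
* Group (1,1) therefore yields only 3 rows, and the pairing depends on
* the sort order of the data.
use "`A_toy'", clear
merge m:m number sex using "`B_toy'"
sort number sex
list, sepby(number sex)
count    // 5 observations
```

If I read the sketch correctly, the choice depends on whether I want all
combinations within a group (joinby) or an order-dependent row-by-row
pairing (merge).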

Bye,

Sebastian.

2008/3/5, Nick Cox <[email protected]>:
> Two minute comments only:
>
>     local assets=qassets
>
>  this looks wrong: qassets[`i'] ?
>
>     quietly:use compustat if  `qtr'=obsqtr & `sic'=sic3 &
>  qsales<=1.15*`sales'/*
>     */ &
>  qsales>=.85*`sales'&qassets<=1.2*`assets'&qassets>=.85*`assets', clear
>
>  tests for equality are ==, not =
>
>  Malcolm Wardlaw
>
>  I wanted to pose this question to Statalist regarding matching data to a
>  range of values instead of exact values.  I kind of asked this question
>  before, but I realized from the response that my question was somewhat
>  ill formed, so I'll try to be as explicit as possible.  I will use an
>  example to illustrate the question.
>
>  Let's say I want to do a long-run event study on the changes in real
>  growth of companies.  In order to do this, I need to appropriately match
>  the company I am running the event study on to a group of comparable
>  companies.  For this, I need a matched dataset of all companies that
>  match in a range of accounting variables.
>
>  The match occurs as follows.  I have a dataset (1) containing all of
>  the companies I wish to perform the event study on.  I then need to
>  create a dataset (2) that contains matching companies from a dataset of
>  the larger Compustat universe of all firms (3).  To do this, I need to
>  gather all firms that have the same SIC code, sales within 15% of the
>  event company's sales, and assets within 20% of the event company's
>  assets, all in the quarter of the event.  The new dataset must also
>  have a marker for each of these groups of sample firms that
>  corresponds to the event firm.
>
>  Here is how I originally dealt with the problem. In the program, Stata
>  is continually cycling through the data, loading part of another dataset
>  into memory, appending it to another dataset from disk, saving that
>  dataset to disk, and then reloading the original dataset from disk each
>  time.  It works, but it seems very inefficient.
>
>  Is there a best practice on how to do this, or is this basically as good
>  as it's going to get?
>
>  ---------------------------------------
>  local num = _N
>  forval i = 1/`num' {
>     /*The sales of Event Company i*/
>     local sales=sales[`i']
>     /*The quarter of the observation*/
>     local qtr=eventquarter[`i']
>     /*SIC code*/
>     local sic=sic3[`i']
>     /*Assets of the event company*/
>     local assets=qassets
>     /*A code that uniquely tags the event*/
>     local code=code[`i']
>     quietly:use compustat if  `qtr'=obsqtr & `sic'=sic3 &
>  qsales<=1.15*`sales'/*
>     */ &
>  qsales>=.85*`sales'&qassets<=1.2*`assets'&qassets>=.85*`assets', clear
>     gen code=`code'
>     append using comparables
>     quietly:save comparables,replace
>     use events
>  }
>
>  *
>  *   For searches and help try:
>  *   http://www.stata.com/support/faqs/res/findit.html
>  *   http://www.stata.com/support/statalist/faq
>  *   http://www.ats.ucla.edu/stat/stata/
>


