Statalist



Re: st: differences between merge and joinby


From   "Vladimir Vakhitov" <vvakhitov@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: differences between merge and joinby
Date   Tue, 11 Mar 2008 18:53:28 -0400

It depends on what you would like to end up with. The first
question is "what uniquely identifies your observations?" In other
words, what is an observation?
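
One way to check that first question is sketched below. The toy data
and variable names are made up for illustration (they only mimic
dataset A from the question); -duplicates report- and -isid- are
standard Stata commands.

```stata
* Toy data (hypothetical): number and sex do NOT uniquely identify rows
clear
input number str1 sex age
1 "F" 30
1 "F" 41
2 "M" 25
end

* Tabulate how many copies exist of each number-sex combination
duplicates report number sex

* isid exits with an error when the key is not unique;
* capture stores the return code in _rc instead of stopping the do-file
capture isid number sex
display "isid return code: " _rc
```

A nonzero return code from -isid- means the variables do not uniquely
identify the observations, so a one-to-one match on them is not
possible.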

-merge- is used when you want to add some extra characteristics to
your observations (for example, adding region characteristics to each
observation based on its region ID).
-joinby- forms all pairwise combinations of observations within each
group defined by the key variables, kind of like an INNER JOIN in
Access. Sometimes you don't want this because you get too many messy
combinations.
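
Here is a minimal sketch of the difference, on toy data. All file and
variable names are hypothetical. It assumes the -merge m:1- syntax of
Stata 11 and later; older releases used -merge varlist using filename-
on sorted data.

```stata
clear all

* Hypothetical region-level file: one row per region
input region_id str10 region_name
1 "North"
2 "South"
end
tempfile regions
save `regions'

* Hypothetical person-level file: several rows per region
clear
input person_id region_id
101 1
102 1
103 2
end

* Many-to-one merge: each person row gains its region's name;
* the number of observations stays at 3
merge m:1 region_id using `regions'
list

* Now the case from the question: the key is not unique in either file
clear
input number str1 sex children
1 "F" 0
1 "F" 2
end
tempfile b
save `b'

clear
input number str1 sex age
1 "F" 30
1 "F" 41
end

* joinby pairs every A row with every B row sharing number and sex:
* 2 rows x 2 rows = 4 rows
joinby number sex using `b'
list
```

If four rows per duplicated key is not what you want, the key needs
more variables before either command is used.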



Volodymyr


2008/3/11, Sebastian Kruk <residuo.solow@gmail.com>:
> Dear statalist users,
>
>
>  I have two datasets, A and B.
>
>  A) number, sex, age, citizen
>
>  B) number, sex, civil status, children
>
>  I have to form a new dataset but number and sex do not uniquely
>  identify observations.
>  Number of observations of A can be >, < or = Number of observations of B.
>
>  What's better, merge or joinby?
>
>  Bye,
>
>  Sebastian.
>
>  2008/3/5, Nick Cox <n.j.cox@durham.ac.uk>:
>  > Two minute comments only:
>  >
>  >     local assets=qassets
>  >
>  >  this looks wrong: qassets[`i'] ?
>  >
>  >     quietly:use compustat if  `qtr'=obsqtr & `sic'=sic3 & qsales<=1.15*`sales'/*
>  >     */ & qsales>=.85*`sales'&qassets<=1.2*`assets'&qassets>=.85*`assets', clear
>  >
>  >  tests for equality are ==, not =
>  >
>  >  Malcolm Wardlaw
>  >
>  >  I wanted to pose this question to Statalist regarding matching data to a
>  >  range of values instead of exact values.  I kind of asked this question
>  >  before, but I realized from the response that my question was somewhat
>  >  ill formed, so I'll try to be as explicit as possible.  I will use an
>  >  example to illustrate the question.
>  >
>  >  Let's say I want to do a long-run event study on the changes in real
>  >  growth of companies.  In order to do this, I need to appropriately match
>  >  the company I am running the event study on to a group of comparable
>  >  companies.  For this, I need a matched dataset of all companies that
>  >  match in a range of accounting variables.
>  >
>  >  The match occurs as follows.  I have a data set (1) containing all of
>  >  the companies I wish to perform the event study on.  I need to then
>  >  create a dataset (2) that contains matching companies from a dataset of
>  >  the larger Compustat universe of all firms (3).  To do this, I need to
>  >  gather all firms that have the same SIC code, sales that are between 15%
>  >  and -15% of the event company, and assets that are between 20% and -20%
>  >  of the event company in the quarter of the event.  The new dataset must
>  >  also have a marker for each of these group of sample firms that
>  >  corresponds to the event firm.
>  >
>  >  Here is how I originally dealt with the problem. In the program, Stata
>  >  is continually cycling through the data, loading part of another dataset
>  >  into memory, appending it to another dataset from disk, saving that
>  >  dataset to disk, and then reloading the original dataset from disk each
>  >  time.  It works, but it seems very inefficient.
>  >
>  >  Is there a best practice on how to do this, or is this basically as good
>  >  as it's going to get?
>  >
>  >  ---------------------------------------
>  >  local num = _N
>  >  forval i = 1/`num' {
>  >     /*The sales of Event Company i*/
>  >     local sales=sales[`i']
>  >     /*The quarter of the observation*/
>  >     local qtr=eventquarter[`i']
>  >     /*SIC code*/
>  >     local sic=sic3[`i']
>  >     /*Assets of the event company*/
>  >     local assets=qassets
>  >     /*A code that uniquely tags the event*/
>  >     local code=code[`i']
>  >     quietly:use compustat if  `qtr'=obsqtr & `sic'=sic3 & qsales<=1.15*`sales'/*
>  >     */ & qsales>=.85*`sales'&qassets<=1.2*`assets'&qassets>=.85*`assets', clear
>  >     gen code=`code'
>  >     append using comparables
>  >     quietly:save comparables,replace
>  >     use events
>  >  }
>  >
>  >  *
>  >  *   For searches and help try:
>  >  *   http://www.stata.com/support/faqs/res/findit.html
>  >  *   http://www.stata.com/support/statalist/faq
>  >  *   http://www.ats.ucla.edu/stat/stata/
>  >


-- 
__________________
Volodymyr Vakhitov
vvakhitov@gmail.com



© Copyright 1996–2014 StataCorp LP