
st: Re: merge and joinby


From   Christopher F Baum <baum@bc.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: merge and joinby
Date   Thu, 28 Aug 2003 06:45:20 -0400

On Thursday, August 28, 2003, at 02:33 AM, John wrote:

> What is the difference between the way merge and joinby work? I've been
> using joinby because it appears to work the same way relational databases
> do, and I'm familiar with that concept.

That is correct: joinby forms all pairwise combinations of observations within each group defined by the key variables, that is, a Cartesian product within groups (in SQL terms a cross join when the key does not restrict the match, not an outer join). Users of RDBMS are exhorted to avoid an unrestricted Cartesian product at all costs (run a proposed SELECT statement with a cross join by your DBA and see what s/he says). You practically never want a Cartesian product, which generates a row (observation) for every defined combination of the two sets (in Stataese, the master and using datasets).

More usually, you want to match the observations in the using dataset with those in the master dataset, with a one-to-one, one-to-many, or many-to-one merge. If you have about the same number of obs. in both datasets, it would seem that you're really trying to do a one-to-one merge. joinby will not achieve that when the key variables fail to identify observations uniquely; it will instead generate a huge number of observations in the Cartesian product (about 450^2? 450^2 obs and 47 variables is quite a bit larger than 450 x 45).
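
To make the difference in observation counts concrete, here is a minimal sketch on invented toy data (3 observations per dataset rather than 450). The variable names id, id_using, x, y, and k and the tempfile names are hypothetical, and the merge 1:1 syntax assumes Stata 11 or later; older releases use a different merge syntax. Run it as a do-file so the local tempfile macros stay defined.

```stata
* Hypothetical using dataset: 3 observations keyed by id
clear
set obs 3
generate id = _n
generate y  = 10 * id
tempfile using_data
save `using_data'

* Hypothetical master dataset: 3 observations keyed by id
clear
set obs 3
generate id = _n
generate x  = 100 * id

* One-to-one merge on a unique key: the result still has 3 observations
preserve
merge 1:1 id using `using_data'
assert _N == 3
restore

* Build a copy of the using data with a constant key k (no unique match)
preserve
use `using_data', clear
rename id id_using
generate byte k = 1
tempfile using_k
save `using_k'
restore

* joinby on the constant key pairs every master row with every using row
generate byte k = 1
joinby k using `using_k'
assert _N == 9    // 3 x 3 pairwise combinations
```

With 450 observations on each side and a key that does not distinguish them, the same joinby would produce 450 x 450 observations instead of 450.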

Kit
