Re: st: tricky data merge/joinby problem

From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: tricky data merge/joinby problem
Date   Fri, 04 Mar 2011 11:28:04 -0500


I still think that an m:m merge yields meaningless pairings. In your example, for bgid 2, the first file has

bid bgid  fracpop
21  2    .3
22  2    .2
23  2    .5

Assuming that you have, in the second file,
bgid dateyq bgpop
2    2010q1 whatever
2    2010q2 whatever
2    2010q3 whatever
2    2010q4 whatever

The first case (bid 21) would pair with 2010q1; the second (bid 22) with 2010q2; the third (bid 23) would be replicated and paired with 2010q3 and 2010q4.
I'm not sure that this is meaningful.
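
To check this pairing on a toy case, here is a minimal sketch to run as a do-file. The bgpop values are invented (the example above only says "whatever"), and the tempfile name is likewise just an assumption for illustration. It runs -merge m:m- and then -joinby- on the same two files so the results can be compared.

```stata
* Second file: block-group panel for bgid 2 (bgpop values are made up)
clear
input bgid str6 q bgpop
2 "2010q1" 100
2 "2010q2" 110
2 "2010q3" 120
2 "2010q4" 130
end
generate dateyq = quarterly(q, "YQ")   // string to quarterly date
format dateyq %tq
drop q
tempfile bgpanel
save `bgpanel'

* First file: blocks within bgid 2, as in the example above
clear
input bid bgid fracpop
21 2 .3
22 2 .2
23 2 .5
end

* m:m merge pairs observations in order within bgid and reuses the
* last observation of the shorter group
preserve
merge m:m bgid using `bgpanel'
list bid bgid fracpop dateyq bgpop, sepby(bgid)
restore

* joinby forms every block-by-quarter combination within bgid
joinby bgid using `bgpanel'
sort bid dateyq
list bid bgid fracpop dateyq bgpop, sepby(bid)
```

The m:m listing should have four observations, with only bid 23 appearing twice, while the -joinby- listing should have twelve (3 blocks x 4 quarters).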

But now that I understand your expand-to-a-panel scheme, it does look correct. And it makes sense that it would be faster than -joinby-.

Best wishes,

At 11:13 AM 3/4/2011, you wrote:

I wrote m:m merge since each BG usually appears more than once in the
first file (since blocks are the ids) and more than once in the second
(since it's a block group panel). I checked a few cases with the real
data and it seems to have worked. I just wanted to make sure that
there was nothing that I was missing and hoping to find a special case
that does not produce garbage.

By expanding into a panel, I meant stacking file1 on top of itself
four times (4 quarters of 2010) and creating a dateyq variable. The data
would not change over time, but it seemed to make an m:1 merge by date
and bgid easier (at least in my head).
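
A minimal sketch of that scheme, to run as a do-file, using toy data for bgid 2. The bgpop values and the tempfile name are assumptions, not taken from the real files. It uses -expand- in place of literally appending the file to itself, which gives the same four copies of each block.

```stata
* Block-group panel (bgpop values are invented for illustration)
clear
input bgid str6 q bgpop
2 "2010q1" 100
2 "2010q2" 110
2 "2010q3" 120
2 "2010q4" 130
end
generate dateyq = quarterly(q, "YQ")
format dateyq %tq
drop q
tempfile bgpanel
save `bgpanel'

* Block file: one observation per block
clear
input bid bgid fracpop
21 2 .3
22 2 .2
23 2 .5
end

* Four copies of each block, one per quarter of 2010
expand 4
bysort bid: generate dateyq = yq(2010, _n)
format dateyq %tq

* Each block-quarter now matches exactly one block-group-quarter
merge m:1 bgid dateyq using `bgpanel', keep(match) nogenerate
isid bid dateyq                        // check: one row per block-quarter
list bid bgid fracpop dateyq bgpop, sepby(bid)
```

Because the using file is unique on bgid and dateyq, -merge m:1- will stop with an error if that assumption fails, which an m:m merge would not do.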

The reason I wanted to try merge is that it appears to be much faster
than joinby, which has been running for a long time on a pretty fast

