Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: tricky data merge/joinby problem
From: David Kantor <[email protected]>
To: [email protected]
Subject: Re: st: tricky data merge/joinby problem
Date: Fri, 04 Mar 2011 11:28:04 -0500
Dimitry,
I still think that an m:m merge yields meaningless pairings. In your
example, for bgid 2,
bid   bgid   fracpop
21    2      .3
22    2      .2
23    2      .5
Assuming that you have, in the second file,
bgid   dateyq   bgpop
2      2010q1   whatever
2      2010q2   whatever
2      2010q3   whatever
2      2010q4   whatever
The first case (bid 21) would pair with 2010q1; the second (bid 22)
with 2010q2; the third (bid 23) would be replicated and paired with
2010q3 and 2010q4.
I'm not sure that this is meaningful.
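The pairing described above can be reproduced on toy data. This is only a sketch: the bgpop values, the helper variable q, and the tempfile names blocks and bgpanel are made up for illustration and are not from the real files.

```stata
* Toy copy of the first file: three blocks in block group 2
clear
input bid bgid fracpop
21 2 .3
22 2 .2
23 2 .5
end
tempfile blocks
save `blocks'

* Toy copy of the second file: a four-quarter panel for block group 2
* (bgpop values are invented placeholders)
clear
input bgid str6 q bgpop
2 "2010q1" 100
2 "2010q2" 110
2 "2010q3" 120
2 "2010q4" 130
end
generate dateyq = quarterly(q, "YQ")   // convert the string to a quarterly date
format dateyq %tq
drop q
tempfile bgpanel
save `bgpanel'

* m:m merge pairs observations sequentially within bgid and replicates
* the last observation of the shorter group
use `blocks', clear
merge m:m bgid using `bgpanel'
list bid bgid fracpop dateyq bgpop, sepby(bgid)
```

The listing should have four rows rather than the twelve block-by-quarter combinations, with bid 23 appearing against both 2010q3 and 2010q4.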
But now that I understand your expand-to-a-panel scheme, it does look
correct. And it makes sense that it would be faster than -joinby-.
Best wishes,
--David
At 11:13 AM 3/4/2011, you wrote:
David,
I wrote m:m merge since each BG usually appears more than once in the
first file (since blocks are the ids) and more than once in the second
(since it's a block group panel). I checked a few cases with the real
data and it seems to have worked. I just wanted to make sure that
there was nothing that I was missing and hoping to find a special case
that does not produce garbage.
By expanding into a panel, I meant stacking file1 on top of itself
four times (one copy per quarter of 2010) and creating a dateyq
variable. The data would not change over time, but it seemed to make
an m:1 merge by date and bgid easier (at least in my head).
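A minimal sketch of that expand-then-merge scheme on toy data follows. It assumes the second file is uniquely identified by bgid and dateyq; the bgpop values, the helper variable q, and the tempfile name bgpanel are hypothetical.

```stata
* Toy second file: block-group panel, unique on bgid and dateyq
* (bgpop values are invented placeholders)
clear
input bgid str6 q bgpop
2 "2010q1" 100
2 "2010q2" 110
2 "2010q3" 120
2 "2010q4" 130
end
generate dateyq = quarterly(q, "YQ")
format dateyq %tq
drop q
tempfile bgpanel
save `bgpanel'

* Toy first file: one row per block
clear
input bid bgid fracpop
21 2 .3
22 2 .2
23 2 .5
end

* Stack the block file four times and number the copies as 2010q1-2010q4
expand 4
bysort bid: generate dateyq = yq(2010, _n)
format dateyq %tq

* Each block-quarter row now matches exactly one panel row
merge m:1 bgid dateyq using `bgpanel'
assert _merge == 3   // every row matched in this toy example
assert _N == 12      // 3 blocks x 4 quarters
list bid bgid fracpop dateyq bgpop, sepby(bid)
```

On this toy data the result has one row per block and quarter, which is the same set of combinations that -joinby bgid- would form.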
The reason I wanted to try merge is that it appears to be much faster
than joinby, which has been running for a long time on a pretty fast
server.
DVM