Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Merging only the common elements, and dropping duplicates


From   "Michael N. Mitchell" <Michael.Norman.Mitchell@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Merging only the common elements, and dropping duplicates
Date   Wed, 16 Jun 2010 00:28:46 -0700

I responded too soon!

It looks like you do not have a "1 to 1" merge on your hands, it looks more like a "many to many" merge, where many person records can match to many account records.

A "many to many" merge can be accomplished with the -joinby- command, but I don't think that will give you the results that you seek. I think that we might benefit by hearing more about the "many" to "many" nature of this merge, and whether one of these datasets should have duplicates removed prior to merging, to thus create a 1:m or m:1 merge.

I hope that helps,

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com



On 2010-06-16 12.22 AM, Michael N. Mitchell wrote:
Dear George

Using Stata 11.0 and above, you can do this...

. use database1, clear
. merge 1:1 account using database2, keep(match)

and the resulting merged file will contain only matches.

I hope this helps,

Michael N. Mitchell
Data Management Using Stata - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week - http://www.MichaelNormanMitchell.com



On 2010-06-16 12.13 AM, George Chioran wrote:
Dear statalisters,

I am trying to merge 2
databases to match persons with accounts; however I discovered that
when I do
this, the final database contains a lot of duplicates.
Database 1 looks like
this. I sort it by account.

Person Account
2 311
2 311
2 311
2 311
2 413
2 413
2 413

Database 2 looks like
this. I sort it by account.

Account B_S
311 b
311 b
311 b

To merge them, I load
database 1 and merge it using database 1. But when I do this, I would
like to
have a final database that contains only the common observation of
both. But I get database 3
that looks like this.

Person Account B_S
2 311 b
2 311 b
2 311 b
2 311 b
2 413 .
2 413 .
2 413 .

Is there a way to get
just the common observations? I would like my final database to have
just:
Person Account B_S

2 311 b
2 311 b
2 311 b

Thank you,
George



*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index