Re: st: matching observations for merging


From   Maarten Buis <[email protected]>
To   [email protected]
Subject   Re: st: matching observations for merging
Date   Thu, 17 Jun 2010 15:56:29 +0000 (GMT)

--- On Thu, 17/6/10, Abhimanyu Arora wrote:
> I have two files to be merged. Is it possible to merge using
> an approximation of the merging variable? In other words, if
> my merging variable is say, country, there could be a slight change in
> spelling of some countries (Afghanistan/ Afganistan) in the two
> files...Is there a more efficient way than just going through all 200+
> countries and checking spelling consistency?

For countries the quickest way is to
1) keep one observation per country in each dataset
2) merge the two datasets on country name
3) keep if _merge != 3 (the names that did not match)
4) sort on country name
5) list

This will display a list of troublesome country names, which is
usually so short that it doesn't pay to do anything more fancy.
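
As a rough sketch, steps 1-5 could look like the code below. The file
names fileA.dta and fileB.dta and the variable name country are
assumptions made for illustration, and the merge 1:1 syntax assumes
Stata 11 or later.

```stata
* Sketch of steps 1-5. fileA.dta, fileB.dta and the string variable
* country are hypothetical names, not taken from the question.

* 1) one observation per country in the first file
use fileA, clear
keep country
duplicates drop
tempfile namesA
save `namesA'

* 1) the same for the second file
use fileB, clear
keep country
duplicates drop

* 2) merge the two lists of country names
merge 1:1 country using `namesA'

* 3) keep only the names that did not match
keep if _merge != 3

* 4) and 5) sort and list, so near-identical spellings end up adjacent
sort country
list country _merge
```

In the resulting list, _merge == 1 marks names that appear only in
fileB (the master data here) and _merge == 2 marks names that appear
only in fileA (the using data).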

With this list you can create a recode do-file that harmonizes
country names before the final merge.
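
Such a harmonization do-file can be as simple as one replace line per
troublesome name. The sketch below uses the Afghanistan/Afganistan
example from the question. The file name harmonize_countries.do and the
variable name country are assumptions made for illustration.

```stata
* harmonize_countries.do (hypothetical file name)
* Assumes the country name is stored in a string variable called country.

* one line per misspelling found in the list of unmatched names
replace country = "Afghanistan" if country == "Afganistan"
```

You would then run it with "do harmonize_countries" in each dataset
after loading it and before the final merge.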

Moreover, this harmonization do-file can be a good starting point
for any subsequent project that merges on country names, as the
kinds of inconsistencies in country names are pretty similar across
files. So at the beginning of each project you start by running the
harmonization do-file of the last project, then go through steps 1-5
to find any mismatches that weren't handled in the last do-file, and
add those to your new harmonization file. After 4 or 5 projects you
will hardly find any mismatches anymore.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


      
