Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Improve matching |
Date | Fri, 17 May 2013 16:41:53 +0100 |
    gen namelength = length(trim(firm))
    bysort id (namelength) : replace firm = trim(firm[_N])

Then if all is well

    by id : keep if _n == _N

I threw in trimming of leading and trailing spaces too.

(For "STATA" read "Stata".)

Nick
njcoxstata@gmail.com

On 17 May 2013 16:20, Seliger Florian <seliger@kof.ethz.ch> wrote:
> Dear Statalist,
>
> I want to improve matching of two datasets. I use a matching software, but want to prepare the datasets with STATA.
> One dataset is in panel structure, i.e. I have multiple observations per ID:
>
> ID    firm
> 314   POLYTYPE
> 314   POLYTYPE MASCHINENFABRIK
> 314   POLYTYPE NA
> 314   POLYTYPE
> 314   POLYTYPE
> 314   POLYTYPE
>
> The length of the firm name may vary across IDs.
> I noticed that it is necessary to keep only the ID with the LONGEST NAME, in this case "POLYTYPE MASCHINENFABRIK".
> It is necessary to delete all other IDs for the same observation to enable a proper matching (due to the software).
>
> Any suggestions how I could only keep the LONGEST NAME IDs (each observation has different names, and within each observation each ID has different names, there is no regularity) in STATA?
>
> Thank you for your consideration.
>
> Best,
> Florian
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
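A minimal, self-contained sketch of the approach suggested above, for anyone who wants to try it on toy data first. The dataset is hypothetical: the rows for id 314 follow the example in the question, while id 315 and its names are invented purely for illustration. Adding firm as a second sort key inside the parentheses is an assumption that goes beyond the original reply; it only makes the choice reproducible when two names tie for longest.

```stata
* Hypothetical toy data; id 315 is invented for illustration
clear
input int id str40 firm
314 "POLYTYPE"
314 "POLYTYPE MASCHINENFABRIK"
314 "POLYTYPE NA"
314 "POLYTYPE"
315 "ACME"
315 "ACME HOLDING AG"
end

* Remove leading and trailing spaces, then measure each name
replace firm = trim(firm)
gen namelength = length(firm)

* Within each id, sort by name length (ties broken alphabetically)
* so that the longest name is the last observation, and keep only it
bysort id (namelength firm) : keep if _n == _N

list id firm, noobs
```

On this toy data, one observation per id should remain, carrying the longest trimmed name. Note that only the kept observation survives; if other variables differ across observations within an id, it may be safer to copy the longest name to every observation first and inspect the result before dropping anything.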