Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Removing Repeated Phrases in String Variable


From   Eric Booth <eric.a.booth@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Removing Repeated Phrases in String Variable
Date   Sat, 14 Dec 2013 15:56:49 -0600

<>



Some examples:


******************!

****EXAMPLE 1:
clear
inp str1000 test
"BMW North America; Honda; Toyota; Nissan; BMW North America; Mercedes Benz North America; Nissan; Subaru; Nissan; Ford"
"item1; item number2; item3; item number2"
end

replace test = `"""'+test+`"""'
replace test = subinstr(test, "; ", `"" ""', .) //tokenize


**
list test , notrim noobs

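forval n = 1/`=_N' {
	* copy the quoted items of the current observation into a local macro list
	loc t `"`=test[`n']'"'
	* keep only the first occurrence of each item
	loc t2 : list uniq t
	replace test = `"`t2'"' in `n'
	}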
	
	
list test , notrim noobs //duplicates gone



***************
****EXAMPLE 2:

clear
inp str1000 test
"BMW North America; Honda; Toyota; Nissan; BMW North America; Mercedes Benz North America; Nissan; Subaru; Nissan; Ford"
"item1; item number2; item3; item number2"
end
replace test = subinstr(test, "; ", `"" ""', .) //tokenize


split test, parse(`"" ""')
di `"`r(nvars)'"'
drop test

g i = _n
reshape long test@, i(i) j(j)
duplicates drop i test, force


****
**put back together
reshape wide test@, i(i) j(j)
drop i
g test = ""
order test
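* test?* matches test1, test2, ... but not the new, empty test itself
foreach x of varlist test?* {
	 replace test = test+ `"""' + `x' + `"" "'
	}
* remove the empty quoted pairs left by dropped duplicates, then the trailing blank
replace test = trim(subinstr(test, `""" "', "", .))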
*****************!

-lstrfun- and -moss- from SSC could be of use as well.
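
Both examples above leave the items wrapped in double quotes and separated by blanks. If the original "; "-delimited layout should be kept, one possible alternative (a sketch only, not the approach used in the examples above, and not relying on -lstrfun- or -moss-) is to peel off one item at a time with the built-in string functions strpos() and substr(), appending an item to the result only if it is not already there. The helper variables out, rest, item and p are names made up for this illustration, and the sketch assumes that items are always separated by exactly "; ".

```stata
clear
input str1000 test
"BMW North America; Honda; Toyota; Nissan; BMW North America; Mercedes Benz North America; Nissan; Subaru; Nissan; Ford"
"item1; item number2; item3; item number2"
end

* hypothetical helper variables: deduplicated result, text still to be
* processed, the item currently peeled off, position of the next delimiter
gen str1000 out  = ""
gen str1000 rest = test
gen str1000 item = ""
gen long p = .

count if rest != ""
local todo = r(N)
while `todo' > 0 {
	* split off the first item of the remaining text
	replace p    = strpos(rest, "; ")
	replace item = cond(p > 0, substr(rest, 1, p - 1), rest)
	replace rest = cond(p > 0, substr(rest, p + 2, .), "")
	* append the item unless it already appears as a whole item in out
	replace out = out + cond(out == "", "", "; ") + item ///
		if item != "" & strpos("; " + out + "; ", "; " + item + "; ") == 0
	count if rest != ""
	local todo = r(N)
	}

replace test = out
drop out rest item p
list test , notrim noobs
```

Padding both out and the item with "; " before calling strpos() is meant to make sure that only whole items match, so that, for example, "item" would not be treated as a repeat of "item number2". The loop runs once per item in the longest observation, so it may be slow with very long lists.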



- Eric


On Dec 14, 2013, at 3:26 PM, Becker Stein <becker.stein@aol.com> wrote:

> Hi,
> 
> I was wondering if someone could help me remove repeated words/phrases in a string variable. My data has a lot of repeats and I only want to keep the first instance of each item. Below is an example.
> 
> BMW North America; Honda; Toyota; Nissan; BMW North America; Mercedes Benz North America; Nissan; Subaru; Nissan; Ford
> 
> In the above example, I'd like to get rid of the extra instances of BMW North America and Nissan. Is there a way to do this? Thanks in advance for your help.
> 
> Becker
> 
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

