Re: st: Removing Repeated Phrases in String Variable

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: Removing Repeated Phrases in String Variable
Date	Sat, 14 Dec 2013 22:24:01 +0000

In addition to Eric's helpful and detailed suggestions, check out

http://www.stata.com/support/faqs/data-management/counting-distinct-strings/index.html
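
A variable-based alternative that keeps the original "; " separators can be sketched as follows. This is only a sketch under stated assumptions, not something taken from the FAQ or from Eric's examples: the names part and clean are made up for illustration, the two test strings are adapted from the original question, and Stata 13 or later is assumed so that -replace- can lengthen the string as needed.

```stata
clear
input str200 test
"BMW North America; Honda; Toyota; Nissan; BMW North America; Nissan; Ford"
"item1; item number2; item3; item number2"
end

* one new variable per phrase: part1, part2, ... (hypothetical stub name)
split test, parse("; ") generate(part)

* append each phrase to clean only if it is not already there;
* wrapping both sides in "; " means only whole phrases can match
generate clean = ""
foreach v of varlist part* {
        replace clean = clean + cond(clean == "", "", "; ") + `v' ///
                if `v' != "" & strpos("; " + clean + "; ", "; " + `v' + "; ") == 0
}

drop part*
list clean, notrim noobs
```

Because the comparison is on whole phrases, "Nissan" would not be treated as a repeat of a longer phrase that merely contains it, and the first instance of each phrase stays in its original position.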

Nick
[email protected]


On 14 December 2013 21:56, Eric Booth <[email protected]> wrote:
> Some examples:
>
>
> ******************!
>
> ****EXAMPLE 1:
> clear
> inp str1000 test
> "BMW North America; Honda; Toyota; Nissan; BMW North America; Mercedes Benz North America; Nissan; Subaru; Nissan; Ford"
> "item1; item number2; item3; item number2"
> end
>
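> * wrap each observation in double quotes and replace each semicolon
> * separator, so that every phrase becomes one quoted token
> replace test = `"""'+test+`"""'
> replace test = subinstr(test, "; ", `"" ""', .) //tokenize
>
>
> **
> list test , notrim noobs
>
> * keep only the first instance of each token, one observation at a time
> forval n = 1/`=_N' {
>         loc t `"`=test[`n']'"'
>         loc t2 : list uniq t
>         replace test = `"`t2'"' in `n'
>         }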
>
>
> list test , notrim noobs //duplicates gone
>
>
>
> ***************
> ****EXAMPLE 2:
>
> clear
> inp str1000 test
> "BMW North America; Honda; Toyota; Nissan; BMW North America; Mercedes Benz North America; Nissan; Subaru; Nissan; Ford"
> "item1; item number2; item3; item number2"
> end
> replace test = subinstr(test, "; ", `"" ""', .) //tokenize
>
>
> split test, parse(`"" ""')
> di `"`r(nvars)'"'
> drop test
>
> g i = _n
> reshape long test@, i(i) j(j)
> duplicates drop i test, force
>
>
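> ****
> **put back together
> reshape wide test@, i(i) j(j)
> drop i
> g test = ""
> order test
> * the varlist also matches the new, empty test, and dropped duplicates leave
> * empty cells; each adds an empty quoted pair, which is removed after the loop
> foreach x of varlist test* {
>         replace test = test+ `"""' + `x' + `"" "'
>         }
> replace test = subinstr(test, `""" "', "", .)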
> *****************!
>
> -lstrfun- and -moss- from SSC could be of use as well.
>
>
>
> - Eric
>
>
> On Dec 14, 2013, at 3:26 PM, Becker Stein <[email protected]> wrote:
>
>> Hi,
>>
>> I was wondering if someone could help me remove repeated words/phrases in a string variable. My data has a lot of repeats and I only want to keep the first instance of each item. Below is an example.
>>
>> BMW North America; Honda; Toyota; Nissan; BMW North America; Mercedes Benz North America; Nissan; Subaru; Nissan; Ford
>>
>> In the above example, I'd like to get rid of the extra instances of BMW North America and Nissan. Is there a way to do this? Thanks in advance for your help.
>>
>> Becker
>>
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
