
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: how to identify strings among which some are abbreviated and group strings which have the same keywords


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: how to identify strings among which some are abbreviated and group strings which have the same keywords
Date   Wed, 9 Nov 2011 15:14:43 +0000

It is difficult to give really good advice here. But for both your questions, you could -encode- a new variable that holds just the first word of -applicant-, which you can extract using the -word(,)- function.
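
A minimal sketch of the first-word suggestion above, using the applicant names from the question as example data. The variable names -appno- and -firstword- are made up for illustration, and the case and spacing clean-up with -upper()-, -trim()- and -itrim()- is an added assumption, not part of the original advice.

```stata
* Example data copied from the question; appno is a hypothetical variable name
clear
input appno str60 applicant
1 "MCNEIL PEDIATRICS"
2 "MCNEIL CONSUMER HEALTHCARE DIV MCNEIL PPC INC"
3 "MYLAN LABORATORIES INC"
4 "MYLAN PHARMACEUTICALS INC"
5 "Mcneil consumer"
6 "Mcneil cons"
end

* Assumption: names should match regardless of case and stray blanks,
* so normalise first, then keep only the first word
generate firstword = word(upper(trim(itrim(applicant))), 1)

* One numeric identifier per distinct first word
encode firstword, generate(firm)

* Show the underlying numeric codes rather than the value labels
list appno applicant firstword firm, nolabel
```

Under these assumptions every "Mcneil" spelling shares one value of -firm- and both Mylan entries share another. The obvious limitation is that unrelated applicants with the same first word would also be grouped together, so the result needs checking by eye.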

See also

SJ-8-3  dm0039  . . .  Stata tip 64: Cleaning up user-entered string variables
        . . . . . . . . . . . . . . . . . . . . . . . .  J. Herrin and E. Poen
        Q3/08   SJ 8(3):444--445                                 (no commands)
        tip on how to clean up user-entered string variables

Nick 
[email protected] 

Nina

The first one:
My dataset has a string variable that gives the applicant for each patent. I want to identify applicants uniquely, so I use -encode applicant, gen(firm)- to generate a numeric identifier. However, the same applicant sometimes appears under its full name and sometimes abbreviated. For example,

application number   applicant
1                    Mcneil consumer
2                    Mcneil cons

When I use -encode-, two different identifiers are generated for the same applicant, "mcneil consumer". Do you have any suggestions for dealing with this case?

The second one:
The dataset is similar to the one above. In this case, I want to generate a group id that assigns a single id to all applicants that are subsidiaries of the same company. For example, in the following data, I want an id equal to 1 for applications 1 and 2 because the applicants belong to "Mcneil", and an id equal to 2 for applications 3 and 4 because they belong to the Mylan group.
application number   applicant
1                    MCNEIL PEDIATRICS
2                    MCNEIL CONSUMER HEALTHCARE DIV MCNEIL PPC INC
3                    MYLAN LABORATORIES INC
4                    MYLAN PHARMACEUTICALS INC



