Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: how to identify strings among which some are abbreviated and group strings which have the same keywords

From	Nina <[email protected]>
To	statalist <[email protected]>
Subject	Re: st: RE: how to identify strings among which some are abbreviated and group strings which have the same keywords
Date	Wed, 9 Nov 2011 16:20:28 +0100

Hi, Nick

Thanks for your advice; I will try your suggestion.

Best,
Nina





From: Nick Cox
Date: 2011-11-09 16:14
To: '[email protected]'
Subject: st: RE: how to identify strings among which some are abbreviated and group strings which have the same keywords
It is difficult to give really good advice here. But for both of your questions, you could -encode- a new variable that holds just the first word of -applicant-, which you can extract using the -word()- function.
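A minimal sketch of this suggestion, assuming the names sit in a string variable called -applicant- as in the question quoted below. The variable names -appno- and -firstword- are illustrative, and the -upper()- call is an added assumption so that "Mcneil" and "MCNEIL" do not end up as different codes:

```stata
clear
* Toy data taken from the first question
input long appno str20 applicant
1 "Mcneil consumer"
2 "Mcneil cons"
end

* Keep only the first word, upper-cased so case differences do not matter
generate firstword = upper(word(applicant, 1))

* One numeric code per distinct first word
encode firstword, generate(firm)
list
```

Bear in mind that this is coarse: any two applicants sharing a first word (for example MCNEIL PEDIATRICS and MCNEIL CONSUMER) get the same code, which suits the second question better than the first.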

See also

SJ-8-3 dm0039: Stata tip 64: Cleaning up user-entered string variables.
J. Herrin and E. Poen. Stata Journal 8(3): 444-445 (Q3/08). No commands.
A tip on how to clean up user-entered string variables.

Nick 
[email protected] 

Nina

The first one:
My dataset has a string variable that records the applicant of each patent. I want to identify applicants uniquely, so I use -encode applicant, gen(firm)- to generate a numeric identifier. However, the same applicant sometimes appears under its full name and sometimes abbreviated. For example:

application number   applicant
1                    Mcneil consumer
2                    Mcneil cons

When I use -encode-, two different identifiers are generated for the same applicant, "Mcneil consumer". Do you have any suggestions for dealing with this?
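One way to keep distinct applicants apart while still merging known abbreviations is to standardise the names first and then apply a hand-maintained list of fixes before -encode-. This is only a sketch: the variable -applicant_std- and the single replacement rule are hypothetical, and in real data the list of fixes would have to be built by inspecting the names:

```stata
clear
input long appno str20 applicant
1 "Mcneil consumer"
2 "Mcneil cons"
end

* Standardise case and strip leading/trailing blanks before comparing names
generate applicant_std = lower(trim(applicant))

* Hypothetical hand-made fix-up: map a known abbreviation to the full name
replace applicant_std = "mcneil consumer" if applicant_std == "mcneil cons"

* Both rows now share one identifier
encode applicant_std, generate(firm)
list
```

Unlike encoding the first word alone, this does not merge different subsidiaries of the same company.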

The second one:
The dataset is similar to the one above. In this case, I want to generate a group id that assigns one id to all applicants that are subsidiaries of the same company. For example, in the data below I want an id equal to 1 for applications 1 and 2, because the applicants belong to "Mcneil", and an id equal to 2 for applications 3 and 4, because they belong to the Mylan group.
application number   applicant
1                    MCNEIL PEDIATRICS
2                    MCNEIL CONSUMER HEALTHCARE DIV MCNEIL PPC INC
3                    MYLAN LABORATORIES INC
4                    MYLAN PHARMACEUTICALS INC
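For this second question, the first-word idea can be combined with -egen, group()- to get consecutive integer ids directly. A sketch, assuming the parent company is always named by the first word of -applicant- (the variables -parent- and -groupid- are illustrative names). Because -group()- numbers groups in sort order, MCNEIL comes before MYLAN and receives 1:

```stata
clear
input long appno str60 applicant
1 "MCNEIL PEDIATRICS"
2 "MCNEIL CONSUMER HEALTHCARE DIV MCNEIL PPC INC"
3 "MYLAN LABORATORIES INC"
4 "MYLAN PHARMACEUTICALS INC"
end

* Assumed keyword for the parent company: the first word of the name
generate parent = upper(word(applicant, 1))

* One integer id per distinct keyword, numbered in sort order of parent
egen groupid = group(parent)
list appno applicant groupid
```

Parents whose names do not start with a shared keyword would still need manual recoding.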


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: how to identify strings among which some are abbreviated and group strings which have the same keywords
  - From: Nina <[email protected]>
- st: RE: how to identify strings among which some are abbreviated and group strings which have the same keywords
  - From: Nick Cox <[email protected]>
