

From: Nina <yinnina.ccer@gmail.com>
To: statalist <statalist@hsphsun2.harvard.edu>
Subject: Re: Re: st: how to identify strings among which some are abbreviated and group strings which have the same keywords
Date: Wed, 9 Nov 2011 16:21:15 +0100

Great! Thank you, Dimitriy!

Best,
Nina

From: Dimitriy V. Masterov
Date: 2011-11-09 16:18
To: statalist
Subject: Re: st: how to identify strings among which some are abbreviated and group strings which have the same keywords

Nina,

Take a look at the user-written command from SSC called strgroup. Depending on your operating system, it will really help with something like this. A non-Stata alternative is to use Google Refine, which works really well for something like this. Still, none of these solutions will work fully, and you will still have to do a fair bit of manual work.

DVM

On Wed, Nov 9, 2011 at 10:02 AM, Nina <yinnina.ccer@gmail.com> wrote:
> Dear all,
>
> I have two questions to ask for your help.
>
> The first one:
> There is a string variable which defines the applicant of each patent in my dataset. I want to identify applicants uniquely, and I use -encode applicant, gen(firm)- to generate a numeric variable to identify them. However, the same applicant sometimes appears under its full name and sometimes abbreviated. For example,
>
> application number    applicant
> 1                     Mcneil consumer
> 2                     Mcneil cons
>
> When I use encode, two different identifiers are generated for the same applicant "mcneil consumer". Do you have any suggestions to deal with this case?
>
> The second one:
> The dataset is similar to the one above. In this case, I want to generate a group id which assigns one id to all applicants that are subsidiaries of the same company. For example, in the following data, I want to generate an id which is equal to 1 for applications 1 and 2 because the applicants are from "Mcneil", while the id is equal to 2 for applications 3 and 4 because they are from the Mylan group.
>
> application number    applicant
> 1                     MCNEIL PEDIATRICS
> 2                     MCNEIL CONSUMER HEALTHCARE DIV MCNEIL PPC INC
> 3                     MYLAN LABORATORIES INC
> 4                     MYLAN PHARMACEUTICALS INC
>
> Any suggestions and comments are more than welcome!
> Thank you very much!
>
> Best,
> Nina
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
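
For readers who want to see the two tasks in the question spelled out, here is a minimal sketch in plain Python. It is not strgroup and not Google Refine, and it is not how either of those works. It assumes (and these are only assumptions made for illustration) that an abbreviated name shortens each word of the full name to a prefix, and that subsidiaries of one company share the same first word. The function names `is_abbreviation`, `encode_with_abbreviations` and `group_by_keyword` are hypothetical. As noted in the reply, rules like these will not catch everything, so the result still needs checking by hand.

```python
def is_abbreviation(short, full):
    """True if each word of `short` is a prefix of the word at the same
    position in `full` (case-insensitive). Assumed rule, for illustration."""
    s, f = short.upper().split(), full.upper().split()
    if not s or len(s) > len(f):
        return False
    return all(b.startswith(a) for a, b in zip(s, f))


def encode_with_abbreviations(names):
    """First question: give a name and its abbreviation the same id.
    Ids are 1, 2, ... in order of first appearance."""
    reps = []  # longest spelling seen so far for each id
    ids = []
    for name in names:
        for i, rep in enumerate(reps):
            if is_abbreviation(name, rep) or is_abbreviation(rep, name):
                if len(name) > len(rep):
                    reps[i] = name  # keep the fuller spelling as representative
                ids.append(i + 1)
                break
        else:
            # no existing representative matched: start a new id
            reps.append(name)
            ids.append(len(reps))
    return ids


def group_by_keyword(names):
    """Second question: one group id per distinct first word (upper-cased),
    numbered in order of first appearance."""
    seen = {}
    ids = []
    for name in names:
        words = name.upper().split()
        key = words[0] if words else ""
        if key not in seen:
            seen[key] = len(seen) + 1
        ids.append(seen[key])
    return ids
```

With the data from the question, `encode_with_abbreviations(["Mcneil consumer", "Mcneil cons"])` gives `[1, 1]`, and `group_by_keyword` applied to the four MCNEIL and MYLAN names gives `[1, 1, 2, 2]`. Note that the prefix rule treats a bare "MCNEIL" as an abbreviation of any longer name starting with that word, which may or may not be what you want.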

**References**:

- **st: how to identify strings among which some are abbreviated and group strings which have the same keywords** *From:* Nina <yinnina.ccer@gmail.com>
- **Re: st: how to identify strings among which some are abbreviated and group strings which have the same keywords** *From:* "Dimitriy V. Masterov" <dvmaster@gmail.com>
