



st: RE: Extracting parts of string variable


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Extracting parts of string variable
Date   Thu, 8 Apr 2010 17:56:21 +0100

I'd take a look at -split-. Even then the recipe doesn't look simple, given that your company names may contain blanks.
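A minimal sketch of how the -split- route might go, assuming that the company name is everything before the first "_" once any leading patent numbers (tokens starting with "US" followed by a digit) and "--" separators are dropped. The stubs pre and w and the helper variable started are hypothetical names, and only four of the example rows are used.

```stata
* Sketch only. ASSUMPTION: the company name is the text before the first "_",
* minus leading patent numbers ("US" + digit...) and "--" separators.
* Stubs pre, w and the variable started are hypothetical names.
clear
input byte id str70 cit_1
1 "US6449348-B1 3COM CORP _THRE-Non-standard_"
5 "US4528422-A -- US452232-A1 INTELEPLEX CORP _INTE-Non-standard_"
7 "CONRED ELECTRONICS LTD _CONR-Non-standard_ MURAKOSHI S"
9 "US3476883-A"
end

* Step 1: keep only the text before the first "_"
split cit_1, parse("_") generate(pre) limit(1)
replace pre1 = "" if strpos(cit_1, "_") == 0   // no abbreviation, no company

* Step 2: break that text into words (split parses on blanks by default)
split pre1, generate(w)
local words `r(varlist)'

* Step 3: rebuild the name from the first word that is neither a patent
* number nor "--"; every later word is kept, so blanks inside names survive
gen company_1 = ""
gen byte started = 0
foreach v of local words {
    replace started = 1 if !started & !inlist(`v', "", "--") & !regexm(`v', "^US[0-9]")
    replace company_1 = trim(company_1 + " " + `v') if started & `v' != ""
}
drop pre1 `words' started
list id company_1
```

Splitting into words and gluing them back together from the first non-patent word onward is one way round the blanks problem: the name is identified by position rather than by its own spelling.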

Nick 
n.j.cox@durham.ac.uk 

Pavlos C. Symeou wrote:

I am having trouble with a command that extracts part of one string
variable to create another. The existing string variable, cit_1, may
contain one or more instances of any of the following: a patent number
(e.g. "US6449348-B1"), a company name (e.g. "3COM CORP"), a company
abbreviation enclosed in "_" (e.g. "_THRE-Non-standard_"), and other text
after the closing "_" (e.g. see id 8). My aim is to extract the company
name, which always appears before its abbreviation, and use it to create
a new string variable, company_1. I used the following command, but it
does not account for the different forms that cit_1 takes and produces
incorrect company names.

gen company_1 = regexs(2) if (regexm(cit_1, "([A-Z0-9]*[\-][A-Z0-9]*[ \-]*) *([A-Z0-9 ]*)( *)([\_])(.*)([\_])"))

Below are the various forms that cit_1 takes and what company_1 should
look like in each case.


id  cit_1                                                            company_1
1   US6449348-B1 3COM CORP _THRE-Non-standard_                       3COM CORP
2   US2004257999-A1 CETACEA NETWORKS CORP _CETA-Non-standard_        CETACEA NETWORKS CORP
3   US5566180-A HEWLETT-PACKARD CO _HEWP_                            HEWLETT-PACKARD CO
4   US6215865-B1 E-TALK CORP _ETAL-Non-standard_                     E-TALK CORP
5   US4528422-A -- US452232-A1 INTELEPLEX CORP _INTE-Non-standard_   INTELEPLEX CORP
6   US5600312-A MOTOROLA INC _MOTI_                                  MOTOROLA INC
7   CONRED ELECTRONICS LTD _CONR-Non-standard_ MURAKOSHI S           CONRED ELECTRONICS LTD
8   TEMIC TELEFUNKEN MICROELECTRONIC GMBH _TELE_ LEICHT G, SCHUCH B  TEMIC TELEFUNKEN MICROELECTRONIC GMBH
9   US3476883-A
10  US5136671-A AT & T BELL LAB _AMTT_                               AT & T BELL LAB
11  US5195132-A AMERICAN TELEPHONE & TELEGRAPH CO _AMTT_             AMERICAN TELEPHONE & TELEGRAPH CO
12  US5605491-A CHURCH & DWIGHT CO INC _CHUR-Non-standard_           CHURCH & DWIGHT CO INC
13  US6028656-A CAMBRIDGE RES & INSTR INC _CAMB-Non-standard_        CAMBRIDGE RES & INSTR INC
14  US6201832 DAEWOO ELECTRONICS CO LTD _DAEW-Non-standard_ CHOI B   DAEWOO ELECTRONICS CO LTD
15  US6238946 INT BUSINESS MACHINES CORP _IBMC_ ZIEGLER J F          INT BUSINESS MACHINES CORP
16  US6947529-B2 -- US761995
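A minimal sketch of a regex alternative, assuming that the company name is always the text before the first "_" and that anything preceding it is either a patent number ("US", a digit, then optionally more digits, letters and hyphens, as in "US6449348-B1") or a "--" separator. Instead of describing the whole of cit_1 as the original command does, it cuts at the first "_" and strips those known prefixes. The helper variables pre and expected are illustrative only, and patent numbers with prefixes other than "US" would need the pattern extended.

```stata
* Sketch only. ASSUMPTION: company name = text before the first "_", minus
* leading "US"-style patent numbers and "--" separators. pre and expected
* are illustrative helper variables.
clear
input byte id str70 cit_1 str40 expected
1 "US6449348-B1 3COM CORP _THRE-Non-standard_" "3COM CORP"
2 "US2004257999-A1 CETACEA NETWORKS CORP _CETA-Non-standard_" "CETACEA NETWORKS CORP"
3 "US5566180-A HEWLETT-PACKARD CO _HEWP_" "HEWLETT-PACKARD CO"
4 "US6215865-B1 E-TALK CORP _ETAL-Non-standard_" "E-TALK CORP"
5 "US4528422-A -- US452232-A1 INTELEPLEX CORP _INTE-Non-standard_" "INTELEPLEX CORP"
6 "US5600312-A MOTOROLA INC _MOTI_" "MOTOROLA INC"
7 "CONRED ELECTRONICS LTD _CONR-Non-standard_ MURAKOSHI S" "CONRED ELECTRONICS LTD"
8 "TEMIC TELEFUNKEN MICROELECTRONIC GMBH _TELE_ LEICHT G, SCHUCH B" "TEMIC TELEFUNKEN MICROELECTRONIC GMBH"
9 "US3476883-A" ""
10 "US5136671-A AT & T BELL LAB _AMTT_" "AT & T BELL LAB"
11 "US5195132-A AMERICAN TELEPHONE & TELEGRAPH CO _AMTT_" "AMERICAN TELEPHONE & TELEGRAPH CO"
12 "US5605491-A CHURCH & DWIGHT CO INC _CHUR-Non-standard_" "CHURCH & DWIGHT CO INC"
13 "US6028656-A CAMBRIDGE RES & INSTR INC _CAMB-Non-standard_" "CAMBRIDGE RES & INSTR INC"
14 "US6201832 DAEWOO ELECTRONICS CO LTD _DAEW-Non-standard_ CHOI B" "DAEWOO ELECTRONICS CO LTD"
15 "US6238946 INT BUSINESS MACHINES CORP _IBMC_ ZIEGLER J F" "INT BUSINESS MACHINES CORP"
16 "US6947529-B2 -- US761995" ""
end

* Text before the first "_"; observations without an abbreviation stay empty
gen str70 pre = substr(cit_1, 1, strpos(cit_1, "_") - 1) if strpos(cit_1, "_") > 0

* Group 1 repeatedly consumes a leading patent number plus an optional "--";
* group 2 is the "--" inside it; group 3 is whatever is left: the name
gen str70 company_1 = trim(regexs(3)) if regexm(pre, "^(US[0-9][-A-Z0-9]* *(-- *)?)*(.*)$")
drop pre

assert company_1 == expected
list id company_1, noobs
```

Because the name is defined by what surrounds it rather than by its own characters, blanks, hyphens and "&" inside names (ids 3, 4, 10 to 13) need no special handling.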
