st: RE: Extracting different portions of string values


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Extracting different portions of string values
Date   Fri, 1 Oct 2010 11:53:10 +0100

I suspect you might need a combination of -strpos()- and -substr()-, but I don't understand your criteria well enough to suggest exact code. How does one discriminate a "citation number" from anything else? That may be a matter of a regular expression.
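
A minimal sketch of how -strpos()- and -substr()- might be combined, not a definitive answer. It assumes that (a) two citation numbers are always separated by " -- ", (b) no value holds more than two citation numbers, and (c) a citation number never contains a space, so that -word()- can pick it out. Those assumptions rest only on the six example rows in the question and should be checked against the full data. The loop over cit_* assumes all affected variables follow that naming pattern.

```stata
* Example rows retyped from the question (ids 1, 3 and 6)
clear
input id str60 cit_1
1 "EP696218-A -- WO9215370-A   SUND _SUND-Individual_"
3 "EP578126-A -- CH180906-A    AGE_OK"
6 "US4434855-A   SEC OF NAVY _USNA_"
end

* Assumption: every affected variable is named cit_1, cit_2, ...
foreach v of varlist cit_* {
    * If the separator is present, keep only what follows it
    replace `v' = substr(`v', strpos(`v', " -- ") + 4, .) if strpos(`v', " -- ") > 0
    * Keep the first space-delimited word, taken here to be the citation number
    replace `v' = word(`v', 1)
}

list
```

If a citation number could be told apart only by its pattern (say, two letters followed by digits), -regexm()- and -regexs()- would be the place to look instead of -word()-.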

Alternatively, check out -split-.
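
A hedged sketch of the -split- route, under the same assumptions as before: the separator is always " -- ", there are at most two citation numbers per value, and a citation number contains no spaces. The names part and wanted are made up for illustration. Note that part2 is created only if at least one value contains the separator.

```stata
* Example rows retyped from the question (ids 2, 5 and 6)
clear
input id str60 cit_1
2 "WO9425112-A -- GB298635-A"
5 "WO9318277-A -- DK137935-B"
6 "US4434855-A   SEC OF NAVY _USNA_"
end

* Split at the separator; creates part1 and, where the separator occurs, part2
split cit_1, parse(" -- ") generate(part)

* Take the second part when it is non-empty, otherwise the first
generate str60 wanted = cond(part2 != "", part2, part1)

* Drop any trailing text after the citation number
replace wanted = word(wanted, 1)

list id wanted
```

With about 500 variables, the same steps could be wrapped in -foreach-, dropping the part* variables at the end of each pass.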

Nick 
n.j.cox@durham.ac.uk 

Florian Seliger

We are looking for commands to extract different portions of string values.

Our data with patent citations looks like this:

id  cit_1
1   EP696218-A -- WO9215370-A   SUND _SUND-Individual_
2   WO9425112-A -- GB298635-A
3   EP578126-A -- CH180906-A    AGE_OK
4   EP562128-A -- DE1684639-A
5   WO9318277-A -- DK137935-B
6   US4434855-A   SEC OF NAVY _USNA_
.
.
.
.

There are 100,000 IDs and about 500 affected variables (cit_1, cit_2, cit_3, ...).
In this example, we want to keep only the second portion for IDs 1-5, but the first portion for ID 6. That is, we want the first portion whenever there is only one citation number.

The data should thus look like this:

id  cit_1
1   WO9215370-A
2   GB298635-A
3   CH180906-A
4   DE1684639-A
5   DK137935-B
6   US4434855-A
.
.
.

