Nick Cox <n.j.cox@durham.ac.uk>

statalist@hsphsun2.harvard.edu

st: RE: Extracting different portions of string values

Fri, 1 Oct 2010 11:53:10 +0100

I suspect you might need a combination of -strpos()- and -substr()-, but I don't understand your criteria well enough to suggest exact code. How does one distinguish a "citation number" from anything else? That may be a matter for a regular expression. Alternatively, check out -split-.

Nick
n.j.cox@durham.ac.uk

Florian Seliger

We are looking for commands to extract different portions of string values. Our data with patent citations looks like this:

    id  cit_1
     1  EP696218-A -- WO9215370-A SUND _SUND-Individual_
     2  WO9425112-A -- GB298635-A
     3  EP578126-A -- CH180906-A AGE_OK
     4  EP562128-A -- DE1684639-A
     5  WO9318277-A -- DK137935-B
     6  US4434855-A SEC OF NAVY _USNA_
     .  .
     .  .

with 100,000 IDs and about 500 affected variables (cit_1, cit_2, cit_3, ...). In this example, we want to keep only the second portion for IDs 1-5, but the first portion for ID 6: whenever there is only one citation number, we want to extract the first portion. The data should thus look like this:

    id  cit_1
     1  WO9215370-A
     2  GB298635-A
     3  CH180906-A
     4  DE1684639-A
     5  DK137935-B
     6  US4434855-A
     .  .

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
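A minimal sketch of the position-based idea (-strpos()- plus -substr()-), written in Python rather than Stata so the logic can be checked on its own. It assumes the rule implied by the example rows: if a value contains the separator " -- ", the wanted citation is the first word after it; otherwise it is the first word of the whole value. The separator and the helper name `extract_after_separator` are assumptions for illustration. Python's `str.find()` plays the role of Stata's `strpos()` (but is 0-based and returns -1 rather than 0 when there is no match), and slicing plays the role of `substr()`.

```python
SEPARATOR = " -- "  # assumed from the six example rows only


def extract_after_separator(value: str, sep: str = SEPARATOR) -> str:
    """Return the cited number from one cit_* value (hypothetical helper).

    If `sep` occurs, take the first word after it (the second portion);
    otherwise take the first word of the whole value (the first portion).
    Blank values come back as "".
    """
    pos = value.find(sep)  # like Stata strpos(), but 0-based and -1 if absent
    rest = value[pos + len(sep):] if pos >= 0 else value  # like substr()
    words = rest.split()  # drops trailing names such as "SEC OF NAVY _USNA_"
    return words[0] if words else ""


if __name__ == "__main__":
    cit_1 = [
        "EP696218-A -- WO9215370-A SUND _SUND-Individual_",
        "WO9425112-A -- GB298635-A",
        "EP578126-A -- CH180906-A AGE_OK",
        "EP562128-A -- DE1684639-A",
        "WO9318277-A -- DK137935-B",
        "US4434855-A SEC OF NAVY _USNA_",
    ]
    for id_, value in enumerate(cit_1, start=1):
        print(id_, extract_after_separator(value))
```

Run on the six example values, this prints the target listing from the question. With about 500 variables, the same rule would be applied to each cit_* variable in turn (in Stata, a loop such as `foreach v of varlist cit_*`).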
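The regular-expression route instead says what a "citation number" looks like and ignores everything else. The pattern below is an assumption inferred only from the six example values: two capital letters, digits, a hyphen, and a one-letter kind code with an optional digit. It would need checking against the full data. The helper name `extract_by_pattern` is hypothetical; the sketch is in Python so it can be tested on its own.

```python
import re

# Assumed shape of a citation number, inferred from the example rows only,
# e.g. "WO9215370-A" or "DK137935-B". Names such as "SUND", "AGE_OK" or
# "_USNA_" contain no digits, so they do not match.
CITATION = re.compile(r"\b[A-Z]{2}\d+-[A-Z]\d?\b")


def extract_by_pattern(value: str) -> str:
    """Return the second citation number if there are two or more, the first
    if there is only one, and "" if there is none (hypothetical helper)."""
    found = CITATION.findall(value)  # all non-overlapping matches, left to right
    if len(found) >= 2:
        return found[1]
    return found[0] if found else ""


if __name__ == "__main__":
    for value in [
        "EP696218-A -- WO9215370-A SUND _SUND-Individual_",
        "EP578126-A -- CH180906-A AGE_OK",
        "US4434855-A SEC OF NAVY _USNA_",
    ]:
        print(extract_by_pattern(value))
```

On the example rows this agrees with a separator-based rule. The two approaches differ only on values the example does not cover, such as a separator with no well-formed number after it, which is exactly where the question of what counts as a citation number matters.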

