Re: st: Extracting substrings from variables.


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Extracting substrings from variables.
Date   Fri, 25 May 2012 13:06:37 +0100

It's a motto of mine always to try something simpler before I try
regex machinery. (Nothing against regex, just a preference to use
something simpler when that suffices; and I've seen so many struggling
with regex that this preference is supported. It's also true that you
won't learn regex without trying it.)

In this case

... if inlist(substr(m1diagx, 1, 3), "637", "642")

captures the rule that "637" or "642" are allowed as the first three characters

while

... if substr(m1diagx, 1, 2) == "O1"

captures the other rule. And you can combine them:

... if inlist(substr(m1diagx, 1, 3), "637", "642") | substr(m1diagx, 1, 2) == "O1"

When I say simpler, I mean conceptually: the syntax can be a bit long
and messy.
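
As a minimal sketch of how the combined condition might be used to build the new variable, here is a self-contained example. The toy diagnosis codes are made up for illustration, the variable name m1diagx follows the spelling used in this reply (the quoted message spells it mdiag1x), preght is the name from the original question, and preght_code is a hypothetical name.

```stata
* Toy data: these diagnosis codes are invented for illustration only
clear
input str8 m1diagx
"63700"
"642A"
"O149"
"O240"
"6370"
"250"
""
end

* Indicator: 1 if the diagnosis starts with 637, 642 or O1 (letter O), else 0
gen byte preght = inlist(substr(m1diagx, 1, 3), "637", "642") | substr(m1diagx, 1, 2) == "O1"

* Alternative: keep the full diagnosis for matches, empty string otherwise
* (preght_code is a hypothetical variable name)
gen str8 preght_code = m1diagx if preght

list
```

Whether an indicator or a copy of the matching code is more useful depends on what is done next; the condition itself is the same in both cases.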

Note that -inlist()- and -substr()-, like -regexs()-, are all
functions, not commands.
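
Because these are functions, they are used inside expressions, for example after -generate- or in an -if- qualifier. For comparison, the same rule can be sketched with the regex functions. In the pattern from the quoted message below, a leading ^ inside square brackets negates a character class, so "[^637]" matches any single character other than 6, 3 or 7, and the spaces around | are matched literally. A pattern that anchors the alternatives at the start of the string is one possible fix. The toy data and the variable names viasubstr, viaregex and stub are assumptions made for this illustration.

```stata
* Toy data: these diagnosis codes are invented for illustration only
clear
input str8 m1diagx
"63700"
"642A"
"O149"
"O240"
"6370"
"250"
"1637"
end

* The rule written with substr() and inlist()
gen byte viasubstr = inlist(substr(m1diagx, 1, 3), "637", "642") | substr(m1diagx, 1, 2) == "O1"

* The same rule as a regex: ^ outside brackets anchors at the start,
* and the parentheses group the three alternatives
gen byte viaregex = regexm(m1diagx, "^(637|642|O1)")

* regexs(1) returns the text matched by the first parenthesised group
gen str3 stub = regexs(1) if regexm(m1diagx, "^(637|642|O1)")

* The two indicators should agree on every observation
assert viaregex == viasubstr

list
```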

Nick

On Fri, May 25, 2012 at 12:45 PM, Amal Khanolkar <[email protected]> wrote:

> I'm trying to generate a new variable using 'stubs' or 'substrings' of a combination of letters and numbers to indicate what I would like to be included in the new variable.
>
> For example I have a variable 'Diagnoses' that contains all ICD 9 & 10 diagnoses for subjects included in the dataset. I would however only like to extract those subjects with diagnoses starting with the numbers '637' and '642' and 'O1' (the last being a combination of the letter O and number 1).
>
> Thus I will have a new variable with subjects with certain specific diagnoses starting with the numbers/letters indicated above. I should add that '637' and '642' above are just the starting numbers; other letters/numbers may follow them, which is OK.
>
> I've tried doing the above using the 'regexs' command:
>
> gen preght = regexs(0) if regexm(mdiag1x, "[^637] | [^642] | [^O1]")
>
> Could you give some tips on how this can be further improved or other easier commands I could use?
