
# Re: st: RE: range of a stringvariable

From: "Nick Cox"
To:
Subject: Re: st: RE: range of a stringvariable
Date: Wed, 28 Apr 2010 18:19:55 +0100

```
I don't see that, even with the conditional here.

. di inrange("E30B", "E300", "E499")
1

And clearly the last character is not an "A".

Of course, if you are telling me that ICD-9 codes are in some order that
is not completely consistent with Stata's, then (a) I didn't know that
and (b) the code may need adjustment (but not the principles).

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: Richard Goldstein [mailto:richgold@ix.netcom.com]
Sent: 28 April 2010 17:53
To: statalist@hsphsun2.harvard.edu
Cc: Nick Cox
Subject: Re: st: RE: range of a stringvariable

Depending on what the data actually look like, Nick's code will not give
the correct answer; e.g., "E30B" will meet his condition but not the
OP's condition.

Rich

On 4/28/10 12:41 PM, Nick Cox wrote:
> Some simpler ways of approaching this have not quite come to the
> surface.
>
> Four key points:
>
> 1. You are not obliged to create lots of little variables.
>
> 2. You are not obliged to convert any bits and pieces to real unless
> you genuinely want those results for other purposes.
>
> 3. Inequalities apply to strings as well as to numbers. The order
> concerned is just alphanumeric order, precisely that used by Stata to
> -sort- string variables.
>
> 4. -substr()- understands negative indexes as counted from the end of
> a string.
>
> Thus
>
> if inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A"
>
> is a complete answer to the first question. Similarly
>
> if substr(code,-1,1) == "A"
>
> is a complete answer to the second question.
>
> It's the driest of dry reading but the functions section of the
> documentation is an eye-opener in terms of the toolkit offered.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Tomas Lind
>
> Choose individuals based on a string variable with a range of values
>
> I am working with ICD-10 codes (codes for different types of diseases).
> The codes start with a letter A-Z followed by 2 or 3 digits. In some
> cases they might end with the letter A. Say that I have a dataset with
> 5 subjects (id = 1 to 5) with these ICD-10 codes (fake data; in reality
> I have millions of subjects):
>
> I460  E343  I46  C764  E438
>
> How can I choose individuals with ICD-10 codes in the range E300 to
> E499 (not including codes that end with A)? What if I want to include
> codes that end with an A? (There is a convenient command for ICD-9
> codes, but not for ICD-10 codes.)

```
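
The conditions discussed in the thread can be tried out on the fake data from the original question. What follows is only a minimal sketch: the variable names `no_a`, `ends_a` and `strict` are made up for illustration, ids 6 and 7 are hypothetical extra codes added to exercise the trailing "A" and the "E30B" case, and the `regexm()` pattern is one possible way to tighten the check if codes such as "E30B" can occur in the data. It is not something proposed in the thread itself.

```stata
* Fake data from the original question (ids 1 to 5), plus two
* hypothetical codes (ids 6 and 7) added here for illustration only.
clear
input byte id str5 code
1 "I460"
2 "E343"
3 "I46"
4 "C764"
5 "E438"
6 "E343A"
7 "E30B"
end

* First question: first four characters in "E300"-"E499" by string
* comparison, and the last character is not "A".
generate byte no_a = inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A"

* Second condition from the thread: the code ends in "A".
generate byte ends_a = substr(code, -1, 1) == "A"

* Stricter variant (an assumption, not from the thread): "E", then 3 or 4,
* then exactly two digits, then an optional trailing "A".
* This rejects "E30B", which the string comparison above accepts.
generate byte strict = regexm(code, "^E[34][0-9][0-9]A?$")

list, noobs
```

On these data, `no_a` should flag ids 2, 5 and 7, `ends_a` only id 6, and `strict` ids 2, 5 and 6. Whether the stricter pattern is needed depends on whether the real codes ever carry letters other than a final "A", which is exactly the point of disagreement above.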