


Re: st: RE: range of a stringvariable

From   "Nick Cox"
Subject   Re: st: RE: range of a stringvariable
Date   Wed, 28 Apr 2010 18:19:55 +0100

I don't see that, even with the conditional here. 

. di inrange("E30B", "E300", "E499")

And clearly the last character is not an "A". 

Of course, if you are telling me that ICD-9 codes are in some order that
is not completely consistent with Stata's, then (a) I didn't know that
and (b) the code may need adjustment (but not the principles). 
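
The point at issue can be checked on a few made-up codes. What follows is only a sketch: the codes are invented for illustration, and the pattern used for the well-formedness check (one letter, 2 or 3 digits, an optional trailing "A") is an assumption taken from the description in the original question, not an official definition of the coding scheme.

```stata
clear
input str5 code
"E300"
"E30B"
"E343"
"E343A"
"E499"
"E500"
"I460"
end

* inrange() on strings uses the same ordering as -sort-
generate byte in_range = inrange(substr(code, 1, 4), "E300", "E499")

* assumed format: letter, 2 or 3 digits, optional trailing "A"
generate byte well_formed = regexm(code, "^[A-Z][0-9][0-9][0-9]?A?$")

* "E30B" is in range under string ordering but fails the format check
list code in_range well_formed, noobs
```

Under Stata's string ordering "E30B" falls inside the range, because "B" sorts after "0". Whether such a value can occur at all depends on what the data actually look like; if it can, adding a format check like the one above to the condition would screen it out.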


-----Original Message-----
From: Richard Goldstein
Sent: 28 April 2010 17:53
Cc: Nick Cox
Subject: Re: st: RE: range of a stringvariable

Depending on what the data actually look like, Nick's code will not give
the correct answer; e.g., "E30B" will meet his condition but not the
OP's condition.


On 4/28/10 12:41 PM, Nick Cox wrote:
> Some simpler ways of approaching this have not quite come to the fore
> in this thread. 
> Four key points: 
> 1. You are not obliged to create lots of little variables. 
> 2. You are not obliged to convert any bits and pieces to real unless
> you genuinely want those results for other purposes. 
> 3. Inequalities apply to strings as well as to numbers. The order
> concerned is just alphanumeric order, precisely that used by Stata to
> -sort- string variables. 
> 4. -substr()- understands negative indexes as counted from the end of
> the string. 
> Thus 
> if inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A" 
> is a complete answer to the first question. Similarly 
> if substr(code,-1,1) == "A" 
> is a complete answer to the second question.
> It's the driest of dry reading but the functions section of the
> documentation is an eye-opener in terms of the toolkit offered. 
> Nick 
> Tomas Lind
> Choose individuals based on a string variable with a range of values
> I am working with ICD-10 codes (codes for different types of
> diseases). The codes start with a letter A - Z followed by 2 or 3
> digits. In some cases they might end with the letter A. Say that I
> have a dataset with 5 subjects (id=1 to 5) with these ICD-10 codes
> (fake data, in reality I have millions of subjects):
> I460  E343  I46  C764  E438
> How can I choose individuals with ICD-10 codes in the range E300 to
> E499 (not including codes that end with A)? What about if I want to
> include codes that end with an A? (There is a convenient command for
> ICD-9 codes, but not for ICD-10 codes.) 
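
Applied to the five fake codes from the question, the conditions quoted above can be sketched as follows. The sixth observation ("E343A") is a hypothetical addition, not part of the original data, included only so that the trailing-"A" condition has something to match.

```stata
clear
input byte id str5 code
1 "I460"
2 "E343"
3 "I46"
4 "C764"
5 "E438"
6 "E343A"
end

* a negative first index in substr() counts from the end of the string
display substr("E343A", -1, 1)

* first question: codes in E300-E499, excluding those ending in "A"
list id code if inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A"

* second question: codes ending in "A"
list id code if substr(code, -1, 1) == "A"
```

With these data the first -list- should show ids 2 and 5, and the second only the hypothetical id 6. No new variables are created and nothing is converted to numeric; replacing -list- with -keep- would restrict the dataset in the same way.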
