



Re: st: RE: range of a stringvariable


From   Richard Goldstein <richgold@ix.netcom.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: range of a stringvariable
Date   Wed, 28 Apr 2010 12:52:33 -0400

Depending on what the data actually look like, Nick's code may not give
the correct answer; e.g., "E30B" will meet his condition but not the
OP's condition, because the range test compares strings, not numbers.

Rich
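
The point above can be checked directly. The following is only a sketch on made-up codes (the variable name code is taken from Nick's expression; the values and the variable nick_ok are invented for illustration). It evaluates Nick's condition as an indicator variable:

```stata
* Illustration only: made-up codes, not the OP's data.
clear
input str5 code
"E343"
"E30B"
"E343A"
"I460"
end

* Nick's condition, stored as a 0/1 indicator (nick_ok is a hypothetical name).
generate byte nick_ok = inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A"

* "E343" and "E30B" both get 1; "E343A" and "I460" get 0.
list code nick_ok, noobs
```

"E30B" passes because "B" sorts after "0" and the comparison stops mattering once "E3" is seen to sort before "E4". Whether such values occur at all depends on the data.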

On 4/28/10 12:41 PM, Nick Cox wrote:
> Some simpler ways of approaching this have not quite come to the surface
> in this thread. 
> 
> Four key points: 
> 
> 1. You are not obliged to create lots of little variables. 
> 
> 2. You are not obliged to convert any bits and pieces to real unless you
> genuinely want those results for other purposes. 
> 
> 3. Inequalities apply to strings as well as to numbers. The order
> concerned is just alphanumeric order, precisely that used by Stata to
> -sort- string variables. 
> 
> 4. -substr()- understands negative indexes as counted from the end of a
> string. 
> 
> Thus 
> 
> if inrange(substr(code, 1, 4), "E300", "E499") & substr(code, -1, 1) != "A"
> 
> is a complete answer to the first question. Similarly 
> 
> if substr(code,-1,1) == "A" 
> 
> is a complete answer to the second question.
> 
> It's the driest of dry reading but the functions section of the
> documentation is an eye-opener in terms of the toolkit offered. 
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> Tomas Lind
> 
> Choose individuals based on a string variable with a range of values
> 
> I am working with ICD-10 codes (codes for different types of diseases).
> The codes start with a letter A - Z followed by 2 or 3 digits. In some
> cases they might end with the letter A. Say that I have a dataset with
> 5 subjects (id=1 to 5) with these ICD-10 codes (fake data, in reality I
> have millions of subjects):
> 
> I460  E343  I46  C764  E438
> 
> How can I choose individuals with ICD-10 codes in the range E300 to E499
> (not including codes that end with A)? What about if I want to include
> codes that end with an A? (There is a convenient command for ICD-9
> codes, but not for ICD-10 codes.)
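
One stricter alternative, not proposed by anyone in the thread, is to match the whole code against a pattern with -regexm()-. The sketch below uses the five fake codes from the question and assumes that "in the range E300 to E499" means exactly the letter E, then a 3 or 4, then two more digits; the variable names in_range and in_range_A are hypothetical.

```stata
* The five fake codes from the question.
clear
input byte id str5 code
1 "I460"
2 "E343"
3 "I46"
4 "C764"
5 "E438"
end

* Assumption: the range means E, then 3 or 4, then exactly two digits.
* First question: no trailing A allowed.
generate byte in_range = regexm(code, "^E[34][0-9][0-9]$")

* Second question: an optional trailing A is allowed.
generate byte in_range_A = regexm(code, "^E[34][0-9][0-9]A?$")

* With these data, ids 2 and 5 are selected.
list id code if in_range, noobs
```

Because the pattern is anchored at both ends, a value such as "E30B" is rejected. Three-character codes such as "E31" are also rejected under this assumption; if they should count, the pattern needs adjusting.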

