
Re: st: Grouping records by a STRING datatype

From   Joseph McDonnell
Subject   Re: st: Grouping records by a STRING datatype
Date   Wed, 1 Jul 2009 10:56:24 +0930

Hi Sam

As I see it, you have 4 requirements:

1) DX1 has to be one of a subset of the 481, 482, 483, 485, 486 categories
2) age>=2 months
3) PR1 has to begin with "860"
4) PR2, PR3 etc have to be blank. You don't mention how many of these
you have but presumably it's a relatively small number.

You CAN do this in one step (undoubtedly more efficient), but I'd
advocate doing it in several steps for the sake of readability. So
here's my suggestion...

* initially include no patients
. gen IsIn=0

* generate a variable which marks those with the correct DX1
. replace IsIn=1 if inlist(substr(DX1,1,3),"481","483","485","486") |

* successively unmark those within this group who don't fit the other criteria
. replace IsIn=0 if agemonths<2
. replace IsIn=0 if substr(PR1,1,3)!="860"
. replace IsIn=0 if trim(PR2)!="" | trim(PR3)!="" | trim(PR4)!=""

At the end of this, those who are marked are the ones you wish to
have. Hopefully.
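
The DX1 line above is cut off after the "|", so here is a rough sketch of how that step might look with the 482 sub-codes included. The second inlist() is only a guess at what was intended, based on the 482.1X, 482.3X and 482.9X codes in Sam's message. It also assumes DX1 may or may not contain a decimal point, and DX1clean is a helper name made up for the sketch. It would go in place of the DX1 line, before the unmarking steps, and the /// continuation needs a do-file rather than the command line.

```stata
* Assumption: DX1 holds codes like "482.1" or "4821"; dropping the
* decimal point lets one comparison handle both spellings.
* DX1clean is a helper name made up for this sketch.
gen DX1clean = subinstr(trim(DX1), ".", "", .)

* whole categories 481, 483, 485, 486, plus 482.1X, 482.3X, 482.9X only
replace IsIn = 1 if inlist(substr(DX1clean,1,3), "481", "483", "485", "486") ///
    | inlist(substr(DX1clean,1,4), "4821", "4823", "4829")
```

If the codes in your data are stored some other way, the substr() positions would need adjusting.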

I've used the trim function because stray spaces sometimes get entered
into a text field and are difficult to spot. As I said, you can
combine these steps but it becomes pretty unreadable. If there are more PRs,
you might want to investigate loops. Worth doing in any case if you
find you're doing repetitive programming.
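
For what it's worth, here is a minimal sketch of both ideas: a foreach loop over the PR variables, and the combined one-step version. It assumes the variable names used above (PR2 to PR4, agemonths) and uses the DX1 condition only as far as it appears above; IsIn2 is just an illustrative name so the two results can be compared. It is meant for a do-file, since /// continuation lines don't work at the command line.

```stata
* PR2-PR4 are the names used above; extend the list to match however
* many PR variables your data actually has
foreach v of varlist PR2 PR3 PR4 {
    replace IsIn = 0 if trim(`v') != ""
}

* The same selection in one step (IsIn2 is an illustrative name).
* The expression evaluates to 1 where every condition holds, else 0.
gen byte IsIn2 = inlist(substr(DX1,1,3), "481", "483", "485", "486") ///
    & agemonths >= 2                                                  ///
    & substr(PR1,1,3) == "860"                                        ///
    & trim(PR2) == "" & trim(PR3) == "" & trim(PR4) == ""
```

One thing to watch in either version: Stata treats a missing numeric value as larger than any number, so a record with missing agemonths stays marked. Adding & !missing(agemonths) would exclude those, if that is what you want.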

Hope this helps.



On Wed, Jul 1, 2009 at 9:21 AM, Sam Lu wrote:
> Hi All,
> New user to STATA here.
> I just started learning STATA this past week, though I do have some
> experience with R project and MySQL.  My research advisor has asked
> that I use STATA so here I am today.
> My research is medical in nature, so pardon if I use some jargon that's
> not familiar to everyone.  I'm attempting to group various diagnoses by
> their ICD-9 code.  For example, the general category of "asthma" (or
> another disease) has a 3-digit ICD-9 code of 493.  A more specific
> diagnosis of asthma builds on the 3-digit code.  Thus, "extrinsic
> asthma" would be 493.0 while "extrinsic asthma with status
> asthmaticus" would be 493.01.  ICD-9 codes stop at the fifth digit or
> what math-types would normally call the hundredths place.
> I have not converted the ICD-9 codes from a string datatype to a
> numerical one because there are some ICD-9 codes that start with a
> zero (0), and I fear that converting them to a numerical value may not
> faithfully preserve the true code.  So far, I can group major ICD-9
> category if there is only one ICD-9 code.  For example, when I bin
> asthma I use the following code (note that "DX1" is the principal
> diagnosis):
> generate ACS = "Asthma" if regexm(DX1, "493+")
> The above code bins ICD9 codes that have 493 as their first three
> digits, and appears to work fine.
> However, there are other predefined illness categories that have
> multiple ICD9 codes plus other constraints (e.g., age, procedure
> performed) that complicate matters. For example, "bacterial
> pneumonia" encompasses a DX1 of 481.XX or 482.1X or 482.3X or 482.9X
> or 483.XX or 485.XX or 486.XX.  In addition, only patients with an age
> >= 2 months are included, and the secondary diagnosis cannot be 282.6X
> ("X" can be any number); also, the primary procedure (PR1) performed
> must equal 860.XX while there cannot be any other procedures performed
> (i.e., the fields for PR2, PR3, etc. must be blank).
> So, how do I code that monster of a query in STATA?
> Thanks for any help,
> Sam