Statalist



st: Grouping records by a STRING datatype


From   Sam Lu <alamoboy@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Grouping records by a STRING datatype
Date   Tue, 30 Jun 2009 18:51:55 -0500

Hi All,

New user to STATA here.

I just started learning STATA this past week, though I do have some
experience with R project and MySQL.  My research advisor has asked
that I use STATA so here I am today.

My research is medical in nature, so pardon me if I use some jargon that's
not familiar to everyone.  I'm attempting to group various diagnoses by
their ICD-9 code.  For example, the general category of "asthma" (or
another disease) has a 3-digit ICD-9 code of 493.  A more specific
diagnosis of asthma builds on the 3-digit code.  Thus, "extrinsic
asthma" would be 493.0 while "extrinsic asthma with status
asthmaticus" would be 493.01.  ICD-9 codes stop at the fifth digit or
what math-types would normally call the hundredths place.

I have not converted the ICD-9 codes from a string datatype to a
numerical one because some ICD-9 codes start with a
zero (0), and I fear that converting them to a numerical value may not
faithfully preserve the true code.  So far, I can bin a major ICD-9
category when it is defined by a single ICD-9 code.  For example, when I bin
asthma I use the following code (note that "DX1" is the principal
diagnosis):

generate ACS = "Asthma" if regexm(DX1, "^493")

The above code bins ICD9 codes that have 493 as their first three
digits, and appears to work fine.
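
A minimal sketch of the prefix match on made-up DX1 values (the toy codes and the variable names anywhere, asthma and asthma2 are hypothetical).  Without a leading ^, regexm() matches the pattern anywhere in the string, so the anchor (or substr()) is what restricts the match to the first three digits:

```stata
* Toy data: hypothetical DX1 values, kept as strings to preserve leading zeros
clear
input str6 DX1
"493.01"
"0493"
"2493"
"493"
end

* Unanchored: true whenever "493" occurs anywhere in the string
generate byte anywhere = regexm(DX1, "493")

* Anchored with ^: true only when the code starts with 493
generate byte asthma = regexm(DX1, "^493")

* Regex-free alternative for a fixed-length prefix
generate byte asthma2 = substr(DX1, 1, 3) == "493"

list
```

On this toy data the unanchored pattern flags all four rows, while the anchored pattern and substr() flag only "493.01" and "493".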

However, there are other predefined illness categories that have
multiple ICD9 codes plus other constraints (e.g., age, procedure
performed) that complicate matters. For example, "bacterial
pneumonia" encompasses a DX1 of 481.XX or 482.1X or 482.3X or 482.9X
or 483.XX or 485.XX or 486.XX.  In addition, only patients with an age
>= 2 months are included, and the secondary diagnosis cannot be 282.6X
("X" can be any number); also, the primary procedure (PR1) performed
must equal 860.XX while there cannot be any other procedures performed
(i.e., the fields for PR2, PR3, etc. must be blank).
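
One possible way to express these criteria, as a sketch under stated assumptions rather than a tested solution: the codes are assumed to be stored as strings with the decimal point as written above, age is assumed to be a numeric variable in months (age_months is a hypothetical name), DX2 is assumed to hold the only secondary diagnosis, and only PR2 and PR3 are checked for blanks.  The toy records are made up.

```stata
* Toy data (hypothetical): one qualifying record and four that fail one rule each
clear
input str6 DX1 str6 DX2 age_months str6 PR1 str6 PR2 str6 PR3
"481.00" "401.90" 24 "860.10" ""       ""
"482.10" "282.60" 24 "860.10" ""       ""
"486.00" "401.90" 1  "860.10" ""       ""
"485.00" "401.90" 24 "860.10" "990.40" ""
"493.01" "401.90" 24 "860.10" ""       ""
end

* Principal diagnosis: 481.XX, 483.XX, 485.XX, 486.XX or 482.1X, 482.3X, 482.9X
* (the /// line continuation works in do-files, not at the interactive prompt)
generate byte pneu_dx = inlist(substr(DX1, 1, 3), "481", "483", "485", "486") ///
    | inlist(substr(DX1, 1, 5), "482.1", "482.3", "482.9")

* Combine with the other constraints; missing ages are excluded explicitly
* because Stata treats a numeric missing value as larger than any number
generate byte bact_pneu = pneu_dx ///
    & age_months >= 2 & !missing(age_months) ///
    & substr(DX2, 1, 5) != "282.6" ///
    & substr(PR1, 1, 3) == "860" ///
    & PR2 == "" & PR3 == ""

list DX1 DX2 age_months PR1 PR2 bact_pneu
```

Only the first toy record satisfies every condition.  Further procedure fields would need one more blank check each, and codes stored without the decimal point would need the substr() lengths and prefixes adjusted.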

So, how do I code that monster of a query in STATA?


Thanks for any help,

Sam


