Title: Labeling ICD codes with their descriptions
Author: Rebecca Pope, StataCorp
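While you cannot attach value labels directly to ICD-9-CM or ICD-10 codes, you can still display their descriptions. There are two options: create a new string variable that holds the description of each code, or use encode to create a numeric variable whose value labels are the descriptions.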
Suppose you have data containing patient record IDs and ICD-9-CM diagnosis codes that look like this:
recid dx
150781 9110
150913 4241
151088 4254
151125 9033
151154 78650
151165 8028
151207 51881
151344 3051
151415 4321
151487 V140
Stata's icd9 generate, icd9p generate, and icd10 generate commands with the description option create a new variable with the description of the corresponding code.
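The ICD-10 form of the command has the same shape as the ICD-9-CM form. The following is a minimal sketch rather than part of the original example: the variable names dx10 and dx10descr and the three codes are made up for illustration, and it assumes a Stata release that includes the icd10 command.

```stata
* Sketch only: dx10 and its three codes are illustrative, not from the example data
clear
input str7 dx10
"I10"
"E11.9"
"J18.9"
end

* Same pattern as icd9 generate: the new string variable receives the description
icd10 generate dx10descr = dx10, description
list, clean noobs
```

The ICD-9-CM diagnosis codes in the example data call for icd9 generate instead.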
. icd9 generate descr = dx, description
. list, clean noobs
recid dx descr
150781 9110 abrasion trunk
150913 4241 aortic valve disorder
151088 4254 prim cardiomyopathy nec
151125 9033 injury ulnar vessels
151154 78650 chest pain nos
151165 8028 fx facial bone nec-close
151207 51881 acute respiratry failure
151344 3051 tobacco use disorder
151415 4321 subdural hemorrhage
151487 V140 hx-penicillin allergy
. describe
Contains data from icd9exdata.dta
obs: 10
vars: 3 20 Oct 2015 18:02
size: 330 (_dta has notes)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
recid float %9.0g Patient record ID
dx str5 %9s Diagnosis
descr str24 %24s label for dx
-------------------------------------------------------------------------------
Sorted by: recid
Note: Dataset has changed since last saved.
With the descriptions added, the size of the dataset is 330 bytes. We may be able to reduce the size of the dataset using encode.
To add a label to a numeric value, first create a string variable with the diagnosis description, then use encode.
. icd9 generate descr = dx, description long
. encode descr, generate(dxlabeled) label(descrip)
The new variable is long by default, but we can use compress to make sure it is stored in the smallest possible numeric type.
. compress
variable dxlabeled was long now byte
(30 bytes saved)
Finally, drop the created string variable because it is unnecessary.
. drop descr
While you could also remove the original, unencoded diagnosis variable, you should keep it if you plan to do data manipulation based on the codes or if you might need to combine your dataset with new data in the future. Our dataset now looks like this:
. list, clean noobs
recid dx dxlabeled
150781 9110 911.0 abrasion trunk
150913 4241 424.1 aortic valve disorder
151088 4254 425.4 prim cardiomyopathy nec
151125 9033 903.3 injury ulnar vessels
151154 78650 786.50 chest pain nos
151165 8028 802.8 fx facial bone nec-close
151207 51881 518.81 acute respiratry failure
151344 3051 305.1 tobacco use disorder
151415 4321 432.1 subdural hemorrhage
151487 V140 V14.0 hx-penicillin allergy
In general, using encode results in a smaller dataset than adding a variable that contains the descriptions.
. describe
Contains data from icd9exdata.dta
obs: 10
vars: 3 20 Oct 2015 18:02
size: 100 (_dta has notes)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
recid float %9.0g Patient record ID
dx str5 %9s Diagnosis
dxlabeled byte %32.0g descrip label for dx
-------------------------------------------------------------------------------
Sorted by: recid
Note: Dataset has changed since last saved.
The version of our dataset after using encode is only 100 bytes.
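The sizes reported above can be reproduced by hand. This sketch assumes that the size shown by describe is the number of observations times the combined width of the variables, with a float taking 4 bytes, a long 4, a byte 1, and a strN variable N bytes.

```stata
* 10 observations of recid (float, 4) + dx (str5, 5) + descr (str24, 24)
display 10 * (4 + 5 + 24)

* 10 observations of recid (float, 4) + dx (str5, 5) + dxlabeled (byte, 1)
display 10 * (4 + 5 + 1)

* Savings from compress: dxlabeled goes from long (4) to byte (1)
display 10 * (4 - 1)
```

The three commands display 330, 100, and 30, matching the two describe outputs and the message from compress.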