How do I label my diagnosis or procedure codes with their descriptions?

Title   Labeling ICD codes with their descriptions
Author Rebecca Pope, StataCorp

ICD-9-CM and ICD-10 codes are stored as strings, and Stata attaches value labels only to numeric variables, so you cannot label the codes directly. You can still display their descriptions in one of two ways:

  1. Store the descriptions in a new string variable.
  2. Create a corresponding numeric variable and label its values.

Suppose you have data containing patient record IDs and ICD-9-CM diagnosis codes that look like

     recid      dx  
    150781   9110   
    150913   4241   
    151088   4254   
    151125   9033   
    151154   78650  
    151165   8028   
    151207   51881  
    151344   3051   
    151415   4321   
    151487   V140   

Option 1: Store the descriptions in a new string variable

Stata's icd9 generate, icd9p generate, and icd10 generate commands, when used with the description option, create a new string variable containing the description of each code.

. icd9 generate descr = dx, description

. list, clean noobs

     recid      dx                      descr  
    150781   9110              abrasion trunk  
    150913   4241       aortic valve disorder  
    151088   4254     prim cardiomyopathy nec  
    151125   9033        injury ulnar vessels  
    151154   78650             chest pain nos  
    151165   8028    fx facial bone nec-close  
    151207   51881   acute respiratry failure  
    151344   3051        tobacco use disorder  
    151415   4321         subdural hemorrhage  
    151487   V140       hx-penicillin allergy  

. describe

Contains data from icd9exdata.dta
  obs:            10                          
 vars:             3                          20 Oct 2015 18:02
 size:           330                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
recid           float   %9.0g                 Patient record ID
dx              str5    %9s                   Diagnosis
descr           str24   %24s                  label for dx
-------------------------------------------------------------------------------
Sorted by: recid
     Note: Dataset has changed since last saved.

With the descriptions added, the size of the dataset is 330 bytes. We may be able to reduce the size of the dataset using encode.
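
The same pattern should carry over to procedure codes and to ICD-10 diagnosis codes. As a minimal sketch, assume the dataset also held an ICD-9-CM procedure variable named proc and an ICD-10 diagnosis variable named dx10. Both names, and the names of the new variables, are hypothetical and do not appear in the example data:

. * proc and dx10 are hypothetical variables, not part of icd9exdata.dta
. icd9p generate procdescr = proc, description

. icd10 generate dx10descr = dx10, description

Each command adds one string variable, so the dataset grows in the same way as it did when descr was added.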

Option 2: Create a corresponding numeric variable and label its values

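Value labels attach to numeric variables, so first create a string variable that holds the diagnosis description, then use encode to convert it into a labeled numeric variable. The long option places the code in front of each description, and the label() option of encode names the resulting value label. If descr already exists from Option 1, drop it before running icd9 generate again.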

. icd9 generate descr = dx, description long

. encode descr, generate(dxlabeled) label(descrip)

encode stores the new variable as type long by default, but compress will recast it to the smallest numeric type that can hold its values.

. compress 
  variable dxlabeled was long now byte
  (30 bytes saved)

Finally, drop the string variable descr, which is no longer needed now that its contents are stored in the value label.

. drop descr

While you could also remove the original, unencoded diagnosis variable, you should keep it if you plan to do data manipulation based on the codes or if you might need to combine your dataset with new data in the future. Our dataset now looks like this:

. list, clean noobs

     recid      dx                         dxlabeled  
    150781   9110              911.0  abrasion trunk  
    150913   4241       424.1  aortic valve disorder  
    151088   4254     425.4  prim cardiomyopathy nec  
    151125   9033        903.3  injury ulnar vessels  
    151154   78650             786.50 chest pain nos  
    151165   8028    802.8  fx facial bone nec-close  
    151207   51881   518.81 acute respiratry failure  
    151344   3051        305.1  tobacco use disorder  
    151415   4321         432.1  subdural hemorrhage  
    151487   V140       V14.0  hx-penicillin allergy  
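
The text shown in dxlabeled is a value label; the variable itself holds integers. One way to confirm this, sketched here with standard list and label commands rather than taken from the original example, is to list the data without labels and then display the mapping stored in descrip:

. * show the underlying integer codes instead of the labels
. list recid dx dxlabeled, clean noobs nolabel

. * show which integer maps to which description
. label list descrip

Because encode numbers the categories in alphabetical order of the strings, the integers carry no clinical meaning of their own, which is one more reason to keep dx.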

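In general, using encode results in a smaller dataset than adding a string variable that contains the descriptions, because each observation stores a small integer rather than the full text and each distinct description is stored only once, in the value label.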

. describe

Contains data from icd9exdata.dta
  obs:            10                          
 vars:             3                          20 Oct 2015 18:02
 size:           100                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
recid           float   %9.0g                 Patient record ID
dx              str5    %9s                   Diagnosis
dxlabeled       byte    %32.0g     descrip    label for dx
-------------------------------------------------------------------------------
Sorted by: recid
     Note: Dataset has changed since last saved.

After using encode, the dataset is only 100 bytes, down from 330 bytes with the string descriptions.