How do I check a variable for a range of diagnosis or procedure codes?

Title   Checking a variable for a range of ICD codes
Author Rebecca Pope, StataCorp

You can check whether a variable contains ICD-9-CM diagnosis codes, ICD-9-CM procedure codes, or ICD-10 diagnosis codes that fall within a specified range by using, respectively, the icd9, icd9p, or icd10 command with the generate subcommand and the range() option.

For example, if you were analyzing ICD-9-CM diagnosis codes, you might have data that look like

    recid     dx1     dx2     dx3  
       84   4414    99811   4275   
      105   25013   3572    25063  
      255   51909   1489    V146   
      651   9678    E8528          
      696   V271    64421   65641  
      779   5409    V1582   V1062  
      814   27651   V1087   V4364  
      826   9951    462     2724   
      833   42789   5409    27801  
      863   5770    29181   4255   

where dx1 records the primary diagnosis and dx2 and dx3 record secondary diagnoses.
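
To follow along, the listing above could be entered by hand. This is a minimal sketch that assumes the codes are stored as strings without decimal points, as displayed. The storage types (int and str5) are assumptions rather than something the original dataset is known to use.

```stata
* Recreate the example data; storage types are assumed, not taken from the source
clear
input int recid str5 dx1 str5 dx2 str5 dx3
 84 "4414"  "99811" "4275"
105 "25013" "3572"  "25063"
255 "51909" "1489"  "V146"
651 "9678"  "E8528" ""
696 "V271"  "64421" "65641"
779 "5409"  "V1582" "V1062"
814 "27651" "V1087" "V4364"
826 "9951"  "462"   "2724"
833 "42789" "5409"  "27801"
863 "5770"  "29181" "4255"
end

* Display the data in the same layout as the listing above
list, clean noobs
```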

Suppose you want to determine which records have a primary diagnosis for diabetes, indicated by codes starting with 250. You only need to type

. icd9 generate diabetes = dx1, range(250*)

. list, clean noobs

    recid     dx1     dx2     dx3   diabetes  
       84   4414    99811   4275           0  
      105   25013   3572    25063          1  
      255   51909   1489    V146           0  
      651   9678    E8528                  0  
      696   V271    64421   65641          0  
      779   5409    V1582   V1062          0  
      814   27651   V1087   V4364          0  
      826   9951    462     2724           0  
      833   42789   5409    27801          0  
      863   5770    29181   4255           0  
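
One way to sanity-check the new indicator is to compare it with a plain string test on the first three characters. The sketch below assumes dx1 holds codes without decimal points or leading blanks, as in the example data. The variable diabetes_chk is a hypothetical helper introduced only for this check.

```stata
* Hypothetical cross-check: flag codes whose first three characters are 250
generate byte diabetes_chk = substr(dx1, 1, 3) == "250"

* Stop with an error if the two indicators disagree on any nonmissing code
assert diabetes == diabetes_chk if !missing(dx1)

* Remove the helper variable once the check has passed
drop diabetes_chk
```

If the codes were stored with decimal points or inconsistent spacing, the substr() comparison would not be a reliable stand-in for range(), so this check is best limited to data in the format shown here.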

You might want to check all diagnosis fields. For example, suppose your study protocol calls for excluding records for patients with a history of malignant cancer (codes starting with V10) or who came to the hospital to give birth (codes starting with V27). While there are different ways to handle multiple diagnosis codes, the fastest way, especially for large datasets, is to use a loop.

Here we loop through the three diagnosis variables and generate three indicators, named excl_dx1, excl_dx2, and excl_dx3, for whether each code corresponds to a history of malignant cancer or to giving birth.

. foreach dxnum of varlist dx1 dx2 dx3 {
  2.     icd9 generate excl_`dxnum' = `dxnum', range(V10* V27*)
  3. }

. list, clean noobs

    recid     dx1     dx2     dx3   diabetes   excl_dx1   excl_dx2   excl_dx3  
       84   4414    99811   4275           0          0          0          0  
      105   25013   3572    25063          1          0          0          0  
      255   51909   1489    V146           0          0          0          0  
      651   9678    E8528                  0          0          0          .  
      696   V271    64421   65641          0          1          0          0  
      779   5409    V1582   V1062          0          0          0          1  
      814   27651   V1087   V4364          0          0          1          0  
      826   9951    462     2724           0          0          0          0  
      833   42789   5409    27801          0          0          0          0  
      863   5770    29181   4255           0          0          0          0  

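You can then sum the excl_dx# variables across each patient record to get a single exclusion variable. Because rowtotal() treats missing values as 0, the missing excl_dx3 for record 651 does not affect the result. The sum is a count of matching diagnosis fields, so any value greater than 0 marks a record for exclusion.

Dropping the new excl_dx# variables afterward is not strictly necessary, but they are no longer needed and dropping them saves some space.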

. egen exclude = rowtotal(excl_dx*)

. drop excl_dx*

. list, clean noobs

    recid     dx1     dx2     dx3   diabetes   exclude  
       84   4414    99811   4275           0         0  
      105   25013   3572    25063          1         0  
      255   51909   1489    V146           0         0  
      651   9678    E8528                  0         0  
      696   V271    64421   65641          0         1  
      779   5409    V1582   V1062          0         1  
      814   27651   V1087   V4364          0         1  
      826   9951    462     2724           0         0  
      833   42789   5409    27801          0         0  
      863   5770    29181   4255           0         0  
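
With the exclusion variable in place, applying the protocol might look like the sketch below. The variable excl_flag is a hypothetical name introduced here. Whether to drop the excluded records or simply flag them depends on the analysis, so treat the final line as one option rather than a required step.

```stata
* Collapse the count of matching fields into a 0/1 flag
generate byte excl_flag = exclude > 0

* See how many records would be removed before changing the data
count if excl_flag

* Keep only the records that do not meet an exclusion criterion
keep if !excl_flag
```

In the example data, this would remove records 696, 779, and 814.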

The same principles apply to ICD-9-CM procedure codes and to ICD-10 diagnosis codes, so choose the command that is appropriate for the codes that you have.
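
As a rough illustration of the other two commands, the sketch below assumes a dataset containing a procedure-code variable pr1 and an ICD-10 diagnosis variable dx10_1. Both names are hypothetical and do not appear in the example data. The code ranges (ICD-9-CM procedure codes beginning with 361, and ICD-10 codes beginning with E10 or E11 for diabetes) are chosen only for illustration.

```stata
* pr1 and dx10_1 are hypothetical variable names, not part of the example data

* ICD-9-CM procedure codes: flag codes beginning with 361
icd9p generate bypass = pr1, range(361*)

* ICD-10 diagnosis codes: flag codes beginning with E10 or E11
icd10 generate diabetes10 = dx10_1, range(E10* E11*)
```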