## How do I check a variable for a range of diagnosis or procedure codes?

Title: Checking a variable for a range of ICD codes

Author: Rebecca Pope, StataCorp

You can check whether a variable contains codes within a given range of ICD-9-CM diagnosis codes, ICD-9-CM procedure codes, or ICD-10 diagnosis codes by using, respectively, the icd9, icd9p, or icd10 command with the generate subcommand and the range() option.

For example, if you were analyzing ICD-9-CM diagnosis codes, you might have data that look like

    recid     dx1     dx2     dx3
       84    4414   99811    4275
      105   25013    3572   25063
      255   51909    1489    V146
      651    9678   E8528
      696    V271   64421   65641
      779    5409   V1582   V1062
      814   27651   V1087   V4364
      826    9951     462    2724
      833   42789    5409   27801
      863    5770   29181    4255


where dx1 records the primary diagnosis and dx2 and dx3 record secondary diagnoses.
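To try the commands in this FAQ, the example records can be entered by hand. The following is a minimal sketch; it assumes the diagnosis variables are stored as strings (the V and E codes require this), and the str5 widths are a choice made here rather than something stated in the original data.

```stata
* Sketch: enter the ten example records so the later commands can be run.
* Assumption: diagnosis codes are stored as strings without periods.
clear
input int recid str5 dx1 str5 dx2 str5 dx3
 84 "4414"  "99811" "4275"
105 "25013" "3572"  "25063"
255 "51909" "1489"  "V146"
651 "9678"  "E8528" ""
696 "V271"  "64421" "65641"
779 "5409"  "V1582" "V1062"
814 "27651" "V1087" "V4364"
826 "9951"  "462"   "2724"
833 "42789" "5409"  "27801"
863 "5770"  "29181" "4255"
end

* Display the data in the same compact layout used in this FAQ
list, clean noobs
```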

Suppose you want to determine which records have a primary diagnosis for diabetes, indicated by codes starting with 250. You only need to type

    . icd9 generate diabetes = dx1, range(250*)

    . list, clean noobs

    recid     dx1     dx2     dx3   diabetes
       84    4414   99811    4275          0
      105   25013    3572   25063          1
      255   51909    1489    V146          0
      651    9678   E8528                  0
      696    V271   64421   65641          0
      779    5409   V1582   V1062          0
      814   27651   V1087   V4364          0
      826    9951     462    2724          0
      833   42789    5409   27801          0
      863    5770   29181    4255          0


You might want to check all diagnosis fields. For example, suppose your study protocol calls for excluding records for patients with a history of malignant cancer (codes starting with V10) or who came to the hospital to give birth (codes starting with V27). While there are different ways to handle multiple diagnosis codes, the fastest way, especially for large datasets, is to use a loop.

Here we loop through the three diagnosis variables, generate three indicators for whether the code corresponds to malignant cancer or giving birth, and name them excl_dx#.

    . foreach dxnum of varlist dx1 dx2 dx3 {
      2.     icd9 generate excl_`dxnum' = `dxnum', range(V10* V27*)
      3. }

    . list, clean noobs

    recid     dx1     dx2     dx3   diabetes   excl_dx1   excl_dx2   excl_dx3
       84    4414   99811    4275          0          0          0          0
      105   25013    3572   25063          1          0          0          0
      255   51909    1489    V146          0          0          0          0
      651    9678   E8528                  0          0          0          .
      696    V271   64421   65641          0          1          0          0
      779    5409   V1582   V1062          0          0          0          1
      814   27651   V1087   V4364          0          0          1          0
      826    9951     462    2724          0          0          0          0
      833   42789    5409   27801          0          0          0          0
      863    5770   29181    4255          0          0          0          0


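You can then sum the excl_dx# indicators across each patient record with egen's rowtotal() function to get a single exclusion variable. rowtotal() treats missing values as 0, so record 651, whose excl_dx3 is missing, still receives a total of 0. Because the result counts the matching diagnosis fields, any value greater than 0 marks the record for exclusion.

Dropping the excl_dx# variables afterward is not strictly necessary, but they are no longer needed, and dropping them saves some space.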

    . egen exclude = rowtotal(excl_dx*)

    . drop excl_dx*

    . list, clean noobs

    recid     dx1     dx2     dx3   diabetes   exclude
       84    4414   99811    4275          0         0
      105   25013    3572   25063          1         0
      255   51909    1489    V146          0         0
      651    9678   E8528                  0         0
      696    V271   64421   65641          0         1
      779    5409   V1582   V1062          0         1
      814   27651   V1087   V4364          0         1
      826    9951     462    2724          0         0
      833   42789    5409   27801          0         0
      863    5770   29181    4255          0         0
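
Because exclude is a row total, a record with qualifying codes in more than one diagnosis field would get a value above 1. If a strict 0/1 flag is preferred, one possible follow-up is sketched below. The variable name exclude_any and the record with recid 900 are hypothetical, added only to illustrate a count of 2.

```stata
* Illustrative data: exclude holds the count produced by rowtotal().
* Record 900 is made up to show a count greater than 1.
clear
input int recid byte exclude
696 1
779 1
826 0
900 2
end

* Collapse the count to a 0/1 flag (exclude_any is a hypothetical name)
generate byte exclude_any = exclude > 0

* Report how many records would be excluded
count if exclude_any == 1
```

This relies on exclude never being missing, which holds for a variable created by rowtotal() without the missing option.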


The same principles apply to ICD-9-CM procedure codes and to ICD-10 diagnosis codes, so choose the command that is appropriate for the codes that you have.
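
As a rough sketch of how the same pattern might look with the other two commands, the example below uses a small made-up dataset. The variable names proc1, dx10, flag_proc, and flag_dx, and the code prefixes chosen, are all assumptions for illustration, and it presumes a Stata release that includes the icd10 command.

```stata
* Hypothetical records: proc1 holds an ICD-9-CM procedure code and
* dx10 holds an ICD-10 diagnosis code (values chosen for illustration).
clear
input int recid str4 proc1 str4 dx10
1 "3611" "E119"
2 "4701" "I10"
3 "8154" "J189"
end

* Flag ICD-9-CM procedure codes starting with 361
icd9p generate flag_proc = proc1, range(361*)

* Flag ICD-10 diagnosis codes starting with E10 or E11
icd10 generate flag_dx = dx10, range(E10* E11*)

list, clean noobs
```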