
# Analyze data with ICD-10-CM/PCS codes

## Highlights

• Works with
  • NCHS ICD-10-CM diagnosis codes (healthcare encounter and claims data)
  • CMS ICD-10-PCS procedure codes (healthcare claims data)
• Data-management commands let you
  • Generate new variables based on codes
    • Indicators for different conditions
    • Short descriptions
    • Category codes from billable codes
    • And more
  • Verify that variables contain valid codes and flag invalid codes
  • Standardize format of codes
• Interactive utilities let you
  • Look up descriptions for codes
  • Search for codes from keywords
• Specify the version of codes that your dataset contains

The U.S. now uses ICD-10-CM and ICD-10-PCS to encode diagnoses and procedures in administrative healthcare data, such as claims for medical services.

The new icd10cm and icd10pcs commands in Stata 15 support these systems, just as Stata has supported previous ICD releases.

These new commands make your research and reporting life easier. When administrative data are gathered from multiple sources, the format of the codes may not be standardized, or there may be reporting errors. Furthermore, the sheer number of codes in these systems means that analyzing the data in a meaningful way can be difficult.

icd10cm and icd10pcs let you easily verify that codes are valid and add variables such as code descriptions and indicators for whether patients have a particular diagnosis or procedure. They also let you interactively look up descriptions of codes.

## Let's see it work

Let's imagine that we want to compare costs for the different types of Cesarean section and delivery. We are conducting our study using hospital discharge records from the last six months of 2016. New ICD-10-CM/PCS codes are released every October, so our data mix codes from the 2016 version (July through September discharges) and the 2017 version (October through December discharges). The new commands make processing such data easy.
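Because the version switches on October 1, choosing which version() to pass for a discharge is a simple date rule. The Python sketch below illustrates that rule; it assumes, as this example does, that codes taking effect on October 1 of one year carry the next year's version label. The function name is hypothetical and not part of Stata, where you express the same split with if qualifiers on the discharge month.

```python
from datetime import date


def icd10_version_for(discharge: date) -> int:
    """Return the ICD-10-CM/PCS version year assumed to apply on a date.

    Assumption for this sketch: a new version takes effect every October 1
    and is labeled with the following calendar year, so discharges from
    2016-10-01 onward use version 2017 and earlier 2016 discharges use 2016.
    """
    return discharge.year + 1 if discharge.month >= 10 else discharge.year


print(icd10_version_for(date(2016, 9, 30)))  # 2016
print(icd10_version_for(date(2016, 10, 1)))  # 2017
```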

First, let's check that the codes are valid. We specify the version(2016) option for discharges before October 1, 2016. We type

```
. use discharges16
(Discharges, 2016 Q3-Q4)

. icd10cm check diag1 if dmonth <= tm(2016m9), version(2016)
(diag1 contains defined codes; no missing values)
```


Now, we can check discharges between October 2016 and December 2016 by specifying version(2017).

```
. icd10cm check diag1 if dmonth >= tm(2016m10), version(2017)
(diag1 contains no missing values)

diag1 contains undefined codes:

  1.  Invalid placement of period                       0
  2.  Too many periods                                  0
  3.  Code too short                                    0
  4.  Code too long                                     0
  5.  Invalid 1st char (not A-Z)                        0
  6.  Invalid 2nd char (not 0-9)                        0
  7.  Invalid 3rd char (not 0-9 A or B)                 0
  8.  Invalid 4th char (not 0-9 or A-Z)                 0
  9.  Invalid 5th char (not 0-9 or A-Z)                 0
 10.  Invalid 6th char (not 0-9 or A-Z)                 0
 11.  Invalid 7th char (not 0-9 or A-Z)                 0
 77.  Valid only for previous versions                  3
 88.  Valid only for later versions                     0
 99.  Code not defined                                  0
      -------------------------------------------------
      Total                                             3
```



Three codes from the last quarter of 2016 are flagged as valid only for previous versions.
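Problems 1 through 11 in the table concern only the shape of a code, so they can be checked without the official code list; problems 77, 88, and 99 depend on the version-specific code files that Stata uses. The Python sketch below (function name hypothetical) reproduces the format rules as the table words them. Two details are assumptions of the sketch, not documented Stata behavior: that "Code too short" means fewer than 3 characters, and the order in which the rules are tested.

```python
import string
from typing import Optional

DIGITS = set(string.digits)
LETTERS = set(string.ascii_uppercase)


def format_problem(code: str) -> Optional[int]:
    """Return the table's problem number (1-11) for a malformed code.

    Returns None when the code passes these format-only checks; such a
    code may still be undefined (problems 77, 88, 99 need the code list).
    """
    if code.count(".") > 1:
        return 2                      # Too many periods
    if "." in code and code.index(".") != 3:
        return 1                      # Period assumed valid only after 3rd char
    body = code.replace(".", "")
    if len(body) < 3:
        return 3                      # Code too short (assumed minimum: 3)
    if len(body) > 7:
        return 4                      # Code too long
    if body[0] not in LETTERS:
        return 5                      # 1st char not A-Z
    if body[1] not in DIGITS:
        return 6                      # 2nd char not 0-9
    if body[2] not in DIGITS | {"A", "B"}:
        return 7                      # 3rd char not 0-9, A, or B
    for pos, ch in enumerate(body[3:], start=4):
        if ch not in DIGITS | LETTERS:
            return pos + 4            # 8 for 4th char ... 11 for 7th char
    return None


print(format_problem("T8351XA"))   # None: well formed
print(format_problem("T8.351XA"))  # 1: misplaced period
```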

We'll want to know which codes are causing the problems. Adding the summary option shows the frequency of each problematic code and the reason it was flagged.

```
. icd10cm check diag1 if dmonth >= tm(2016m10), version(2017) summary
(diag1 contains no missing values)

  (output omitted)

Summary of invalid and undefined codes

      diag1   Count   Problem
  ---------------------------------------------------
    T8351XA       3   Valid only for previous versions
```



It appears that the hospital used T83.51XA, a code that is valid only in earlier versions. Ordinarily, we would want to fix problems like this, ideally by consulting the original medical record. With only administrative data at hand, we could start by finding out what T83.51XA describes:

```
. icd10cm lookup T8351XA, version(2016)

  T83.51XA  Infect/inflm reaction due to indwell urinary catheter, init
```
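Notice that the data store the code as T8351XA while the lookup displays it as T83.51XA; likewise, procedure codes appear as 10D.00Z1 in some output and 10D00Z1 in others. Standardizing that format is one of the features listed in the highlights. The Python sketch below shows the idea with two hypothetical helpers; it is an illustration of the convention (period after the third character), not Stata's implementation.

```python
def strip_period(code: str) -> str:
    """Normalize to the undotted form, e.g. '10D.00Z1' -> '10D00Z1'."""
    return code.strip().upper().replace(".", "")


def add_period(code: str) -> str:
    """Insert the display period after the 3rd character, e.g. 'T8351XA' -> 'T83.51XA'.

    Codes of 3 characters or fewer are returned without a period.
    """
    body = strip_period(code)
    return body if len(body) <= 3 else f"{body[:3]}.{body[3:]}"


print(add_period("T8351XA"))     # T83.51XA
print(strip_period("10D.00Z1"))  # 10D00Z1
```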


We would then use icd10cm search with keywords from the description of T83.51XA to find a suitable replacement code and substitute it by typing

```
. replace diag1 = .... if dmonth >= tm(2016m10) & diag1=="T8351XA"
```


Before doing that, we might want to see in which months the three problems occurred by typing

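```
. icd10cm check diag1 if dmonth >= tm(2016m10), version(2017) generate(probtype)
  (output omitted)

. tabulate probtype dmonth

  result of check for |           Discharge month
                diag1 |   2016m10    2016m11    2016m12 |     Total
----------------------+---------------------------------+----------
         Defined code |       333        323        336 |       992
Valid only for previo |         2          1          0 |         3
----------------------+---------------------------------+----------
                Total |       335        324        336 |       995
```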



We will not bother with any of this, however, because T83.51XA has nothing to do with pregnancy.

Having satisfied ourselves that there are no errors in our data that will affect our study, we are ready to begin in earnest. Let's first create a variable that marks the portion of the data in which we are interested.

Deliveries and Cesarean sections can be identified by one of four ICD-10-PCS codes: 10D.00Z0, 10D.00Z1, 10D.00Z2, and 10E.0XZZ. We want to flag all records that have one of these codes in proc1, the primary procedure code, as eligible for our study.

```
. icd10pcs generate insample = proc1, range(10D.00* 10E.0XZZ)
```


We were able to abbreviate the codes starting with “10D.00” as the wildcard 10D.00* because only these three codes fall in that group.
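The range() list above mixes a wildcard pattern with a single code. The Python sketch below illustrates how such a list can flag records; it assumes that periods are ignored when matching and that a trailing * matches any remaining characters (Python's fnmatch semantics, assumed here to mirror Stata's). The helper name and the non-matching test code are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def in_range(code: str, patterns: Iterable[str]) -> int:
    """Return 1 if code matches any pattern, else 0 (like an indicator variable).

    Periods are removed on both sides so that a stored code such as 10D00Z1
    matches a pattern written with a period, such as 10D.00*.
    """
    norm = code.replace(".", "").strip().upper()
    return int(any(fnmatchcase(norm, p.replace(".", "").upper()) for p in patterns))


PATTERNS = ["10D.00*", "10E.0XZZ"]

for c in ["10D00Z0", "10D.00Z2", "10E0XZZ", "10E0XZY"]:
    print(c, in_range(c, PATTERNS))
```

Normalizing away the period before matching means the same pattern list works whether a dataset stores dotted or undotted codes.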

Now that we have insample, we can add the qualifier if insample==1 to Stata commands to restrict them to the relevant data.

We can then create a variable that has the code and description for just the study-eligible records.

```
. icd10pcs generate delivery = proc1 if insample==1, description addcode(begin)
```


Let's look at the four codes of interest:

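```
. tabulate delivery

                   description of proc1 |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
10D.00Z0 Extraction of POC, Classical.. |          6        2.17        2.17
10D.00Z1 Extraction of POC, Low Cervi.. |         93       33.70       35.87
10E.0XZZ Delivery of Products of Conc.. |        177       64.13      100.00
----------------------------------------+-----------------------------------
                                  Total |        276      100.00
```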



The first thing we discover is that only three of the four codes appear in the data; 10D.00Z2 does not. That does not bother us, because the fourth is an uncommonly used code.

Before we can fit our regression of cost (variable billed) on the procedure codes and length of stay (variable los), we must create a new numeric variable, which we will name dtype, containing the values 1, 2, and 3 for the three codes. We type

```
. encode proc1 if insample==1, generate(dtype)
```
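encode numbers the distinct strings in alphabetical order and attaches the strings as value labels, so 10D00Z0 becomes 1, the lowest level, which factor-variable notation uses as the base category by default; that is why it does not appear among the dtype rows of the regression output. The Python sketch below mimics that numbering with a hypothetical helper; empty strings stand in for observations excluded by the if qualifier and are left missing (None).

```python
def encode(values):
    """Map distinct non-empty strings to 1, 2, 3, ... in sorted order.

    Returns the numeric codes (None for empty strings, i.e. missing) and
    the label mapping, loosely mirroring what Stata's encode produces.
    """
    labels = sorted({v for v in values if v})
    mapping = {label: i for i, label in enumerate(labels, start=1)}
    return [mapping.get(v) for v in values], mapping


codes = ["10E0XZZ", "10D00Z1", "10D00Z0", "10E0XZZ", ""]
nums, mapping = encode(codes)
print(nums)     # [3, 2, 1, 3, None]
print(mapping)  # {'10D00Z0': 1, '10D00Z1': 2, '10E0XZZ': 3}
```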


Then we fit our regression:

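```
. regress billed i.dtype los

      Source |       SS           df       MS      Number of obs   =       276
-------------+----------------------------------   F(3, 272)       =    340.64
       Model |  80247.8212         3  26749.2737   Prob > F        =    0.0000
    Residual |  21359.4251       272  78.5272982   R-squared       =    0.7898
-------------+----------------------------------
       Total |  101607.246       275  369.480896   Root MSE        =    8.8616

------------------------------------------------------------------------------
      billed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dtype |
     10D00Z1 |  -5.635262   4.717685    -1.19   0.233    -14.92308    3.652558
     10E0XZZ |    -15.019    4.80023    -3.13   0.002    -24.46933   -5.568676
             |
         los |   3.519866   .1659896    21.21   0.000     3.193078    3.846654
       _cons |   22.25525   4.966573     4.48   0.000     12.47744    32.03306
------------------------------------------------------------------------------
```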



We want the average cost for each of the codes after controlling for length of stay. Stata's margins command gives us these averages as predictive margins:

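```
. margins dtype

Predictive margins                              Number of obs     =        276
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dtype |
     10D00Z0 |   31.85836   4.667969     6.82   0.000     22.66842     41.0483
     10D00Z1 |    26.2231    .921179    28.47   0.000     24.40955    28.03665
     10E0XZZ |   16.83936   .6794237    24.78   0.000     15.50176    18.17696
------------------------------------------------------------------------------
```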
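For a linear regression, the predictive margin for code j averages each patient's prediction with dtype set to j, which reduces to _cons + b_j + b_los * mean(los); the base code 10D00Z0 has b_j = 0. Consequently, the gaps between the margins equal the dtype coefficients. The Python check below plugs in the printed estimates. The mean of los is not reported in the output, so the sketch backs it out of the base-category margin; that implied mean is a derived quantity, not a figure Stata printed. Because billed is measured in $1,000s, a margin of 31.85836 corresponds to about $31,858.

```python
# Estimates copied from the regress and margins output above (billed in $1,000s).
cons = 22.25525
b_los = 3.519866
b_dtype = {"10D00Z0": 0.0, "10D00Z1": -5.635262, "10E0XZZ": -15.019}
reported = {"10D00Z0": 31.85836, "10D00Z1": 26.2231, "10E0XZZ": 16.83936}

# Implied sample mean of los, recovered from the base-category margin.
mean_los = (reported["10D00Z0"] - cons) / b_los

# Linear model: margin_j = _cons + b_j + b_los * mean(los).
margins = {j: cons + b + b_los * mean_los for j, b in b_dtype.items()}

for j, m in margins.items():
    print(f"{j}: computed {m:.4f}, reported {reported[j]}")
```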



The averages are $31,858, $26,223, and $16,839. For a visual comparison, we can create a bar chart:

```
. marginsplot, recast(bar) title("Predictive Margins and 95% Confidence Intervals")
> subtitle("Billed Amount in $1,000s") ytitle("Predictive Margins of Billed Amount")
> ylabel(0(10)40) xtitle("Delivery Procedure Code")

  Variables that uniquely identify margins: dtype
```


## Tell me more

You can read more about ICD coding, including tips for working with records with multiple diagnosis codes, in the Introduction to ICD commands.

Also see worked examples for the individual coding systems: