Analyze data with ICD-10-CM/PCS codes


  • Works with
    • NCHS ICD-10-CM diagnosis codes (healthcare encounter and claims data)
    • CMS ICD-10-PCS procedure codes (healthcare claims data)
  • Data-management commands let you
    • Generate new variables based on codes
      • Indicators for different conditions
      • Short descriptions
      • Category codes from billable codes
      • And more
    • Verify that variables contain valid codes and flag invalid codes
    • Standardize format of codes
  • Interactive utilities let you
    • Look up descriptions for codes
    • Search for codes from keywords
  • Specify the version of codes that your dataset contains

What's this about?

The U.S. now uses ICD-10-CM and ICD-10-PCS to encode diagnoses and procedures in administrative healthcare data, such as claims for medical services.

The new icd10cm and icd10pcs commands support these systems, just as Stata has supported previous ICD releases.

These new commands make your research and reporting life easier. When administrative data are gathered from multiple sources, the format of the codes may not be standardized, or there may be reporting errors. Furthermore, the sheer number of codes in these systems means that analyzing the data in a meaningful way can be difficult.

icd10cm and icd10pcs let you easily verify that codes are valid and add variables such as code descriptions and indicators for whether patients have a particular diagnosis or procedure. They also let you interactively look up descriptions of codes.

Let's see it work

Let's imagine that we want to compare costs for the different types of Cesarean sections and delivery. We are conducting our study using hospital discharge records and have data from the last six months of 2016. New ICD-10-CM/PCS codes are released every October, so our data are a mix of 2016 and 2017 codes. The new commands make processing such data easy.

First, let's check that the codes are valid. We will specify the version(2016) option for codes recorded before October 1. We type

. use discharges16
(Discharges, 2016 Q3-Q4)

. icd10cm check diag1 if dmonth <= tm(2016m9), version(2016)
(diag1 contains defined codes; no missing values)

Now, we can check discharges between October 2016 and December 2016 by specifying version(2017).

. icd10cm check diag1 if dmonth >= tm(2016m10), version(2017)
(diag1 contains no missing values)

diag1 contains undefined codes:

    1.  Invalid placement of period                       0
    2.  Too many periods                                  0
    3.  Code too short                                    0
    4.  Code too long                                     0
    5.  Invalid 1st char (not A-Z)                        0
    6.  Invalid 2nd char (not 0-9)                        0
    7.  Invalid 3rd char (not 0-9 A or B)                 0
    8.  Invalid 4th char (not 0-9 or A-Z)                 0
    9.  Invalid 5th char (not 0-9 or A-Z)                 0
   10.  Invalid 6th char (not 0-9 or A-Z)                 0
   11.  Invalid 7th char (not 0-9 or A-Z)                 0
   77.  Valid only for previous versions                  3
   88.  Valid only for later versions                     0
   99.  Code not defined                                  0
        Total                                             3

We have three problems in the last quarter of 2016.

We'll probably want to know what codes are causing problems. To do this, we can add the summary option, which will show the frequency of each problematic code and the reason it is causing us trouble.

. icd10cm check diag1 if dmonth >= tm(2016m10), version(2017) summary
(diag1 contains no missing values)

(output omitted)

Summary of invalid and undefined codes

      diag1   Count   Problem
    T8351XA       3   Valid only for previous versions

It appears that the hospital used the obsolete code T83.51XA. Usually, we would want to fix problems like this. We could try to use the original medical record. With just administrative data at hand, we also could start by finding out what the code T83.51XA is for.

. icd10cm lookup T8351XA, version(2016)

    T83.51XA Infect/inflm reaction due to indwell urinary catheter, init

We would then use icd10cm search with keywords from the description of T83.51XA to find a suitable alternative, and substitute the alternative we found by typing

. replace diag1 = .... if dmonth >=tm(2016m10) & diag1=="T8351XA"
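
The search step described above might be sketched as follows. The keywords and the wildcard are our own choices, not part of the original example, and the sketch assumes the 2017 descriptions still contain these words. By default, icd10cm search lists only descriptions containing all of the words given.

```stata
* Search the 2017 code descriptions for keywords taken from the
* description of the obsolete code T83.51XA
icd10cm search urinary catheter, version(2017)

* List every 2017 code beginning with T83.51 to see which codes
* now occupy that part of the classification
icd10cm lookup T83.51*, version(2017)
```

Whichever code is chosen would then go in place of the elided value in the replace command.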

Before doing that, we might want to check the dates on which the three problems occurred by typing

. icd10cm check diag1 if dmonth >= tm(2016m10), version(2017) generate(probtype)
(output omitted)

. tabulate probtype dmonth

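      result of check |          Discharge month
            for diag1 |   2016m10    2016m11    2016m12 |     Total
----------------------+---------------------------------+----------
         Defined code |       333        323        336 |       992
Valid only for previo |         2          1          0 |         3
----------------------+---------------------------------+----------
                Total |       335        324        336 |       995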

Instead of bothering with any of this, we will ignore this problem because T83.51XA has nothing to do with pregnancy.
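
The checks so far cover only diag1. Because the study sample is defined from proc1, the same version-by-version check could be run on the procedure codes. This is a sketch that assumes proc1 is recorded for the same discharges and mirrors the diagnosis-code checks above; it is not a step shown in the original example.

```stata
* Check procedure codes recorded before October 1 against the 2016 code set
icd10pcs check proc1 if dmonth <= tm(2016m9), version(2016)

* Check procedure codes recorded from October 1 onward against the 2017 code set
icd10pcs check proc1 if dmonth >= tm(2016m10), version(2017)
```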

Having satisfied ourselves that there are no errors in our data that will affect our study, we are ready to begin in earnest. Let's first create a variable that marks the portion of the data in which we are interested.

Deliveries and Cesarean sections can be identified by one of four ICD-10-PCS codes: 10D.00Z0, 10D.00Z1, 10D.00Z2, and 10E.0XZZ. We want to flag all records that have one of these codes in proc1, the primary procedure code, as eligible for our study.

. icd10pcs generate insample = proc1, range(10D.00* 10E.0XZZ)

We were able to abbreviate the codes starting with “10D.00” because only these three codes fall in this group.

Now that we have insample, we can add the modifier if insample==1 to the end of Stata commands to restrict ourselves to the relevant data.
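
For instance, a quick look at the study-eligible records might be sketched like this, using the billed and los variables that appear later in this example:

```stata
* Summarize billed amount and length of stay for study-eligible discharges only
summarize billed los if insample==1
```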

We can then create a variable that has the code and description for just the study-eligible records.

. icd10pcs generate delivery = proc1 if insample==1, description addcode(begin)

Let's look at the four codes of interest:

. tabulate delivery

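                   description of proc1 |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
10D.00Z0 Extraction of POC, Classical.. |          6        2.17        2.17
10D.00Z1 Extraction of POC, Low Cervi.. |         93       33.70       35.87
10E.0XZZ Delivery of Products of Conc.. |        177       64.13      100.00
----------------------------------------+-----------------------------------
                                  Total |        276      100.00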

The first thing we discover is that only three of the four codes appear in the data. That does not bother us; the fourth is an uncommonly used code.

Before we can fit our regression of cost (variable billed) on the codes and length of stay (variable los), we must create a new numeric variable, which we will name dtype, containing the values 1, 2, and 3 for the three codes. We type

. encode proc1 if insample==1, generate(dtype)

and then we fit our regression:

. regress billed i.dtype los

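      Source |       SS           df       MS      Number of obs   =       276
-------------+----------------------------------   F(3, 272)       =    340.64
       Model |  80247.8212         3  26749.2737   Prob > F        =    0.0000
    Residual |  21359.4251       272  78.5272982   R-squared       =    0.7898
-------------+----------------------------------   Adj R-squared   =    0.7875
       Total |  101607.246       275  369.480896   Root MSE        =    8.8616

------------------------------------------------------------------------------
      billed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dtype |
    10D00Z1  |  -5.635262   4.717685    -1.19   0.233    -14.92308    3.652558
    10E0XZZ  |    -15.019    4.80023    -3.13   0.002    -24.46933   -5.568676
             |
         los |   3.519866   .1659896    21.21   0.000     3.193078    3.846654
       _cons |   22.25525   4.966573     4.48   0.000     12.47744    32.03306
------------------------------------------------------------------------------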
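
The regression reports coefficients for 10D00Z1 and 10E0XZZ only, because the first level of dtype (10D00Z0) serves as the base category. encode stores the mapping from the original string codes to integers in a value label that, by default, takes the name of the new variable. A minimal way to confirm which code each level of dtype represents:

```stata
* Show the integer-to-code mapping that encode created for dtype
label list dtype

* Confirm the number of study-eligible records at each level
tabulate dtype if insample==1
```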

We want the average cost for each of the codes after controlling for length of stay. Stata's margins will give us the average cost.

. margins dtype

Predictive margins                              Number of obs     =        276
Model VCE    : OLS

Expression   : Linear prediction, predict()

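------------------------------------------------------------------------------
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dtype |
    10D00Z0  |   31.85836   4.667969     6.82   0.000     22.66842     41.0483
    10D00Z1  |    26.2231    .921179    28.47   0.000     24.40955    28.03665
    10E0XZZ  |   16.83936   .6794237    24.78   0.000     15.50176    18.17696
------------------------------------------------------------------------------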
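
As a consistency check, note that in a linear model with no interactions, each predictive margin equals the margin for the base level (the first row, 10D00Z0) plus that level's regression coefficient. A short sketch using the numbers reported above:

```stata
* Margin for 10D00Z1 = base-level margin + coefficient on 10D00Z1
display 31.85836 + (-5.635262)

* Margin for 10E0XZZ = base-level margin + coefficient on 10E0XZZ
display 31.85836 + (-15.019)
```

Up to rounding, these reproduce the reported margins of 26.2231 and 16.83936.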

The averages are $31,858, $26,223, and $16,839. For a visual comparison, we can create a bar chart:

. marginsplot, recast(bar)
>       title("Predictive Margins and 95% Confidence Intervals")
>       subtitle("Billed Amount in $1,000s")
>       ytitle("Predictive Margins of Billed Amount") ylabel(0(10)40)
>       xtitle("Delivery Procedure Code")

  Variables that uniquely identify margins: dtype


Tell me more

You can read more about ICD coding, including tips for working with records with multiple diagnosis codes, in the Introduction to ICD commands.

Also see worked examples for the individual coding systems:

The ICD-10-CM coding system is a licensed adaptation of the World Health Organization's ICD-10. Copyright information for ICD-10 can be found in the ICD-10 copyright notification.


International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM)




