
Censored Poisson regression


Highlights

  • Right-censoring
  • Left-censoring
  • Interval-censoring/interval data
  • Incidence-rate ratios
  • Predictions
    • Number of events
    • Number of events, conditional on censoring
    • Probability of a count or range of counts
    • Conditional probability of a count or range of counts

What's this about?

Poisson regression is used when the dependent variable is a count from a Poisson process.

Outcomes can be left-censored if they are not observed when they fall below a certain level and right-censored if they are not observed when they rise above another level.

The cpoisson command fits Poisson regression models to count data and allows the counts to be left-censored, right-censored, or both. The censoring limits can be constant, or they can differ across observations.

An example of a right-censored count outcome is the number of cars in a family, where data might be top-coded at 3 or more.

An example of a left-censored count outcome is the number of cookie boxes sold by Girl Scouts if the first outcome value recorded is 10 or fewer boxes.

Left- and right-censoring combined is also known as interval-censoring.

Censoring is distinct from truncation. With censored outcomes, the outcome value is unobserved even though the observation is in our data; we observe the person's other values. With truncated data, the observation is missing from our data entirely. Stata has an estimator for truncated Poisson data; see [R] tpoisson.
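Conceptually, a censored Poisson model handles the three cases by swapping in the appropriate probability for each observation: the Poisson pmf for an uncensored count, P(Y <= ll) for a count left-censored at a lower limit ll, and P(Y >= ul) for a count right-censored at an upper limit ul. The sketch below (not Stata's internal implementation; the function and argument names are hypothetical) shows the per-observation log likelihood for a known rate lam:

```python
import math

def poisson_logpmf(k, lam):
    # log f(k) = -lam + k*log(lam) - log(k!)
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def poisson_cdf(k, lam):
    # P(Y <= k) by summing the pmf directly
    return sum(math.exp(poisson_logpmf(j, lam)) for j in range(k + 1))

def censored_loglik(y, lam, ll=None, ul=None):
    # Hypothetical sketch of one observation's contribution to the
    # censored Poisson log likelihood
    if ll is not None and y <= ll:                  # left-censored: true count <= ll
        return math.log(poisson_cdf(ll, lam))
    if ul is not None and y >= ul:                  # right-censored: true count >= ul
        return math.log(1.0 - poisson_cdf(ul - 1, lam))
    return poisson_logpmf(y, lam)                   # fully observed count
```

The full log likelihood sums these contributions over observations, with lam replaced by exp(x'b) for each observation; maximizing over b is what an estimator like cpoisson does numerically.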

Let's see it work

Below we study the number of car accidents a person has during a year. The number recorded is 0, 1, 2, or 3, and 3 means 3 or more accidents. The number is right-censored.

We model accidents as a function of the number of previous accidents, whether the driver is a parent, and the number of traffic tickets the driver received during the previous year.

We type

. cpoisson accidents i.past i.parent i.ntickets, ul(3) irr

initial:       log likelihood = -2526.9286
rescale:       log likelihood = -2526.9286
Iteration 0:   log likelihood = -2526.9286  
Iteration 1:   log likelihood = -2517.5881  
Iteration 2:   log likelihood = -2517.4177  
Iteration 3:   log likelihood = -2517.3981  
Iteration 4:   log likelihood =  -2517.395  
Iteration 5:   log likelihood = -2517.3945  
Iteration 6:   log likelihood = -2517.3944  
Iteration 7:   log likelihood = -2517.3944  

Censored Poisson regression                     Number of obs     =      3,000
                                                   Uncensored     =      2,879
Limits: lower = 0                                  Left-censored  =          0
        upper = 3                                  Right-censored =        121

                                                LR chi2(8)        =     868.30
Log likelihood = -2517.3944                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
   accidents |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.past |   2.447084   .1923433    11.39   0.000     2.097701    2.854658
    1.parent |   .8578361   .0458958    -2.87   0.004     .7724377    .9526758
             |
    ntickets |
          1  |   1.885661   .1194031    10.02   0.000     1.665575    2.134829
          2  |   3.702418    .263019    18.43   0.000     3.221189    4.255539
          3  |   7.158695   .6330925    22.26   0.000     6.019442    8.513565
          4  |   11.14634   1.584564    16.96   0.000     8.435779    14.72784
          5  |    73.7821   3161.995     0.10   0.920     2.45e-35    2.22e+38
          6  |   65.85229   5629.409     0.05   0.961     1.13e-71    3.84e+74
             |
       _cons |   .3041697    .014478   -25.00   0.000     .2770768    .3339118
------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

We interpret the model coefficients (or incidence-rate ratios) as if the censoring had not occurred. That is to say, as though we had seen all of the data, uncensored.

We find that past accidents predict more future accidents, that being a parent predicts fewer future accidents, and that more tickets generally predict more future accidents, although the effects of 5 or 6 tickets are estimated too imprecisely to be meaningful.

Because of the censoring, we do not know which of the people coded as having 3 accidents really had exactly 3 accidents, or which had more.

We can, however, now make predictions of the expected uncensored number of accidents and the probabilities of any specified number of accidents, including values greater than 3.

We wonder, what are the chances anyone had more than 3 accidents in our data? Our data were officially top-coded, but were they practically top-coded? We can obtain each driver's probability of having four or more accidents by typing

. predict fourplus, pr(4,.)

We now have the probability that each driver in our sample had four or more accidents. To get the expected number of drivers who had 4 or more accidents, we simply sum these probabilities

. total fourplus

Total estimation                  Number of obs   =      3,000

--------------------------------------------------------------
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    fourplus |   52.46773   4.232577      44.16868    60.76677
--------------------------------------------------------------

We expect that about 52.5 drivers in our data had more than 3 accidents, so top-coding almost certainly affected our data.
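The arithmetic behind this step is simple: for each driver's fitted rate, P(Y >= 4) = 1 - P(Y <= 3), and summing those probabilities over the sample gives the expected number of 4-plus drivers. A small sketch (the rates below are made up for illustration; this mimics pr(4,.) followed by total, not Stata's code):

```python
import math

def poisson_cdf(k, lam):
    # P(Y <= k) for a Poisson(lam) count, summing the pmf directly
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))

# Hypothetical fitted rates exp(x_i'b) for four drivers -- illustration only
rates = [0.3, 0.8, 1.6, 2.4]

# P(Y_i >= 4) = 1 - P(Y_i <= 3), the analogue of predict's pr(4,.)
fourplus = [1 - poisson_cdf(3, lam) for lam in rates]

# Summing the probabilities gives the expected number of 4-plus drivers
expected_fourplus = sum(fourplus)
```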

Almost certainly? We have a standard error above, but that standard error and confidence interval do not account for the fact that the probabilities were themselves estimated. If we use margins to perform the computation, it will produce the correct standard error and confidence interval

. margins , expression(predict(pr(4,.))*3000)

Predictive margins                              Number of obs     =      3,000
Model VCE    : OIM

Expression   : predict(pr(4,.))*3000

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   52.46773   4.530656    11.58   0.000      43.5878    61.34765
------------------------------------------------------------------------------

margins wants to report a mean, so we had to trick it into giving us a total by multiplying the probabilities by our sample size of 3000.

With such a small standard error and a lower bound of 43.6 on our confidence interval, we can definitively say, or at least as definitively as any statistician can say, that top-coding affected our data.

Tell me more

Read more about censored Poisson models in Stata Base Reference Manual; see [R] cpoisson.

© Copyright 1996–2017 StataCorp LLC