Censored Poisson regression

Highlights

Right-censoring
Left-censoring
Interval-censoring/interval data
Incidence-rate ratios
Predictions:
  Number of events
  Number of events, conditional on censoring
  Probability of a count or range of counts
  Conditional probability of a count or range of counts

Poisson regression is used when the dependent variable is a count from a Poisson process.

Outcomes can be left-censored if they are not observed when they are below a certain level and can be right-censored if they are not observed when they are above another level.

The cpoisson command fits Poisson regression models on count data and allows the counts to be left-censored, right-censored, or both. The censoring can be at constant values, or it can differ across observations.

An example of a right-censored count outcome is the number of cars in a family, where data might be top-coded at 3 or more.

An example of a left-censored count outcome is the number of cookie boxes sold by Girl Scouts, where the lowest value recorded is "10 or fewer boxes".

Left- and right-censoring combined is also known as interval-censoring.
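To make the mechanics of censoring concrete, here is a minimal Python sketch of how one observation could enter a censored Poisson likelihood. It is an illustration, not cpoisson's implementation: the function names and the `lower`/`upper` arguments are hypothetical, and the sketch assumes that a value recorded at or below the lower limit only tells us the true count is no larger than that limit, and likewise at the upper limit.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson count with mean mu > 0."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def poisson_cdf(k, mu):
    """P(Y <= k); the empty sum gives 0 when k is negative."""
    return sum(poisson_pmf(j, mu) for j in range(0, k + 1))

def censored_loglik(y, mu, lower=None, upper=None):
    """Log-likelihood contribution of one recorded count y.

    Assumption (hypothetical convention): y <= lower means "lower or fewer",
    and y >= upper means "upper or more".
    """
    if lower is not None and y <= lower:
        return math.log(poisson_cdf(lower, mu))              # left-censored
    if upper is not None and y >= upper:
        return math.log(1.0 - poisson_cdf(upper - 1, mu))    # right-censored
    return math.log(poisson_pmf(y, mu))                      # fully observed

# A family top-coded at "3 or more" cars contributes log P(Y >= 3),
# while a family with exactly 2 cars contributes log P(Y = 2).
print(censored_loglik(3, 1.2, upper=3), censored_loglik(2, 1.2, upper=3))
```

Passing per-observation values for `lower` and `upper` would correspond to censoring limits that differ across observations.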

Censoring is not the same as truncation. With censored outcomes, the observation is in our data but its outcome is only partly observed; we still observe the other values for the person. With truncated data, the entire observation is missing from our data. Stata has an estimator for truncated Poisson data; see [R] tpoisson.
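The distinction shows up in the likelihood. A censored observation contributes a cumulative probability, whereas in a truncated sample every observed count contributes a probability renormalized over the range that can appear in the data. The following Python sketch illustrates the truncated case under simple assumptions; `left_truncated_pmf` and its truncation point `t` are hypothetical names, not tpoisson's interface.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson count with mean mu > 0."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def left_truncated_pmf(k, mu, t=0):
    """P(Y = k | Y > t): counts at or below t never enter the data."""
    if k <= t:
        raise ValueError("a truncated sample cannot contain counts <= t")
    p_excluded = sum(poisson_pmf(j, mu) for j in range(t + 1))
    return poisson_pmf(k, mu) / (1.0 - p_excluded)

# With zero counts excluded (t=0), each observed count gets more probability
# than under the untruncated distribution.
print(poisson_pmf(1, 1.5), left_truncated_pmf(1, 1.5, t=0))
```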

Let's see it work

Below we study the number of car accidents a person has during a year. The number recorded is 0, 1, 2, or 3, and 3 means 3 or more accidents. The number is right-censored.

We will model the determinants of accidents as the number of previous accidents, whether the driver is a parent, and the number of traffic tickets the driver received during the previous year.

We type

. cpoisson accidents i.past i.parent i.ntickets, ul(3) irr

Initial:      Log likelihood =  -2657.162
Rescale:      Log likelihood =  -2657.162
Iteration 0:  Log likelihood =  -2657.162
Iteration 1:  Log likelihood = -2638.9113
Iteration 2:  Log likelihood = -2638.7142
Iteration 3:  Log likelihood = -2638.6901
Iteration 4:  Log likelihood = -2638.6863
Iteration 5:  Log likelihood = -2638.6859
Iteration 6:  Log likelihood = -2638.6858
Iteration 7:  Log likelihood = -2638.6858

Censored Poisson regression                        Number of obs     =   3,000
                                                          Uncensored =   2,840
Limits: Lower = 0                                      Left-censored =       0
        Upper = 3                                     Right-censored =     160

                                                   LR chi2(8)        = 1003.81
Log likelihood = -2638.6858                        Prob > chi2       =  0.0000

------------------------------------------------------------------------------
   accidents |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      1.past |   2.641695   .1967576    13.04   0.000     2.282884    3.056902
    1.parent |   .8345776   .0425139    -3.55   0.000     .7552765     .922205
             |
    ntickets |
          1  |   1.994213   .1182697    11.64   0.000     1.775374    2.240027
          2  |   3.841546   .2575073    20.08   0.000      3.36859    4.380906
          3  |   6.979123   .6090534    22.26   0.000     5.881909    8.281012
          4  |   15.97291   2.579757    17.16   0.000     11.63879    21.92099
          5  |    66.9069   2547.936     0.11   0.912     2.57e-31    1.74e+34
          6  |   58.24981   4426.516     0.05   0.957     1.20e-63    2.82e+66
             |
       _cons |   .3387513    .015292   -23.98   0.000     .3100673    .3700889
------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

We interpret the model coefficients (or incidence-rate ratios) as if the censoring had not occurred, that is, as though we had seen all the data uncensored.

We find that past accidents predict more future accidents, that being a parent predicts fewer future accidents, and that the number of tickets generally predicts more future accidents, although the ratios for 5 and 6 tickets are estimated too imprecisely to interpret.
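Because incidence-rate ratios act multiplicatively on the baseline rate, the expected uncensored number of accidents for a given driver profile is the product of _cons and the relevant ratios. The Python sketch below works through that arithmetic with the estimates reported in the output. It assumes the omitted base level of each factor is 0, and it leaves out the 5- and 6-ticket levels because their estimates are too imprecise; the function name is hypothetical.

```python
# Estimates copied from the cpoisson output; base levels are assumed to be 0.
BASELINE_RATE = 0.3387513      # _cons: baseline incidence rate
IRR_PAST = 2.641695            # 1.past
IRR_PARENT = 0.8345776         # 1.parent
IRR_NTICKETS = {0: 1.0, 1: 1.994213, 2: 3.841546, 3: 6.979123, 4: 15.97291}

def expected_accidents(past, parent, ntickets):
    """Expected uncensored accidents per year for one driver profile."""
    rate = BASELINE_RATE
    if past:
        rate *= IRR_PAST
    if parent:
        rate *= IRR_PARENT
    return rate * IRR_NTICKETS[ntickets]

# A parent with a past accident and 2 tickets, compared with the baseline driver.
print(expected_accidents(past=1, parent=1, ntickets=2))
print(expected_accidents(past=0, parent=0, ntickets=0))
```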

Because of the censoring, we do not know which of the people coded as having 3 accidents really had exactly 3 accidents, or which had more.

We can, however, now make predictions of the expected uncensored number of accidents and the probabilities of any specified number of accidents, including values greater than 3.

We wonder, what are the chances anyone had more than 3 accidents in our data? Our data were officially top-coded, but were they practically top-coded? We can obtain each driver's probability of having four or more accidents by typing

. predict fourplus, pr(4,.)

We now have the probability that each driver in our sample had four or more accidents. To get the expected number of drivers who had four or more accidents, we simply sum these probabilities:

. total fourplus

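Total estimation                         Number of obs = 3,000

--------------------------------------------------------------
             |      Total   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
    fourplus |   74.46338   5.614501      63.45472    85.47205
--------------------------------------------------------------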

We expect that about 74.5 drivers in our data had more than 3 accidents, so top-coding almost certainly affected our data.
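The logic behind this total can be sketched in a few lines of Python: each driver's probability of four or more accidents is one minus the Poisson probability of 0 through 3 accidents at that driver's fitted mean, and the expected number of such drivers is the sum of those probabilities. The fitted means below are hypothetical placeholders, not values from the data, and the function names are illustrative.

```python
import math

def prob_at_least(k, mu):
    """P(Y >= k) for a Poisson count with mean mu > 0."""
    below = sum(math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))
                for j in range(k))
    return 1.0 - below

def expected_number_at_least(k, fitted_means):
    """Expected number of observations with a count of k or more."""
    return sum(prob_at_least(k, mu) for mu in fitted_means)

# Hypothetical fitted means for five drivers (placeholders only).
fitted_means = [0.34, 0.68, 0.89, 1.30, 2.36]
print(expected_number_at_least(4, fitted_means))
```

Note that this sum treats the fitted means as known, which is the same reason the standard error from total is not quite right.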

Almost certainly? We have a standard error above, but that standard error and confidence interval do not account for the probabilities having themselves been estimated. If we use margins to perform the computation, it will produce the correct standard error and confidence interval:

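. margins, expression(predict(pr(4,.))*3000)

Predictive margins                                       Number of obs = 3,000
Model VCE: OIM

Expression: predict(pr(4,.))*3000

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   74.46338   5.238347    14.22   0.000     64.19641    84.73036
------------------------------------------------------------------------------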

margins wants to report a mean, so we had to trick it into giving us a total by multiplying the probabilities by our sample size of 3000.

With such a small standard error and a lower bound of 64.2 on our confidence interval, we can definitively say, or at least as definitively as any statistician can say, that top-coding affected our data.

Tell me more

Read more about censored Poisson models in the Stata Base Reference Manual; see [R] cpoisson.
