Stata | FAQ: Stata 5: Goodness-of-fit chi-squared test reported by poisson

Note: This FAQ is for users of Stata 5. It is not relevant for more recent versions.

This question was originally posed on Statalist.

Stata 5: Why does the goodness-of-fit chi-squared test reported by poisson change when the counts and exposures are grouped differently?

Title		Stata 5: Goodness-of-fit chi-squared test reported by poisson
Author		Bill Sribney, StataCorp

Question:

The version 5 documentation indicates the goodness-of-fit chi-squared statistic reported with the results of Poisson regression is a test of the null hypothesis that the dependent variable is Poisson distributed. My question is why this statistic (and perhaps the resulting inference regarding the appropriateness of Poisson regression) varies with the composition of the right-hand-side variables.

Answer:

The goodness-of-fit chi-squared statistic in the poisson command is a simple Pearson's chi-squared statistic:

     N
    Sum  (observed - expected)² /expected
    i=1

where i indexes the observations in the dataset. The df is

    df = N - (#terms in model including the constant)

If you split up or group the counts and exposures differently, you get different cells for the Pearson's chi-squared and thus a different statistic.
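To make the dependence on the cells concrete, here is a minimal sketch in Python rather than Stata. The numbers are hypothetical and chosen only for illustration, and `pearson_chi2` is a helper name invented here, not a Stata function. It evaluates the sum above once for a single cell and once for the same total split into two cells.

```python
def pearson_chi2(observed, expected):
    """Sum of (observed - expected)^2 / expected over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical numbers (not from the FAQ): one cell whose count equals its expectation.
one_cell = pearson_chi2([10], [10.0])

# The same 10 events and the same total expectation, split into two equal-exposure cells.
two_cells = pearson_chi2([8, 2], [5.0, 5.0])

print(one_cell, two_cells)
```

The total count and the total expectation are identical in both calls; only the way they are cut into cells differs, and that alone moves the statistic away from zero.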

Here’s an illustration using the first example in the poisson entry, on page 31 of the P–Z Reference Manual:

 . list
    
        airline   injuries         n   XYZowned  
   1.         1         11    0.0950          1  
   2.         2          7    0.1920          0  
   3.         3          7    0.0750          0  
   4.         4         19    0.2078          0  
   5.         5          9    0.1382          0  
   6.         6          4    0.0540          1  
   7.         7          3    0.1292          0  
   8.         8          1    0.0503          0  
   9.         9          3    0.0629          1  
 
 . poisson injuries XYZowned, exposure(n) irr
 
 Iteration 0: Log Likelihood = -23.90184
 Iteration 1: Log Likelihood = -23.032242
 Iteration 2: Log Likelihood = -23.027176
 
 Poisson regression, normalized by n                 Number of obs    =       9
 Goodness-of-fit chi2(7)     =    14.094             Model chi2(1)    =   1.768
 Prob > chi2                 =    0.0495             Prob > chi2      =  0.1836
 Log Likelihood              =   -23.027             Pseudo R2        =  0.0370
 
 ------------------------------------------------------------------------------
 injuries |        IRR   Std. Err.       z     P>|z|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
 XYZowned |   1.463467    .406872      1.370   0.171       .8486578    2.523675
 ------------------------------------------------------------------------------
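The ungrouped fit can be checked by hand. The sketch below is in Python and is not part of the original FAQ. It assumes the shortcut that, for a model with a constant and a single 0/1 covariate, the maximum likelihood rate in each group is total injuries divided by total exposure; the names `rows` and `fitted_rates` are invented for this sketch. It computes the Pearson sum defined earlier and, for comparison, the Poisson deviance.

```python
import math

# Data from the listing above: (injuries, exposure n, XYZowned)
rows = [
    (11, 0.0950, 1), (7, 0.1920, 0), (7, 0.0750, 0),
    (19, 0.2078, 0), (9, 0.1382, 0), (4, 0.0540, 1),
    (3, 0.1292, 0), (1, 0.0503, 0), (3, 0.0629, 1),
]

def fitted_rates(rows):
    """Rate per XYZowned group: total injuries / total exposure.

    Assumption: this closed form holds only because the model is a
    constant plus one 0/1 covariate; it is not a general Poisson fitter.
    """
    totals = {}
    for y, n, g in rows:
        inj, expo = totals.get(g, (0, 0.0))
        totals[g] = (inj + y, expo + n)
    return {g: inj / expo for g, (inj, expo) in totals.items()}

rates = fitted_rates(rows)
observed = [y for y, _, _ in rows]
expected = [rates[g] * n for _, n, g in rows]  # expected count per observation

# Pearson form: sum of (observed - expected)^2 / expected
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Deviance form (every observed count here is positive, so log is defined)
deviance = 2 * sum(o * math.log(o / e) - (o - e)
                   for o, e in zip(observed, expected))

df = len(rows) - 2          # N minus (constant + XYZowned)
irr = rates[1] / rates[0]   # incidence-rate ratio for XYZowned

print(f"IRR={irr:.4f} df={df} Pearson={pearson:.3f} deviance={deviance:.3f}")
```

Under these assumptions the IRR and the 7 df agree with the output above. The Pearson sum comes out near 13.4, while the deviance comes out near 14.09, the figure shown in the header. Treat that as an observation from this sketch rather than a statement about Stata 5 internals; both statistics are sums over the same cells, so the point about grouping applies to either.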

Now we will group the data by the unique covariate patterns of the model. In this case that simply amounts to grouping by XYZowned and summing counts (injuries) and exposure (n) within this grouping:

 . collapse (sum) injuries n, by(XYZowned)
    
 . list
    
       XYZowned   injuries           n  
   1.         0         46       .7925  
   2.         1         18       .2119  
 
 . poisson injuries XYZowned, exposure(n) irr
         
 Iteration 0: Log Likelihood = -5.2133484
 Iteration 1: Log Likelihood = -5.2038269
 
 Poisson regression, normalized by n                 Number of obs    =       2
 Goodness-of-fit chi2(0)     =     0.000             Model chi2(1)    =   1.768
 Prob > chi2                 =         .             Prob > chi2      =  0.1836
 Log Likelihood              =    -5.204             Pseudo R2        =  0.1452
 
 ------------------------------------------------------------------------------
 injuries |        IRR   Std. Err.       z     P>|z|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
 XYZowned |   1.463466   .4068718      1.370   0.171       .8486574    2.523673
 ------------------------------------------------------------------------------
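Why the collapsed fit reports a statistic of 0 on 0 df can be seen with a few lines of arithmetic. This is a Python sketch, not Stata output, and the variable names are invented for illustration. With two cells and two parameters, each fitted rate is simply that cell's own injuries divided by its exposure, so the expected count reproduces the observed count.

```python
# Collapsed data from the listing above: XYZowned -> (injuries, exposure n)
cells = {0: (46, 0.7925), 1: (18, 0.2119)}

# Constant + XYZowned gives one parameter per cell, so the fitted rate in each
# cell is its own injuries / exposure (assumption: saturated two-cell model).
rates = {g: inj / n for g, (inj, n) in cells.items()}
expected = {g: rates[g] * n for g, (_, n) in cells.items()}

# Pearson sum over the two cells; zero up to floating-point rounding.
pearson = sum((inj - expected[g]) ** 2 / expected[g]
              for g, (inj, _) in cells.items())

df = len(cells) - 2         # 2 observations minus 2 model terms
irr = rates[1] / rates[0]   # same ratio as in the ungrouped fit

print(df, pearson, round(irr, 4))
```

The rate ratio is unchanged by collapsing, but nothing is left over to test: the model fits both cells exactly.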

Note that the IRR and its standard error are the same (up to rounding in the last digits), but the goodness-of-fit test is different. From the standpoint of the Poisson regression, the original and collapsed datasets are equivalent, but the first dataset has more information about the Poisson-ness of the data because you can examine the counts for small portions of exposure.

When the portions of exposure get too small, one runs into the well-known problem that the expected counts become small, and the chi-squared approximation for the Pearson statistic becomes unreliable.

Perhaps Stata should automatically group by covariate pattern before computing the Pearson's chi-squared, as lfit does after logistic. But in some cases it is certainly legitimate NOT to group (this example is close to being one of those cases; injuries are just a little too low for some observations).

Note that Pearson’s chi-squared also has a problem when its df becomes large. This happens for poisson when the number of observations becomes large.

My personal rules of thumb:

1. If the number of unique covariate patterns is not small (say greater than 20), then group on it for the gof test so that your dataset has only one observation per unique covariate pattern.
2. Look at predicted (expected) counts. If there are any very small ones (< 2) or lots of small ones (< 5), view Pearson's chi-squared gof test with suspicion.
3. If the df of the chi-squared is large (>50-100), take the result with a large grain of salt. (This is true for any chi-squared statistic.)
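The rules of thumb above can be turned into a small checklist. The Python sketch below is one possible reading, not the author's code. The function name `gof_cautions` is invented, the df cutoff is taken as the lower end of the 50-100 range, and "lots of small ones" is interpreted as more than 20% of the expected counts, which is an assumption made here because the text gives no number.

```python
def gof_cautions(expected, df, n_patterns, n_obs, many_small_share=0.2):
    """Return the rule-of-thumb cautions that apply to a gof test.

    expected         -- predicted counts, one per observation
    df               -- degrees of freedom of the chi-squared
    n_patterns       -- number of unique covariate patterns
    n_obs            -- number of observations in the dataset
    many_small_share -- assumed meaning of "lots" of counts below 5
    """
    notes = []
    # Rule 1: with more than 20 patterns, test on one row per pattern.
    if n_patterns > 20 and n_obs > n_patterns:
        notes.append("group by covariate pattern before the gof test")
    # Rule 2: very small or many small expected counts.
    if any(e < 2 for e in expected):
        notes.append("an expected count is below 2")
    if sum(e < 5 for e in expected) > many_small_share * len(expected):
        notes.append("many expected counts are below 5")
    # Rule 3: large df.
    if df > 50:
        notes.append("df is large")
    return notes

# Approximate expected counts for the nine airlines in the ungrouped example,
# computed here as (group injuries / group exposure) * n and rounded.
airline_expected = [8.07, 11.14, 4.35, 12.06, 8.02, 4.59, 7.50, 2.92, 5.34]

print(gof_cautions(airline_expected, df=7, n_patterns=2, n_obs=9))
```

For the airline data this flags only the share of expected counts below 5, which is in line with the remark that injuries are a little too low for some observations.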
