Zero-inflated ordered probit

Highlights

Ordinal outcome
Zero inflation: zero observations generated by two distinct processes
Robust, cluster–robust, and bootstrap standard errors
Support for complex survey designs
Vuong test to compare ordered probit versus zero-inflated ordered probit
Predict marginal, joint, and conditional probabilities of levels
Predict probability of participation and nonparticipation
Support for Bayesian estimation

What's this about?

Stata's new zioprobit command fits zero-inflated ordered probit (ZIOP) models.

ZIOP models are used for ordered response variables, such as (1) fully ambulatory, (2) ambulatory with restrictions, and (3) partially ambulatory, when the data exhibit a high fraction of observations at the lowest end of the ordering. The name zero-inflated comes from Poisson regression, where the idea started and where it was zeros, the lowest possible count, that were overly prevalent. Given the category values we just used, zioprobit would fit a 1-inflated model. Or we could have numbered the categories 0, 1, and 2 and fit a 0-inflated model. The results would be the same either way.

Standard ordered probit models cannot account for the preponderance of zero observations when the zeros relate to an extra, distinct source. Consider a study of tobacco use in which the outcome of interest, smoking, is an ordered discrete response with four levels coded as 0, 1, 2, and 3, with 0 meaning "Nonsmoker" and 3 meaning "Daily, 20+ cigarettes/day".

Many of the individuals in the first category will be nonsmokers who have never smoked and will never smoke. The rest of them will be ex-smokers. Think of the standard ordered probit model as fitting the behavior of smokers, including ex-smokers. The zero inflation arises because the first group now includes those who have never smoked.
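To make the two processes concrete, here is a sketch of how the probabilities of the four levels combine. It assumes the usual ZIOP formulation in which the participation and ordered-outcome errors are independent standard normals. The notation is ours, not Stata's: s = 1 marks participation (having ever smoked), z and γ are the covariates and coefficients of the inflation equation, x and β are those of the ordered probit equation, the κ's are the cutpoints, and Φ is the standard normal distribution function.

```latex
\[
\begin{aligned}
\Pr(s = 1 \mid \mathbf{z}) &= \Phi(\mathbf{z}\boldsymbol{\gamma}) \\
\Pr(y = 0 \mid \mathbf{x}, \mathbf{z}) &= \bigl[1 - \Phi(\mathbf{z}\boldsymbol{\gamma})\bigr]
    + \Phi(\mathbf{z}\boldsymbol{\gamma})\,\Phi(\kappa_1 - \mathbf{x}\boldsymbol{\beta}) \\
\Pr(y = j \mid \mathbf{x}, \mathbf{z}) &= \Phi(\mathbf{z}\boldsymbol{\gamma})
    \bigl[\Phi(\kappa_{j+1} - \mathbf{x}\boldsymbol{\beta}) - \Phi(\kappa_j - \mathbf{x}\boldsymbol{\beta})\bigr],
    \quad j = 1, 2 \\
\Pr(y = 3 \mid \mathbf{x}, \mathbf{z}) &= \Phi(\mathbf{z}\boldsymbol{\gamma})
    \bigl[1 - \Phi(\kappa_3 - \mathbf{x}\boldsymbol{\beta})\bigr]
\end{aligned}
\]
```

The first term of Pr(y = 0) is the never-smokers. The second term is the ex-smokers, who participate but currently sit in the lowest category. The four probabilities sum to 1.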

Let's see it work

We have fictional data on the smoking study just described. The outcome variable is called tobacco and contains

Category   Frequency   Meaning
       0       78.1%   Nonsmoker
       1        3.6%   Weekly or less
       2       13.0%   Daily, less than 20 cigarettes/day
       3        5.3%   Daily, 20 or more cigarettes/day

We believe that category 0 is inflated.

We want to fit a model in which smoking by those who have ever smoked is given by

income
gender
age

And membership in the never-smoked group is determined by

income
gender
age
whether parents smoked
religion

To fit the model, we type

. zioprobit tobacco income i.female age,
  inflate(income i.female age i.parent i.religion) vuong 

Zero-inflated ordered probit regression         Number of obs     =     14,899
                                                Wald chi2(3)      =     751.43
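Log likelihood = -10299.787                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     tobacco |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
tobacco      |
      income |   .1503256   .0057582    26.11   0.000     .1390398    .1616113
             |
      female |
     female  |  -.2726466    .047975    -5.68   0.000    -.3666759   -.1786173
         age |  -.1394573    .011523   -12.10   0.000    -.1620419   -.1168727
-------------+----------------------------------------------------------------
inflate      |
      income |  -.0654874   .0087703    -7.47   0.000     -.082677   -.0482979
             |
      female |
     female  |  -.2166707   .0509783    -4.25   0.000    -.3165863   -.1167552
         age |   .1205886   .0165181     7.30   0.000     .0882136    .1529636
             |
      parent |
    smoking  |   .7219495   .0436831    16.53   0.000     .6363321    .8075669
             |
    religion |
discourages  |  -.2095319   .0586036    -3.58   0.000    -.3243927    -.094671
       _cons |  -.5335904   .0873953    -6.11   0.000    -.7048821   -.3622987
-------------+----------------------------------------------------------------
       /cut1 |   .0683114   .0881964                     -.1045504    .2411731
       /cut2 |   .2977055   .0804097                      .1401054    .4553055
       /cut3 |   1.402649    .067253                      1.270836    1.534463
------------------------------------------------------------------------------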

Vuong test of zioprobit vs. oprobit: z = 15.15                 Pr > z = 0.0000

The standard ordered probit parameters, coefficients and cutpoints, are displayed in the first and last parts of the output, respectively.

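The middle part of the output, labeled inflate, reports the probit coefficients for the inflation equation. In this output, a positive coefficient goes with a higher probability of participation (having ever smoked) and therefore a lower probability of being a never-smoker.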

We specified the vuong option to obtain the Vuong test at the end of the output. The null hypothesis is that the inflation part of the model is unnecessary. We can reject that at any reasonable significance level.
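For readers who want to look at the two competing models directly, the following is a minimal sketch that fits both and lists their log likelihoods and information criteria side by side. It assumes the fictional tobacco data are already in memory and that the commands are run from a do-file, so that /// continues a line. The stored-estimate names op and ziop are arbitrary labels chosen for this illustration.

```stata
* Sketch only: assumes the tobacco data from the example are in memory.

* Standard ordered probit, ignoring zero inflation
oprobit tobacco income i.female age
estimates store op          // "op" is an arbitrary name

* Zero-inflated ordered probit, as fit above (without the vuong option)
zioprobit tobacco income i.female age, ///
    inflate(income i.female age i.parent i.religion)
estimates store ziop        // "ziop" is an arbitrary name

* Log likelihoods, AIC, and BIC for both models in one table
estimates stats op ziop
```

This complements the Vuong test rather than replacing it: the information criteria give a descriptive comparison, and the vuong option gives the formal test.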

Coefficients can be difficult to interpret. For instance, what does a parent smoking coefficient of 0.72 mean? It means that, on average in the data, those whose parents are smokers are about 27 percentage points less likely to be never-smokers than those whose parents did not use tobacco. We obtained that figure by using Stata's margins command:

. margins, predict(pnpar) dydx(parent)

Average marginal effects                        Number of obs     =     14,899
Model VCE    : OIM

Expression   : Pr(nonparticipation), predict(pnpar)
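dy/dx w.r.t. : 1.parent

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      parent |
    smoking  |   -.266089    .015175   -17.53   0.000    -.2958314   -.2363467
------------------------------------------------------------------------------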

Note: dy/dx for factor levels is the discrete change from the base level.

The predict(pnpar) option is unique to margins when used after zioprobit. We asked margins to calculate predictions of the probability of nonparticipation, which in this example means the probability of being a never-smoker.
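The other predictions listed in the highlights can be requested in the same way. The sketch below assumes it is run right after the zioprobit command above. The statistic names pmargin and ppar are given as we understand them from the zioprobit postestimation documentation, so check help zioprobit postestimation before relying on them. The new variable names (pr0 through pr3, p_par, p_npar) are arbitrary.

```stata
* Sketch only: run after the zioprobit command above.
* New variable names are arbitrary choices for this illustration.

* Marginal probability of each of the four smoking levels
predict pr0 pr1 pr2 pr3, pmargin

* Probability of participation and of nonparticipation
predict p_par, ppar
predict p_npar, pnpar

* Average predicted probability of category 0, by parental smoking
margins parent, predict(pmargin outcome(0))
```

The last command reports the probability of being observed in category 0 for any reason, never-smoker or ex-smoker, whereas predict(pnpar) isolates the never-smokers.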

Tell me more

You can also fit Bayesian zero-inflated ordered probit models using the bayes prefix.
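As a minimal sketch, the Bayesian version of the model above could be requested as follows, using the default priors. It assumes the tobacco data are in memory and the command is run from a do-file. The rseed() value is arbitrary and only makes the simulation reproducible.

```stata
* Sketch only: assumes the tobacco data from the example are in memory.
* rseed() fixes the random-number seed; the value 15 is arbitrary.
bayes, rseed(15): zioprobit tobacco income i.female age, ///
    inflate(income i.female age i.parent i.religion)
```

The vuong option is left out here because it belongs to the maximum likelihood comparison with oprobit.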

Read more about zero-inflated ordered probit in the Stata Base Reference Manual.


This page announced the new features in Stata 15. Please see our Stata 19 page for the new features in Stata 19.
