- Ordinal outcome
- Zero inflation: zero observations generated by two distinct processes
- Robust, cluster–robust, and bootstrap standard errors
- Support for complex survey designs
- Vuong test to compare ordered probit versus zero-inflated probit
- Predict marginal, joint, and conditional probabilities of levels
- Predict probability of participation and nonparticipation
- Support for Bayesian estimation

Stata's new **zioprobit** command fits zero-inflated ordered
probit (ZIOP) models.

ZIOP models are used for ordered response variables, such as (1)
fully ambulatory, (2) ambulatory with restrictions, and (3)
partially ambulatory, when a large fraction of the observations
fall in the lowest category. The model is called zero-inflated
because the idea originated with Poisson regression, where the
overly prevalent lowest value was zero.
With the category values just listed, Stata's new
**zioprobit** command would in effect fit a 1-inflated
model. We could instead have numbered the categories 0, 1, and 2
and fit a 0-inflated model. The results would be the same
either way.

Standard ordered probit models cannot account for the preponderance of zero observations when the zeros relate to an extra, distinct source. Consider a study of tobacco use in which the outcome of interest, smoking, is an ordered discrete response with four levels coded as 0, 1, 2, and 3, with 0 meaning "Nonsmoker" and 3 meaning "Daily, 20+ cigarettes/day".

Many of the individuals in the first category will be people who have never smoked and never will. The rest will be ex-smokers. Think of the standard ordered probit model as describing the behavior of smokers, including ex-smokers. The zero inflation arises because the first category also includes those who have never smoked.
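The two-process idea can be written compactly. The following is a sketch, assuming the usual ZIOP formulation with independent standard normal errors in the two equations. The symbols are notation introduced here for illustration: $z\gamma$ is the linear index (including the constant) of the participation equation, $x\beta$ is the index of the ordered probit equation, $\kappa_1 < \kappa_2 < \kappa_3$ are the cutpoints, and $\Phi$ is the standard normal distribution function.

```latex
\begin{aligned}
\Pr(y = 0) &= \bigl[1 - \Phi(z\gamma)\bigr] + \Phi(z\gamma)\,\Phi(\kappa_1 - x\beta) \\
\Pr(y = 1) &= \Phi(z\gamma)\,\bigl[\Phi(\kappa_2 - x\beta) - \Phi(\kappa_1 - x\beta)\bigr] \\
\Pr(y = 2) &= \Phi(z\gamma)\,\bigl[\Phi(\kappa_3 - x\beta) - \Phi(\kappa_2 - x\beta)\bigr] \\
\Pr(y = 3) &= \Phi(z\gamma)\,\bigl[1 - \Phi(\kappa_3 - x\beta)\bigr]
\end{aligned}
```

The first term of $\Pr(y = 0)$ is the never-smoker share, and the second is the share of participants who nonetheless report level 0. The four probabilities sum to 1.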

We have fictional data on the smoking study just described. The
outcome variable is called **tobacco** and contains

| Category | Frequency | Meaning |
|---|---|---|
| 0 | 78.1% | Nonsmoker |
| 1 | 3.6% | Weekly or less |
| 2 | 13.0% | Daily, less than 20 cigarettes/day |
| 3 | 5.3% | Daily, 20 or more cigarettes/day |

We believe that category 0 is inflated.

We want to fit a model in which the level of smoking among those who have ever smoked is determined by

- income
- gender
- age

And membership in the never-smoked group is determined by

- income
- gender
- age
- whether parents smoked
- religion

To fit the model, we type

    . zioprobit tobacco income i.female age, inflate(income i.female age i.parent i.religion) vuong

    Zero-inflated ordered probit regression         Number of obs     =     14,899
                                                    Wald chi2(3)      =     751.43
    Log likelihood = -10299.787                     Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
         tobacco |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    tobacco      |
          income |   .1503256   .0057582    26.11   0.000     .1390398    .1616113
          female |
         female  |  -.2726466    .047975    -5.68   0.000    -.3666759   -.1786173
             age |  -.1394573    .011523   -12.10   0.000    -.1620419   -.1168727
    -------------+----------------------------------------------------------------
    inflate      |
          income |  -.0654874   .0087703    -7.47   0.000     -.082677   -.0482979
          female |
         female  |  -.2166707   .0509783    -4.25   0.000    -.3165863   -.1167552
             age |   .1205886   .0165181     7.30   0.000     .0882136    .1529636
          parent |
        smoking  |   .7219495   .0436831    16.53   0.000     .6363321    .8075669
        religion |
    discourages  |  -.2095319   .0586036    -3.58   0.000    -.3243927    -.094671
           _cons |  -.5335904   .0873953    -6.11   0.000    -.7048821   -.3622987
    -------------+----------------------------------------------------------------
           /cut1 |   .0683114   .0881964                     -.1045504    .2411731
           /cut2 |   .2977055   .0804097                      .1401054    .4553055
           /cut3 |   1.402649    .067253                      1.270836    1.534463
    ------------------------------------------------------------------------------
    Vuong test of zioprobit vs. oprobit: z =    15.15          Pr > z = 0.0000

The standard ordered probit parameters, coefficients and cutpoints, are displayed in the first and last parts of the output, respectively.

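The middle part of the output, labeled inflate, reports the probit coefficients for the inflation equation. The signs indicate that this equation is expressed in terms of participation (ever smoking): the positive coefficient on parent smoking corresponds to a lower probability of being a never-smoker, as the marginal effect reported further down confirms.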

We specified the **vuong** option to obtain the Vuong test at the
end of the output. The null hypothesis is that the inflation
part of the model is unnecessary. We can reject that at any
reasonable significance level.
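As a complementary check, not part of the original analysis, the two models can also be compared on information criteria. The sketch below assumes the fictional tobacco data are in memory; the stored-estimate names `oprob` and `ziop` are hypothetical, and the `///` line continuation works in a do-file rather than at the interactive prompt.

```stata
* Sketch only: assumes the fictional tobacco data are already in memory.

* Fit the standard ordered probit model and store its results
oprobit tobacco income i.female age
estimates store oprob

* Fit the zero-inflated model with the same specification and store it
zioprobit tobacco income i.female age, ///
    inflate(income i.female age i.parent i.religion)
estimates store ziop

* Tabulate log likelihoods, AIC, and BIC for both models side by side
estimates stats oprob ziop
```

Lower AIC and BIC values favor a model, so this table gives a second view of whether the inflation equation earns its extra parameters.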

Coefficients can be difficult to interpret. For instance, what does
a parent smoking coefficient of 0.72 mean? It means that, on
average in the data, the probability of being a never-smoker is
about 27 percentage points lower for those whose parents are
smokers than for those whose parents did not use tobacco. We
obtained this figure by using Stata's **margins** command:

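    . margins, predict(pnpar) dydx(parent)

    Average marginal effects                        Number of obs     =     14,899
    Model VCE    : OIM

    Expression   : Pr(nonparticipation), predict(pnpar)
    dy/dx w.r.t. : 1.parent

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          parent |
        smoking  |   -.266089    .015175   -17.53   0.000    -.2958314   -.2363467
    ------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.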

The **predict(pnpar)** option of **margins** is specific to models
fit by **zioprobit**. It asks **margins** to base its calculations
on the predicted probability of nonparticipation, which in this
example is the probability of being a never-smoker.
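The same statistics can be saved per observation with **predict**. The following is a minimal sketch, assuming it is run right after the **zioprobit** command above; the new variable names (`pr_never`, `pr_ever`, `pm0` through `pm3`) are hypothetical.

```stata
* Sketch only: run after the zioprobit command shown earlier.
* All new variable names are hypothetical.

* Probability of nonparticipation (being a never-smoker)
predict pr_never, pnpar

* Probability of participation (having ever smoked)
predict pr_ever, ppar

* Marginal probabilities of the four tobacco levels (pmargin is the default)
predict pm0 pm1 pm2 pm3, pmargin

* Average predicted probability of nonparticipation by parental smoking status
margins parent, predict(pnpar)
```

The last command reports the two average probabilities whose difference is the marginal effect shown above.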

You can also fit Bayesian zero-inflated ordered probit models using the **bayes** prefix.
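A minimal sketch of the Bayesian fit, assuming the same specification as the maximum likelihood model above and the default priors and simulation settings. The seed value 17 is arbitrary and chosen here only so that the run can be reproduced; the **vuong** option is left out of this sketch.

```stata
* Sketch only: same specification as the maximum likelihood fit above.
* Default priors and MCMC settings are used; rseed(17) is an arbitrary seed.
bayes, rseed(17): zioprobit tobacco income i.female age, ///
    inflate(income i.female age i.parent i.religion)
```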

Read more about zero-inflated ordered probit in the *Stata Base Reference Manual*.