Zero-inflated ordered logit model


Highlights

Model ordinal data with an overabundance of responses in the lowest category
Use a logit model to identify zero inflation and an ordered logit model for the ordinal response
Use a potentially different set of predictors for the logit and ordered logit models
Easily interpret findings using odds ratios and marginal probabilities
Support for Bayesian estimation
Robust, cluster-robust, and bootstrap standard errors
Support for complex survey designs
See more features for binary outcomes

Stata's ziologit command fits zero-inflated ordered logit models.

Ordered logit regression is used to model ordered categorical responses, such as symptom severity recorded as none, mild, moderate, or severe. Larger values of such outcomes represent higher levels, but the numeric values themselves carry no meaning beyond their order.

In some situations, the data contain more zeros (or, more generally, more observations in the lowest category) than a traditional ordered logit model would predict. A zero might represent the absence of a trait, while the remaining values represent increasing levels of the trait. Zeros can then come from two groups: individuals who do not have the trait at all and individuals who have the trait but exhibit it at the lowest level. For example:

In a study of alcohol consumption, some individuals report no consumption because they never drink alcohol, while others may report no alcohol consumption because they did not drink in the survey period.

In a clinical trial of a treatment intended to shrink tumors, outcomes represent no improvement, partial response, or complete response. An individual may show no improvement because the tumor is resistant to treatment or because the tumor was treatable but did not shrink at the time of measurement. The distinction is important because treatable tumors are good candidates for a higher dose.

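In contexts such as these, you can use a zero-inflated ordered logit (ZIOL) model. A ZIOL model assumes that observations in the lowest category come from two processes: a logit model that determines whether an individual is susceptible (has the trait at all), and an ordered logit model that determines the level of the outcome for susceptible individuals, some of whom are also observed in the lowest category. Each model can have its own set of predictors.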
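To make the two sources of zeros concrete, here is a minimal Python sketch of the outcome probabilities under a standard ZIOL formulation. It assumes that the inflate() equation gives the probability of being susceptible, invlogit(zγ), so that positive coefficients raise the odds of being susceptible, and that the ordered logit part uses the usual parameterization Pr(y ≤ j | susceptible) = invlogit(κ_j − xβ). These are our assumptions about ziologit's parameterization, and the function names are illustrative, not part of Stata.

```python
import math


def invlogit(t: float) -> float:
    """Logistic CDF, 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def ziol_probs(xb: float, zg: float, cuts: list[float]) -> list[float]:
    """Return [Pr(y=0), ..., Pr(y=J-1)] for one observation.

    xb   -- linear predictor x*beta of the ordered logit (level) equation
    zg   -- linear predictor z*gamma of the logit (susceptibility) equation
    cuts -- increasing cutpoints kappa_1 < ... < kappa_{J-1}
    """
    p_susc = invlogit(zg)  # probability of being susceptible
    # Cumulative ordered logit probabilities Pr(y <= j | susceptible),
    # padded with 0 below the lowest and 1 above the highest category.
    cum = [0.0] + [invlogit(k - xb) for k in cuts] + [1.0]
    cond = [cum[j + 1] - cum[j] for j in range(len(cuts) + 1)]
    probs = [p_susc * c for c in cond]
    # Non-susceptible individuals can only be observed in the lowest category.
    probs[0] += 1.0 - p_susc
    return probs


# Arbitrary linear predictors, combined with the cutpoints estimated in the example below.
print(ziol_probs(xb=7.0, zg=1.0, cuts=[5.85957, 11.14187, 14.3632]))
```

The lowest category collects both the non-susceptible mass (1 − p) and the susceptible individuals whose latent level falls below the first cutpoint, which is exactly why a plain ordered logit model cannot separate the two.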

Let's see it work

Let's use fictional data on daily cigarette consumption. The codebook command shows us the four levels of cigarette consumption.

. webuse tobacco
(Fictional tobacco consumption data)

. codebook tobacco


    
    tobacco                                                           Tobacco usage
    
                                                                                   
                      Type: Numeric (byte)                                         
                     Label: tobaclbl                                               
                                                                                   
                     Range: [0,3]                         Units: 1                 
             Unique values: 4                         Missing .: 0/15,000          
                                                                                   
                Tabulation: Freq.   Numeric  Label                                 
                            9,469         0  0 cigarettes                          
                            3,806         1  1–7 cigarettes/day                    
                            1,050         2  8–12 cigarettes/day                   
                              675         3  >12 cigarettes/day

More than half the respondents reported zero cigarette consumption. A zero may be reported for two reasons: because a respondent is always a nonsmoker or because a respondent is susceptible to smoking but did not smoke in the time period for which the data were collected. A traditional ordered logit model cannot distinguish between these two sources of zero consumption. The ZIOL model allows us to model the probability of being susceptible to smoking in addition to modeling the level of consumption.

We fit the ZIOL model by using ziologit. We model the level of cigarette consumption as a function of education (education), income in $10,000s (income), and gender (female). We specify the inflate() option to model the probability of being a smoker as a function of education, income, and whether the respondent's parents smoked (parent).

. ziologit tobacco education income i.female, inflate(income education i.parent)

Iteration 0:  Log likelihood = -15977.364  (not concave)
Iteration 1:  Log likelihood =  -13149.83  (not concave)
Iteration 2:  Log likelihood = -12467.245
Iteration 3:  Log likelihood = -11039.218
Iteration 4:  Log likelihood = -9929.2298
Iteration 5:  Log likelihood = -9715.1143
Iteration 6:  Log likelihood = -9703.2464
Iteration 7:  Log likelihood = -9703.2168
Iteration 8:  Log likelihood = -9703.2168

Zero-inflated ordered logit regression                Number of obs =  15,000
                                                      Wald chi2(3)  = 3147.70
Log likelihood = -9703.2168                           Prob > chi2   =  0.0000



    tobacco   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]

tobacco                                                                      
  education     .5090816   .0094838    53.68   0.000     .4904938    .5276695
     income      .583636   .0114401    51.02   0.000     .5612139    .6060581
                                                                             
     female                                                                  
    Female     -.5307721   .0580736    -9.14   0.000    -.6445943   -.4169499

inflate                                                                      
     income    -.1279677     .00705   -18.15   0.000    -.1417856   -.1141499
  education    -.1412459   .0049693   -28.42   0.000    -.1509855   -.1315062
                                                                             
     parent                                                                  
   Smoking      1.187864   .0529432    22.44   0.000     1.084097     1.29163
      _cons     2.617219   .1156891    22.62   0.000     2.390473    2.843966

      /cut1      5.85957    .104449                      5.654853    6.064286
      /cut2     11.14187   .1945483                      10.76056    11.52318
      /cut3      14.3632   .2495117                      13.87417    14.85224

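The first section of the table, labeled "tobacco", reports results for the ordered logit model of cigarette consumption. The second section, labeled "inflate", reports results for the logit model of the probability of being a smoker. The final section reports the three cutpoints of the ordered logit model, which separate the four consumption levels.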

To more easily interpret the results from the first two sections, we request that ziologit show odds ratios rather than coefficients.

. ziologit, or

Zero-inflated ordered logit regression                Number of obs =  15,000
                                                      Wald chi2(3)  = 3147.70
Log likelihood = -9703.2168                           Prob > chi2   =  0.0000



    tobacco   Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]

tobacco                                                                      
  education     1.663763   .0157788    53.68   0.000     1.633122    1.694978
     income     1.792544   .0205068    51.02   0.000     1.752799    1.833191
                                                                             
     female                                                                  
    Female      .5881507    .034156    -9.14   0.000     .5248755     .659054

inflate                                                                      
     income     .8798818   .0062032   -18.15   0.000     .8678073    .8921242
  education     .8682758   .0043147   -28.42   0.000     .8598602    .8767738
                                                                             
     parent                                                                  
   Smoking      3.280066   .1736572    22.44   0.000     2.956768    3.638714
      _cons     13.69758   1.584661    22.62   0.000     10.91866    17.18378

      /cut1      5.85957    .104449                      5.654853    6.064286
      /cut2     11.14187   .1945483                      10.76056    11.52318
      /cut3      14.3632   .2495117                      13.87417    14.85224

Note: Estimates are transformed only in the first 2 equations.
Note: _cons estimates baseline odds.

A $10,000 increase in annual income decreases the odds of being a smoker by a factor of 0.88 (12% decrease in odds) but, among smokers, increases the odds of higher cigarette consumption by a factor of 1.79 (79% increase in odds). This suggests that wealthier individuals are less likely to smoke, but if they do decide to smoke, they tend to smoke more cigarettes.
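The odds ratios are exponentiated coefficients, so the percentage changes quoted above can be reproduced by hand. A short Python check using coefficients copied from the coefficient table (the dictionary keys are only labels that loosely mirror Stata's equation:variable naming):

```python
import math

# Coefficients copied from the ziologit coefficient table above.
coefs = {
    "tobacco:income": 0.583636,
    "inflate:income": -0.1279677,
    "inflate:1.parent": 1.187864,
    "inflate:_cons": 2.617219,
}

# Exponentiating a logit coefficient gives the corresponding odds ratio.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, odds_ratio in odds_ratios.items():
    pct = 100.0 * (odds_ratio - 1.0)  # percentage change in the odds
    print(f"{name:<18} OR = {odds_ratio:.4f} ({pct:+.1f}% change in odds)")
```

The cutpoints are not transformed, which is why /cut1, /cut2, and /cut3 are identical in the coefficient and odds-ratio tables.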

But what do these results mean in terms of the probability of different smoking behaviors? Suppose we want to know how cigarette consumption relates to income. For that, we use the margins command. Because income is measured in $10,000s, specifying at(income=(0(5)20)) estimates the expected probability of each consumption level at annual incomes of $0, $50,000, $100,000, $150,000, and $200,000, averaging over the observed values of the other covariates.
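Conceptually, each predictive margin is a two-step calculation: set income to a given value for every observation, compute each observation's predicted outcome probabilities, and average them. The Python sketch below illustrates that averaging step under a standard ZIOL formulation (our assumption about ziologit's parameterization: Pr(susceptible) = invlogit(zγ) and Pr(y ≤ j | susceptible) = invlogit(κ_j − xβ)). It uses the point estimates from the output above but a HYPOTHETICAL four-row sample, so its numbers will not match the margins table, and it omits the delta-method standard errors.

```python
import math


def invlogit(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


# Point estimates copied from the ziologit output above.
BETA = {"education": 0.5090816, "income": 0.583636, "female": -0.5307721}
GAMMA = {"income": -0.1279677, "education": -0.1412459, "parent": 1.187864}
GAMMA_CONS = 2.617219
CUTS = [5.85957, 11.14187, 14.3632]

# HYPOTHETICAL covariate rows for illustration only (not the tobacco data).
SAMPLE = [
    {"education": 10, "female": 0, "parent": 1},
    {"education": 12, "female": 1, "parent": 0},
    {"education": 16, "female": 1, "parent": 1},
    {"education": 8, "female": 0, "parent": 0},
]


def outcome_probs(row: dict, income: float) -> list[float]:
    """ZIOL probabilities Pr(y=0..3) for one row with income set to `income`."""
    x = {**row, "income": income}
    xb = sum(BETA[k] * x[k] for k in BETA)
    zg = GAMMA_CONS + sum(GAMMA[k] * x[k] for k in GAMMA)
    p_susc = invlogit(zg)
    cum = [0.0] + [invlogit(c - xb) for c in CUTS] + [1.0]
    probs = [p_susc * (cum[j + 1] - cum[j]) for j in range(len(CUTS) + 1)]
    probs[0] += 1.0 - p_susc  # zeros from non-susceptible individuals
    return probs


def predictive_margins(rows: list[dict], income: float) -> list[float]:
    """Average each outcome probability over rows, holding income fixed."""
    per_row = [outcome_probs(r, income) for r in rows]
    return [sum(p[j] for p in per_row) / len(rows) for j in range(len(CUTS) + 1)]


# income is in $10,000s, so 0, 5, ..., 20 means $0 to $200,000.
for income in range(0, 21, 5):
    margins = predictive_margins(SAMPLE, income)
    print(f"${income * 10_000:>7,}:", [round(m, 4) for m in margins])
```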

. margins, at(income=(0(5)20))

Predictive margins                                      Number of obs = 15,000
Model VCE: OIM

1._predict : Pr(tobacco=0), predict(pmargin outcome(0))
2._predict : Pr(tobacco=1), predict(pmargin outcome(1))
3._predict : Pr(tobacco=2), predict(pmargin outcome(2))
4._predict : Pr(tobacco=3), predict(pmargin outcome(3))

1._at: income =  0
2._at: income =  5
3._at: income = 10
4._at: income = 15
5._at: income = 20



                          Delta-method                                        
                   Margin   std. err.      z    P>|z|     [95% conf. interval]

_predict#_at                                                                  
        1 1     .7428698   .0044443   167.15   0.000     .7341591    .7515805
        1 2     .6190759   .0038733   159.83   0.000     .6114843    .6266675
        1 3     .5168462   .0052057    99.29   0.000     .5066433    .5270492
        1 4      .526699   .0092168    57.15   0.000     .5086344    .5447636
        1 5     .6340465   .0138387    45.82   0.000     .6069232    .6611697
        2 1     .2121431   .0034296    61.86   0.000     .2054211    .2188651
        2 2     .2792459   .0033861    82.47   0.000     .2726092    .2858826
        2 3     .3042245   .0040212    75.65   0.000     .2963431     .312106
        2 4     .2226386   .0050478    44.11   0.000     .2127452     .232532
        2 5     .0633686   .0047963    13.21   0.000     .0539681    .0727692
        3 1     .0372614   .0014098    26.43   0.000     .0344983    .0400245
        3 2     .0737865   .0019981    36.93   0.000     .0698702    .0777027
        3 3     .1146585   .0029075    39.44   0.000     .1089599    .1203572
        3 4     .1351544   .0041403    32.64   0.000     .1270395    .1432693
        3 5      .138638   .0052133    26.59   0.000     .1284201    .1488559
        4 1     .0077257   .0005647    13.68   0.000     .0066189    .0088324
        4 2     .0278917   .0011614    24.01   0.000     .0256153     .030168
        4 3     .0642707    .002228    28.85   0.000     .0599038    .0686376
        4 4      .115508   .0045623    25.32   0.000     .1065661      .12445
        4 5     .1639469   .0085572    19.16   0.000      .147175    .1807188

Because we estimated many probabilities, it is easier to see the patterns by graphing them with marginsplot.

The probability of smoking 0 cigarettes decreases as annual income increases up to $100,000 and then increases again. The probability of smoking 1–7 cigarettes/day is highest when earnings are $100,000 per year and lowest when earnings are $200,000 per year.

We now examine the relationship between income and susceptibility to smoking. Adding the predict(ps) option to margins requests the predicted probability of being susceptible.

. quietly margins, predict(ps) at(income=(0(5)20))

. marginsplot

When income is zero, about four-fifths of respondents are susceptible to smoking. The probability of being susceptible decreases as income increases, falling to just over a third of respondents when earnings are $200,000 per year. This pattern is consistent with income acting as a proxy for health consciousness.
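Under a standard ZIOL formulation (an assumption about ziologit's parameterization, not taken from its documentation), the probability of susceptibility is the inverse logit of the inflate() linear predictor. Because the income coefficient in that equation is negative (−0.128), this probability falls as income rises for any fixed covariate profile. A minimal Python sketch for one HYPOTHETICAL profile (education = 12, parents did not smoke), using the point estimates above:

```python
import math


def invlogit(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


# inflate() point estimates copied from the ziologit output above.
G_CONS, G_INCOME, G_EDUC, G_PARENT = 2.617219, -0.1279677, -0.1412459, 1.187864

# HYPOTHETICAL profile for illustration: education = 12, parents did not smoke.
education, parent = 12, 0

ps_by_income = {}
for income in range(0, 21, 5):  # income in $10,000s
    zg = G_CONS + G_INCOME * income + G_EDUC * education + G_PARENT * parent
    ps_by_income[income] = invlogit(zg)  # Pr(susceptible to smoking)
    print(f"income = {income:2d}: Pr(susceptible) = {ps_by_income[income]:.3f}")
```

The margins results average this quantity over the observed covariates instead of fixing a single profile, but the direction of the income effect is the same.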

Next we use margins to focus on subjects who are susceptible to smoking. Specifying the pcond1 statistic with each outcome level gives the probability of each level of tobacco, conditional on being susceptible. As before, the calculations are performed at five levels of income and graphed with marginsplot.

. quietly margins, predict(pcond1 outcome(0)) predict(pcond1 outcome(1))
     predict(pcond1 outcome(2)) predict(pcond1 outcome(3)) at(income=(0(5)20))
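Under the same standard formulation (again an assumption about ziologit's parameterization), the conditional probabilities are plain ordered logit probabilities, and the unconditional probabilities from pmargin mix them with the zeros from non-susceptible individuals: Pr(y=0) = (1 − ps) + ps·Pr(y=0 | susceptible) and Pr(y=j) = ps·Pr(y=j | susceptible) for j > 0. The Python sketch below checks that identity for one HYPOTHETICAL profile and shows why the conditional probability of heavy smoking rises with income: the income coefficient in the tobacco equation is positive. The function names mirror the Stata statistic names but are ordinary Python functions, not Stata code.

```python
import math


def invlogit(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


# Point estimates copied from the ziologit output above.
B_EDUC, B_INCOME, B_FEMALE = 0.5090816, 0.583636, -0.5307721
G_CONS, G_INCOME, G_EDUC, G_PARENT = 2.617219, -0.1279677, -0.1412459, 1.187864
CUTS = [5.85957, 11.14187, 14.3632]

# HYPOTHETICAL profile for illustration only.
education, female, parent = 12, 1, 0


def pcond(income: float) -> list[float]:
    """Pr(y = j | susceptible): plain ordered logit probabilities."""
    xb = B_EDUC * education + B_INCOME * income + B_FEMALE * female
    cum = [0.0] + [invlogit(k - xb) for k in CUTS] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(CUTS) + 1)]


def ps(income: float) -> float:
    """Pr(susceptible) from the inflate() equation."""
    return invlogit(G_CONS + G_INCOME * income + G_EDUC * education + G_PARENT * parent)


def pmargin(income: float) -> list[float]:
    """Unconditional Pr(y = j): susceptible mass plus non-susceptible zeros."""
    p = ps(income)
    probs = [p * c for c in pcond(income)]
    probs[0] += 1.0 - p
    return probs


for income in range(0, 21, 5):  # income in $10,000s
    print(income, [round(c, 3) for c in pcond(income)])
```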

When annual income is zero, well over half of those susceptible to smoking report zero cigarette consumption, and those who do consume cigarettes are most likely to smoke just a few cigarettes per day. As income increases, the probability of zero consumption falls. Higher annual income is associated with a higher probability of being a heavy smoker. This suggests that, among smokers, cigarettes are treated as what economists call a normal good, that is, something for which demand increases when income increases.

We can see from this example that the effect of income on cigarette consumption is multifaceted. The ziologit command makes it possible to model smoking susceptibility as well as smoking intensity, leading to a better understanding of the factors influencing smoking behavior.

Tell me more

You can also fit Bayesian zero-inflated ordered logit regression models using the bayes prefix.

Learn more about zero-inflated ordered logit in the Stata Base Reference Manual.
