IV fractional probit model (Stata 18)

  • Fractional outcomes

  • One or more continuous endogenous covariates

  • Estimated covariance of endogenous error

Fractional outcomes are common. You might be modeling participation rates in a 401(k) pension plan, the pass rate on standardized tests, expenditure shares, or the like.

Fractional response models are a flexible and intuitive way to model outcomes that lie between 0 and 1, endpoints included. Unlike linear models, they cannot yield predictions outside the 0 to 1 range, and unlike log-odds models, they remain defined when the outcome is exactly 0 or 1. Fractional response models can be fit using the fracreg command.
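The contrast between the three approaches can be checked numerically. The following is a minimal sketch in Python (not Stata), using only the standard library. The coefficients B0 and B1 and the covariate values are hypothetical and chosen purely for illustration; they are not estimates from any model in this article.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_mean(x, b0, b1):
    """Linear model for the conditional mean; not bounded to [0, 1]."""
    return b0 + b1 * x


def probit_mean(x, b0, b1):
    """Fractional probit conditional mean, Phi(b0 + b1*x); stays inside (0, 1)."""
    return normal_cdf(b0 + b1 * x)


def log_odds(y):
    """Log-odds transform of an outcome; fails at y = 0 and y = 1."""
    return math.log(y / (1.0 - y))


def log_odds_defined(y):
    """Return True if the log-odds transform can be computed for y."""
    try:
        log_odds(y)
        return True
    except (ValueError, ZeroDivisionError):
        return False


# Hypothetical coefficients and covariate values (illustration only)
B0, B1 = 0.2, 0.5
xs = [-3.0, 0.0, 3.0]

linear_preds = [linear_mean(x, B0, B1) for x in xs]
probit_preds = [probit_mean(x, B0, B1) for x in xs]

for x, lin, prb in zip(xs, linear_preds, probit_preds):
    print(f"x = {x:5.1f}  linear = {lin:7.4f}  probit = {prb:7.4f}")

for y in (0.0, 0.5, 1.0):
    print(f"log-odds defined at y = {y}: {log_odds_defined(y)}")
```

With these illustrative values, the linear predictions at the two extreme covariate values fall outside the unit interval, the probit predictions do not, and the log-odds transform fails at the boundary outcomes 0 and 1.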

What if you are concerned that one or more of your model covariates are endogenous? With the new ivfprobit command, you can fit a model for a fractional dependent variable and account for endogeneity in one or more of the covariates.

Let's see it work

We want to study the 401(k) participation rate (prate). We believe that corporate employment size (ltotemp) and its square are determinants of participation rates, as are an indicator of whether the 401(k) is the sole pension plan (sole) and the plan matching rate (mrate). We believe, however, that the plan matching rate is endogenous. In other words, there are unobserved determinants of participation rates that also affect the plan matching rate. For instance, matching rate and participation rate might both be associated with industry and regional practices that are not observable in the data. To address endogeneity, we instrument matching rate using the age of the plan (age) and its square.

We type

. ivfprobit prate c.ltotemp##c.ltotemp i.sole (mrate = c.age##c.age)

Inside the parentheses is the endogenous variable, followed after the equal sign by the instrumental variables we use to model it. Outside the parentheses are the exogenous variables that affect prate directly. We get

. ivfprobit prate c.ltotemp##c.ltotemp i.sole (mrate = c.age##c.age)

Fitting exogenous fractional probit model:
Iteration 0:  Log pseudolikelihood = -1769.7046
Iteration 1:  Log pseudolikelihood = -1675.4223
Iteration 2:  Log pseudolikelihood = -1674.7663
Iteration 3:  Log pseudolikelihood = -1674.7661
Iteration 4:  Log pseudolikelihood = -1674.7661

Fitting full model:
Iteration 0:  Log pseudolikelihood =  -3712.498
Iteration 1:  Log pseudolikelihood = -3712.4767
Iteration 2:  Log pseudolikelihood = -3712.4767

Fractional probit model with endogenous regressors

                                                        Number of obs =  4,075
                                                        Wald chi2(4)  = 907.06
Log pseudolikelihood = -3712.4767                       Prob > chi2   = 0.0000
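------------------------------------------------------------------------------
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       mrate |   1.907922   .0946094    20.17   0.000     1.722491    2.093353
     ltotemp |  -.4229273   .0744177    -5.68   0.000    -.5687833   -.2770713
             |
   c.ltotemp#|
   c.ltotemp |   .0217492   .0046476     4.68   0.000       .01264    .0308583
             |
        sole |
  Only plan  |  -.1733119   .0366136    -4.73   0.000    -.2450733   -.1015504
       _cons |   1.904103   .3199032     5.95   0.000     1.277104    2.531102
-------------+----------------------------------------------------------------
corr(e.mrate,|
     e.prate)|  -.5690386   .0431738                     -.6476498   -.4784406
 sd(e.mrate) |   .3989664   .0061807                      .3870345    .4112661
------------------------------------------------------------------------------
Wald test of exogeneity: chi2(1) = 102.40                 Prob > chi2 = 0.0000
Endogenous: mrate
Exogenous:  ltotemp c.ltotemp#c.ltotemp 1.sole age c.age#c.age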

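We find a positive and statistically significant effect of the matching rate on the participation rate. Additionally, the estimated correlation between the unobservables, corr(e.mrate, e.prate), is about -0.57, and its 95% confidence interval excludes zero. The Wald test of exogeneity likewise rejects the null hypothesis that mrate is exogenous. This means there is evidence to support our endogeneity conjecture.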
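To see how the reported numbers fit together, the arithmetic behind the mrate row and the exogeneity test can be reproduced by hand. This is a minimal sketch in Python (not Stata), using only the standard library. It assumes the reported interval is the usual normal-based interval, coefficient plus or minus the critical value times the standard error, and that the test's p-value is the upper tail of a chi-squared distribution with 1 degree of freedom. The inputs are copied from the output above.

```python
import math
from statistics import NormalDist

# Values copied from the mrate row of the output above
coef = 1.907922
se = 0.0946094

# z statistic: coefficient divided by its standard error
z = coef / se

# Normal-based 95% interval (assumed form: coef +/- critical value * se)
crit = NormalDist().inv_cdf(0.975)
ci_low = coef - crit * se
ci_high = coef + crit * se

# Wald test of exogeneity: upper-tail probability of chi2(1).
# For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
wald_chi2 = 102.40
p_value = math.erfc(math.sqrt(wald_chi2 / 2.0))

print(f"z       = {z:.2f}")
print(f"95% CI  = [{ci_low:.6f}, {ci_high:.6f}]")
print(f"p-value = {p_value:.3g}")
```

Under these assumptions, the computed z statistic and interval agree with the reported row to rounding, and the p-value is far below 0.0001, which is why the output displays it as 0.0000.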
