Fractional outcome regression

Highlights

Model fractions, proportions, rates, etc.
Fractional probit model
Fractional logit model
Fractional heteroskedastic probit model
Odds ratios for fractional logit models
Beta regression

Fractional responses concern outcomes between zero and one.

The most natural way fractional responses arise is from averaged 0/1 outcomes. In such cases, if you know the denominator, you want to estimate such models using standard probit or logistic regression. For instance, the fractional response might be 0.25, but if the data also record that 9 out of 36 had a positive outcome, you can use the standard estimation commands.
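When the counts behind each fraction are recorded, a grouped binomial model is one way to follow this advice. The sketch below assumes hypothetical variables positives (the number of positive 0/1 outcomes), total (the denominator), and covariates x1 and x2; none of these names come from the example data used later.

```stata
* Hypothetical grouped data: positives out of total for each observation
* Grouped logistic regression via glm with a binomial denominator
glm positives x1 x2, family(binomial total) link(logit)

* The same data fit with a probit link
glm positives x1 x2, family(binomial total) link(probit)
```

Because the denominator enters the likelihood, an observation built from 36 trials carries more weight than one built from 4, which a fractional model fit to the ratio alone cannot know.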

Fractional response models are for use when the denominator is unknown. That can include averaged 0/1 outcomes such as participation rates, but can also include variables that are naturally on a 0 to 1 scale such as pollution levels, patient oxygen saturation, and Gini coefficients (inequality measures).

Fractional response estimators fit models on continuous zero to one data using probit, logit, heteroskedastic probit, and beta regression. Beta regression can be used only when the endpoints zero and one are excluded.
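As a rough map from the models listed above to commands, the following do-file lines are a minimal sketch. The outcome y and the covariates x1, x2, and z are hypothetical names used only for illustration.

```stata
* y is a hypothetical outcome in [0, 1]; x1, x2, and z are hypothetical covariates
fracreg probit y x1 x2              // fractional probit
fracreg logit  y x1 x2              // fractional logit
fracreg logit  y x1 x2, or          // fractional logit, reporting odds ratios
fracreg probit y x1 x2, het(z)      // heteroskedastic probit; variance depends on z

* Beta regression requires 0 < y < 1, so count endpoint values first
count if y <= 0 | (y >= 1 & !missing(y))
betareg y x1 x2
```

If the count is nonzero, one of the fracreg models is the applicable choice.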

Let's see it work

We are going to analyze an air-pollution index that is scaled 0 to 1, inclusive, although 1 (complete pollution) is virtually impossible, and in our data, we observe values only up to 0.8. We do observe the opposite endpoint, however. Zero means no measurable pollution. Our data are for various cities.

In this 0 to 1 variable, values between 0 and 0.3 have no public health implications, and values greater than 0.7 imply people with breathing or health problems should remain indoors.
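Before fitting anything, it can help to see how the observations fall relative to these two thresholds. The sketch below assumes the pollution variable used in the model that follows; the variable name advisory and its value labels are our own hypothetical choices.

```stata
* Classify the index by the two public-health thresholds
* irecode() returns 0 if pollution <= 0.3, 1 if 0.3 < pollution <= 0.7,
* and 2 if pollution > 0.7
generate byte advisory = irecode(pollution, 0.3, 0.7)
label define advisory 0 "no implications" 1 "intermediate" 2 "stay indoors"
label values advisory advisory
tabulate advisory

* Count observations at the lower endpoint, which rules out beta regression here
count if pollution == 0
```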

We model pollution as determined by the number of older, pollution-producing cars per capita; percentage of output due to industry; and annual rainfall. We use probit. We type

. fracreg probit pollution oldcars rainfall industrial

Iteration 0:   log pseudolikelihood = -1001.8481
Iteration 1:   log pseudolikelihood = -806.74595
Iteration 2:   log pseudolikelihood = -806.55309
Iteration 3:   log pseudolikelihood = -806.55309

Fractional probit regression                            Number of obs =  1,234
                                                        Wald chi2(3)  = 116.91
                                                        Prob > chi2   = 0.0000
Log pseudolikelihood = -806.55309                       Pseudo R2     = 0.0060



                             Robust
   pollution  Coefficient  std. err.      z    P>|z|     [95% conf. interval]

     oldcars    .7689171   .1748695     4.40   0.000     .4261791    1.111655
    rainfall   -.3165829   .0350128    -9.04   0.000    -.3852067   -.2479592
  industrial    .2295972    .053877     4.26   0.000     .1240002    .3351942
       _cons   -.3840791   .0393275    -9.77   0.000    -.4611596   -.3069986

We find more pollution where there are more older cars, less rainfall, and more industry. How good are you at reading probit's N(0,1) standardized coefficients?
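For reference, a fractional probit is usually written as a model for the conditional mean, which is why the coefficients sit on the standard normal index scale rather than on the scale of pollution. In generic notation (the symbols are ours, not taken from the output above):

```latex
\[
E(y \mid \mathbf{x}) = \Phi(\mathbf{x}\boldsymbol{\beta}),
\qquad
\frac{\partial E(y \mid \mathbf{x})}{\partial x_j}
  = \phi(\mathbf{x}\boldsymbol{\beta})\,\beta_j ,
\]
```

where \Phi and \phi are the standard normal distribution and density functions. The sign of each coefficient matches the sign of its effect, but the size of the effect depends on where x\beta falls for each observation.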

margins will make interpreting our results easier. We can ask margins to report semielasticities, which is to say, the change in pollution, read in percentage points, for a 1% change in the covariate:

. margins, dyex(_all)

Average marginal effects                                 Number of obs = 1,234
Model VCE: Robust

Expression: Conditional mean of pollution, predict()
dy/ex wrt:  oldcars rainfall industrial



                          Delta-method
                   dy/ex   std. err.      z    P>|z|     [95% conf. interval]

     oldcars    .0411578   .0093973     4.38   0.000     .0227393    .0595763
    rainfall   -.0581577   .0062469    -9.31   0.000    -.0704014   -.0459139
  industrial    .0347474   .0081857     4.24   0.000     .0187037     .050791

We find that a 1% increase in older cars per capita increases pollution by 0.041 percentage points (about 0.00041 on the 0 to 1 scale), a 1% increase in rainfall decreases pollution by 0.058 percentage points, and a 1% increase in industrial production increases pollution by 0.035 percentage points.

A truly careful reader will have noticed that we typed dyex(), not eyex(). The dependent variable is already a proportion and so is already on a percentage scale. We just need its change, not its percentage change.
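To make the distinction concrete, here is a minimal sketch of the three related margins requests, written as do-file lines to run after the fracreg fit above. Only the dyex() form appears in this example; the other two are shown for comparison.

```stata
* Run after: fracreg probit pollution oldcars rainfall industrial
margins, dydx(_all)    // change in pollution per unit change in each covariate
margins, dyex(_all)    // change in pollution per proportional change in each covariate
margins, eyex(_all)    // percentage change in pollution per 1% change in each covariate
```

With eyex(), both sides are in proportional terms, so the result describes the relative change in an outcome that is already a proportion, which is harder to read here.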

Let's see it work with beta regression

Let's look at the effect of democratic institutions on income inequality. We have fictional data on a cross-section of countries in which inequality is measured using the Gini coefficient. The Gini coefficient is one if one person has all the income in a society and zero if income is equally divided among everyone. Values of zero and one simply do not happen, of course. In our data, the average Gini coefficient is 0.41. For reference, Sweden's coefficient was roughly 0.23 in 2005 (they are proud of their equality), and Haiti's is 0.59.

The beta distribution is often used to model the Gini coefficient and other zero to one variables that can have long tails and exclude the endpoints. We type

. betareg  gini i.rural i.democracy i.colony, nolog

Beta regression                                 Number of obs     =        160
                                                LR chi2(6)        =     146.52
                                                Prob > chi2       =     0.0000

Link function  :  g(u) = log(u/(1-u))           [Logit]
Slink function :  g(u) = log(u)                 [Log]

Log likelihood =  157.79178



        gini   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]

gini         
       rural 
      rural     .1567357   .0680008     2.30   0.021     .0234567    .2900147
             
   democracy 
        low    -.4798286   .0748253    -6.41   0.000    -.6264834   -.3331737
     medium    -.7774981   .0931349    -8.35   0.000    -.9600391    -.594957
   med-high    -1.303923   .1363737    -9.56   0.000    -1.571211   -1.036636
       high    -1.521037   .1775991    -8.56   0.000    -1.869125    -1.17295
             
      colony 
     colony     .2368402   .0805578     2.94   0.003     .0789498    .3947306
       _cons   -.0471008   .0528853    -0.89   0.373     -.150754    .0565524

scale        
       _cons    3.279796   .1099443    29.83   0.000     3.064309    3.495283

We have modeled income inequality on the country's ruralness, level of democracy, and whether it is a former colony. In these fictional data, former colonies tend to have higher inequality, and the stronger the democracy, the less the inequality.
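As a sketch of what the two equations in the output represent, consider a common mean-and-scale parameterization of the beta distribution. The symbols \mu and \psi are our notation, and reading the scale equation as the log of \psi is an assumption based on the link lines printed above the table:

```latex
\[
\log\!\left(\frac{\mu_i}{1-\mu_i}\right) = \mathbf{x}_i\boldsymbol{\beta},
\qquad
\log(\psi) = 3.279796,
\qquad
\operatorname{Var}(y_i \mid \mathbf{x}_i) = \frac{\mu_i(1-\mu_i)}{1+\psi}.
\]
```

Under this reading, a country in the base category of every factor has a predicted Gini coefficient of 1/(1 + exp(0.0471008)), roughly 0.49, and \psi = exp(3.279796) is roughly 26.6. The coefficients in the gini equation are on the logit scale, so they are not themselves changes in the Gini coefficient.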

We will use margins to make the effect of democracy easier to interpret:

. margins, dydx(democracy)

Average marginal effects                                   Number of obs = 160
Model VCE: OIM

Expression: Conditional mean of gini, predict()
dy/dx wrt:  1.democracy 2.democracy 3.democracy 4.democracy



                          Delta-method
                   dy/dx   std. err.      z    P>|z|     [95% conf. interval]

   democracy 
        low    -.1178869   .0181028    -6.51   0.000    -.1533678    -.082406
     medium    -.1860353   .0210805    -8.82   0.000    -.2273524   -.1447183
   med-high    -.2893892   .0249533   -11.60   0.000    -.3382967   -.2404817
       high    -.3245293   .0284448   -11.41   0.000      -.38028   -.2687785

Note: dy/dx for factor levels is the discrete change from the base level.

Reported is the change in the outcome variable (inequality) for a change in democracy. The base (omitted) category is total absence of democracy, and each effect is measured against that base. Thus, being categorized as low relative to total absence of democracy decreases inequality by 0.12. Being categorized as medium decreases inequality by 0.19 relative to the same base, and so on.
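Two follow-up requests can make the pattern across levels easier to see. This is a minimal sketch to run after the betareg fit above; neither command is part of the original example.

```stata
* Run after: betareg gini i.rural i.democracy i.colony
margins democracy      // average predicted Gini coefficient at each level of democracy
marginsplot            // plot those predictions with confidence intervals

* Differences between each level and the level just below it,
* rather than differences from the base category
margins ar.democracy
```

The ar. contrast operator reports step-to-step changes, which is the quantity to look at if the question is how much each additional level of democracy adds.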

Tell me more

Read more about fractional response and beta regression models in the Stata Base Reference Manual; see [R] fracreg and [R] betareg.


