FAQ: Logit transformation

Note: In Stata 14, two new commands for modeling proportions, fracreg and betareg, were introduced.

How do you fit a model when the dependent variable is a proportion?

Title		Logit transformation
Author		Allen McDowell, StataCorp; Nicholas J. Cox, Durham University, UK

A traditional solution to this problem is to perform a logit transformation on the data. Suppose that your dependent variable is called y and your independent variables are called X. Then, one assumes that the model that describes y is

        y = invlogit(XB)

If one then performs the logit transformation, the result is

        ln( y / (1 - y) ) = XB

We have now mapped the original variable, which was bounded by 0 and 1, to the real line. One can now fit this model using OLS or WLS, for example, by using regress. Of course, one cannot perform the transformation on observations where the dependent variable is zero or one; the result is a missing value, and those observations are then dropped from the estimation sample.
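As a minimal sketch of the transformation route, assuming the proportion is stored in y and using x1 and x2 as hypothetical stand-ins for the variables in X, the steps might look like this in Stata:

```stata
* Sketch only: x1 and x2 are hypothetical regressors standing in for X.
* logit() returns missing when y is exactly 0 or 1.
generate double logit_y = logit(y)

* How many observations with a nonmissing y will be lost?
count if missing(logit_y) & !missing(y)

* OLS on the transformed scale
regress logit_y x1 x2

* Map the linear prediction back to the (0,1) scale
predict double xb_hat, xb
generate double y_hat = invlogit(xb_hat)
```

Note that invlogit() of the fitted linear predictor lies between 0 and 1 but is not, in general, an estimate of the conditional mean of y, because the transformation is nonlinear.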

A better alternative is to estimate using glm with family(binomial), link(logit), and vce(robust); this is the method proposed by Papke and Wooldridge (1996). At the time this article was published, Stata’s glm command could not fit such models, and this fact is noted in the article. glm has since been enhanced specifically to deal with fractional response data.
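A sketch of this approach, again with x1 and x2 as hypothetical regressors standing in for X, might be:

```stata
* Sketch only: x1 and x2 are hypothetical regressors standing in for X.
* Observations with y equal to 0 or 1 remain in the estimation sample.
glm y x1 x2, family(binomial) link(logit) vce(robust)

* In Stata 14 or later, fracreg offers a fractional logit model directly.
fracreg logit y x1 x2, vce(robust)
```

Both commands work on the untransformed proportion, so no observations are lost to the logit transformation.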

In either case, there may well be a substantive issue of interpretation. Let us focus on interpreting zeros; the same kind of issue may well arise for ones. Suppose the y variable is the proportion of days workers spend off sick. There are two extreme possibilities. The first extreme is that all observed zeros are in effect sampling zeros: each worker has some nonzero probability of being off sick, and it is merely that some workers were not, in fact, off sick in our sample period. Here, we would often want to include the observed zeros in our analysis, and the glm route is attractive. The second extreme is that some or possibly all observed zeros must be considered as structural zeros: these workers will not ever report sick, because of robust health and exemplary dedication. These are extremes, and intermediate cases are also common. In practice, it is often helpful to look at the frequency distribution: a marked spike at zero or one may well raise doubt about a single model fitted to all data.
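One way to look at the frequency distribution, sketched here on the assumption that the proportion is stored in y, is to count the boundary values and draw a histogram:

```stata
* Sketch only: y is the proportion discussed in the text.
* Count observations sitting exactly on the boundaries
count if y == 0
count if y == 1

* A spike in the first or last bar suggests many zeros or ones
histogram y, frequency

* Percentiles give a numerical view of the same distribution
summarize y, detail
```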

A second example might be data on trading links between countries. Suppose the y variable is the proportion of imports from a certain country. Here a zero might be structural if two countries never trade, say, on political or cultural grounds. Fitting a single model to both the zeros and the nonzeros might then be inadvisable, and a different kind of model should be considered.

For an excellent broader discussion, see Baum (2008).

References

Baum, C. F. 2008. Modeling proportions. Stata Journal 8: 299–303.

Papke, L. E., and J. Wooldridge. 1996. Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics 11: 619–632.
