# Re: st: logistic transformation, proportion variables

From: Marck Bulter <177316mb@student.eur.nl>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: logistic transformation, proportion variables
Date: Thu, 13 Dec 2007 17:58:18 +0100

Nick Cox wrote:
"Little" is not the adjective that springs to mind
for that help file.
More important, I don't think that help file answers
much of the question here.
As 0 and 1 are attainable, logit in the strict sense is out of the question.
It seems to me that the main issue with a predictor that is a proportion is what is the shape of the function relating
response | other predictors
to
proportional predictor | other predictors
and, setting aside the instrumental variable aspect here, one handle on that might be given by added variable plots
after a plain multiple regression -- or graphical near
equivalents such as -mrunning- or -mlowess-. Use -findit- to locate these user-written programs.
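
The added variable plot mentioned above can be sketched with official Stata commands only. This is an illustration, not a recommended final model: it assumes the variable names from the -ivreg- call quoted further down and, as stated, sets the instrumenting of precmon aside by entering it as an ordinary regressor.

```stata
* Sketch only: plain multiple regression, ignoring the instrumental
* variable aspect, so precmon enters as an ordinary predictor.
regress pstrmon price maturity age coupon pstrmonprev pstrprev intrest ivol compl precmon

* Added variable plot for the proportion predictor: residuals of the
* response against residuals of precmon, each given the other predictors.
avplot precmon
```

Marked curvature in the plot would point towards transforming the predictor rather than entering it linearly.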
My first stab at this would be to consider some power of the predictor, say root or square. That way 0 and 1 stay as they are but you can bend the scale in the middle.
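
A minimal sketch of such power transformations, assuming the predictor is precmon; the new variable names precmon_root and precmon_sq are made up for illustration.

```stata
* Sketch only: root and square of a proportion in [0, 1].
* Both powers map 0 to 0 and 1 to 1; only the middle of the scale is bent.
generate double precmon_root = sqrt(precmon)
generate double precmon_sq   = precmon^2

* Compare skewness before and after (reported by the detail option).
summarize precmon precmon_root precmon_sq, detail
```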
Nick n.j.cox@durham.ac.uk
David Airey wrote:

Nick Cox has a little Stata help file on transformations.

ssc install transint

Marck Bulter wrote:

I have a question that is not entirely related to Stata. I do hope that you will forgive me.

Assume the following model,

ivreg pstrmon price maturity age coupon pstrmonprev pstrprev intrest ivol compl (precmon = precmonprev)

Here pstrmon, pstrmonprev, precmon and precmonprev are all proportions, for example value of bond A / total value of bonds. Each can therefore take any value between 0 and 1, with 0 and 1 included.
These last 4 variables are heavily left skewed. After estimation, the residuals are heteroskedastic and not normally distributed.
On the Statalist server I have found several references to the logistic transformation, ln(y/(1-y)):
- http://www.stata.com/statalist/archive/2003-07/msg00285.html
- home.fsw.vu.nl/m.buis/presentations/UKsug06.pdf
- http://www.stata.com/statalist/archive/2006-02/msg00150.html
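
As a rough illustration of the transformation ln(y/(1-y)) in Stata, here is a sketch that assumes the variable precmon and uses the made-up name lprecmon for the result. Stata's built-in logit() function computes the same quantity and returns missing at exactly 0 or 1.

```stata
* Sketch only: logit transform of a proportion strictly inside (0, 1).
generate double lprecmon = logit(precmon) if precmon > 0 & precmon < 1

* The same by hand would be: ln(precmon / (1 - precmon))

* Count observations lost because they sit exactly on 0 or 1.
count if missing(lprecmon) & !missing(precmon)
```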

If I transform the 4 variables using the logistic transformation, they are no longer skewed, and the residuals are almost homoskedastic and almost normally distributed.
But my question is whether this transformation is allowed, as I have mostly seen references to transforming the dependent variable only.
In addition, the transformation makes the coefficients hard to interpret. Any comments on this?
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
Dear Nick,

I have read the transint files, and they are very informative. Thank you for sharing. And thanks to David Airey for pointing me to transint. But indeed, they do not answer my question entirely.

Strictly speaking, 100% is possible, but the proportion data I have range from 0 to 0.8. The author of the following published article,

http://www.cepr.org/pubs/new-dps/dplist.asp?dpno=5153

converts values of 0 to 0.001 and values of 1 to 0.999. It is not the prettiest solution, but strictly speaking the logistic transformation is then no longer out of the question. My master's thesis extends previous research in which the author also used proportions as dependent and independent variables, but he did not explain whether he transformed the variables or, if he did, how.
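
The recoding described above could be sketched as follows. The names adj_p and logit_p are hypothetical, the cut-offs 0.001 and 0.999 are those mentioned in the text, and whether such a recoding is statistically defensible is exactly the open question.

```stata
* Sketch only: pull the boundary values inside (0, 1), then take the logit.
generate double adj_p = precmon
replace adj_p = 0.001 if precmon == 0
replace adj_p = 0.999 if precmon == 1

* logit(x) is ln(x/(1-x)); it is defined for every recoded value.
generate double logit_p = logit(adj_p)
```

The results may be sensitive to the arbitrary choice of 0.001, so it would be worth repeating the analysis with another small constant.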

As for your suggestion of root and square: sqrt does improve things a bit, but of course the 0 values remain problematic, and the residual assumptions remain problematic as well. Do you think that the conversion to 0.001 is appropriate? And, more importantly, is it appropriate to use logistic-transformed variables as both dependent and independent variables?

Sorry for not being entirely accurate the first time.

Regards,
Marck Bulter
P.S. mlowess is currently running; it is a bit computationally intensive.
