# Re: st: logistic transformation, proportion variables

From: "Austin Nichols" | To: statalist@hsphsun2.harvard.edu | Subject: Re: st: logistic transformation, proportion variables | Date: Thu, 13 Dec 2007 12:21:50 -0500

```
Marck--
No, replacing 0 with .001 is not appropriate, unless replacing it with
.0001 or .0000001 or 1e-30 etc. instead has no impact on the results,
in which case you could just drop the zeros and get the same results.
Also: Why is the sqrt(0) problematic?

My guess is that a better solution to your problem would be grounded
in theory.  What is this regression supposed to measure the effects
of?  If y is a proportion and x1 and x2 are proportions, and they
"want to be" transformed via logits, perhaps you should be using the
logs of the components of those variables.  Write a proportion as
a/(a+b), where a is the numerator and b is the rest of the
denominator; then
logit(a/(a+b))=ln(a)-ln(b)
so including the logit of a proportion X as an explanatory var is the
same as including ln(a) and ln(b) and constraining their coefficients
N and D to satisfy N+D=0, which is a testable restriction.  Using the
logit of a proportion Y as the dependent var is the same as using
ln(a) as the depvar and ln(b) as a regressor and constraining the
coefficient on ln(b) to be 1, which is also a testable restriction.

Of course, if the numerator is zero, the log is undefined and those
obs will drop out of the estimation.  Theory can also help you here
sometimes--in particular, perhaps the sqrt(X) is actually what has a
linear effect on Y, not X, as Nick suggests.

On Dec 13, 2007 11:58 AM, Marck Bulter <177316mb@student.eur.nl> wrote:
>
> Nick Cox wrote:
> > "Little" is not the adjective that springs to mind
> > for that help file.
> >
> > More important, I don't think that help file answers
> > much of the question here.
> >
> > As 0 and 1 are attainable, logit in the strict sense is
> > out of the question.
> >
> > It seems to me that the main issue with a predictor that is
> > a proportion is what is the shape of the function relating
> >
> > response | other predictors
> >
> > to
> >
> > proportional predictor | other predictors
> >
> > and, setting aside the instrumental variable aspect here,
> > one handle on that might be given by added variable plots
> > after a plain multiple regression -- or graphical near
> > equivalents such as -mrunning- or -mlowess-. Use -findit-
> > to locate these user-written programs.
> >
> > My first stab at this would be to consider some power of
> > the predictor, say root or square. That way 0 and 1 stay
> > as they are but you can bend the scale in the middle.
> >
> > Nick
> > n.j.cox@durham.ac.uk
> >
> > David Airey
> >
> >
> > Nick Cox has a little Stata help file on transformations.
> >
> > ssc install transint
> >
> > Marck Bulter
> >
> >
> >> I have a question that is not entirely related to Stata. Do hope
> >> that you forgive me.
> >>
> >> Assume the following model,
> >>
> >> ivreg pstrmon price maturity age coupon pstrmonprev pstrprev
> >> intrest ivol compl (precmon = precmonprev)
> >>
> >> where pstrmon, pstrmonprev, precmon and precmonprev are all
> >> proportions, in this case value of bond A / total value of bonds,
> >> etc. Each can therefore take any value between 0 and 1, with 0
> >> and 1 included. These last 4 variables are heavily left skewed.
> >> After estimation, the residuals are heteroskedastic and not
> >> normally distributed.
> >> On the Statalist server I have found several references to logistic
> >> transformations, ln(y/(1-y)):
> >> - http://www.stata.com/statalist/archive/2003-07/msg00285.html
> >> - home.fsw.vu.nl/m.buis/presentations/UKsug06.pdf
> >> - http://www.stata.com/statalist/archive/2006-02/msg00150.html
> >>
> >> If I transform the 4 variables using the logistic transformation,
> >> they are no longer skewed, and the residuals are almost
> >> homoskedastic and almost normally distributed.
> >> But my question is: is this transformation allowed, given that I
> >> have mostly seen references to transforming only the dependent
> >> variable? In addition, the transformation makes the coefficients
> >> hard to interpret. Any comments on this?
> >>
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> >
>
> Dear Nick,
>
> I have read the transint files; they are very informative. Thank you for
> sharing. And thanks to David Airey for pointing me to transint. But
> indeed, they do not answer my question entirely.
>
> Strictly, 100% is possible, but the proportion data I have range from 0
> to 0.8. The author of the following published article,
>
> http://www.cepr.org/pubs/new-dps/dplist.asp?dpno=5153
>
> converts 0 values to 0.001 and 1 to 0.999. Not the prettiest
> solution, but strictly the logistic transformation is then no longer
> out of the question.
> My master's thesis extends previous research in which the author
> also used proportions as dependent and independent variables, but he
> did not explain whether, and if so how, he transformed the variables.
>
> As for your suggestion of root and square, sqrt does improve things a
> bit, but of course the 0 values are problematic, and in addition the
> residual assumptions remain problematic.
> Do you think that the conversion to 0.001 is appropriate? And, more
> importantly, is it appropriate to use logit-transformed variables both
> as dependent and independent variables?
>
> Sorry for not being entirely accurate the first time.
>
> Regards,
> Marck Bulter
> Currently, mlowess is running, it is a bit computer intensive.
```
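
A minimal Python sketch of two points raised in the reply above, using only the standard library. The function name `logit` and the values of `a` and `b` are hypothetical, chosen for illustration and not taken from the thread's data. It shows, first, that the logit of a stand-in for zero depends heavily on which small constant is picked, and second, that for a proportion a/(a+b) the logit equals ln(a) - ln(b), where b is the part of the total that is not in the numerator.

```python
import math


def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)); defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is undefined for p <= 0 or p >= 1")
    # log1p(-p) computes ln(1 - p) accurately when p is tiny.
    return math.log(p) - math.log1p(-p)


# 1) Sensitivity to the constant substituted for an observed zero.
#    The candidate constants are the ones mentioned in the reply.
candidates = (1e-3, 1e-4, 1e-7, 1e-30)
substituted = {eps: logit(eps) for eps in candidates}
for eps, value in substituted.items():
    print(f"eps = {eps:g}   logit(eps) = {value:.3f}")

# 2) Logit of a proportion in terms of its components.
#    Hypothetical values: a = value of bond A, b = value of all other bonds.
a, b = 3.0, 7.0
lhs = logit(a / (a + b))
rhs = math.log(a) - math.log(b)
print(f"logit(a/(a+b)) = {lhs:.6f}   ln(a) - ln(b) = {rhs:.6f}")
```

The substituted values run from about -6.9 at 0.001 to about -69.1 at 1e-30, so any fit that keeps the replaced observations can move with the choice of constant, which is the concern raised at the top of the reply. Note also that `logit` raises an error at exactly 0 or 1, mirroring the observations that would drop out of an estimation on the transformed scale.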