
From: Marck Bulter <177316mb@student.eur.nl>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: logistic tranformation, proportion variables
Date: Fri, 14 Dec 2007 00:54:14 +0100

Austin Nichols wrote:

Marck--

No, replacing 0 with .001 is not appropriate, unless replacing it with

.0001 or .0000001 or 1e-30 etc. instead has no impact on the results,

in which case you could just drop the zeros and get the same results.

Also: Why is the sqrt(0) problematic?

My guess is that a better solution to your problem would be grounded
in theory. What is this regression supposed to measure the effects
of? If y is a proportion and x1 and x2 are proportions, and they
"want to be" transformed via logits, perhaps you should be using the
logs of the components of those variables. Writing a proportion as
a/(a+b), where a is the numerator and b is the denominator minus the
numerator,

logit(a/(a+b)) = ln(a) - ln(b)

so including the logit of a proportion X as an explanatory var is the
same as including ln(a) and ln(b) and constraining their coefficients
N and D to satisfy N+D=0, which is a testable restriction. Using the
logit of a proportion Y as the dependent var is the same as using
ln(a) as the depvar and ln(b) as a regressor and constraining the
coefficient on ln(b) to be 1, which is also a testable restriction.
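A minimal sketch of how such a restriction could be checked in Stata. The data are simulated and the names a, b, z, and y are hypothetical stand-ins, not variables from this thread; a is the numerator and b is the denominator minus the numerator, so that logit(a/(a+b)) = ln(a) - ln(b).

```stata
* Simulated stand-in data; a, b, z, and y are hypothetical names.
clear
set seed 12345
set obs 500
generate double a = 1 + 9*runiform()
generate double b = 1 + 9*runiform()
generate double z = rnormal()

* By construction y depends on logit(a/(a+b)), i.e. on ln(a) - ln(b)
generate double y = 0.5*logit(a/(a+b)) + 0.3*z + rnormal()

generate double ln_a = ln(a)
generate double ln_b = ln(b)

* Unrestricted regression: separate coefficients on ln(a) and ln(b)
regress y ln_a ln_b z

* Entering logit(a/(a+b)) as one regressor imposes N + D = 0
test ln_a + ln_b = 0
```

For a proportion on the left-hand side, the analogous check would be to run `regress ln_a ln_b z` and then `test ln_b = 1`.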

Of course, if the numerator is zero, the log is undefined and those

obs will drop out of the estimation. Theory can also help you here

sometimes--in particular, perhaps the sqrt(X) is actually what has a

linear effect on Y, not X, as Nick suggests.

On Dec 13, 2007 11:58 AM, Marck Bulter <177316mb@student.eur.nl> wrote:

Nick Cox wrote:

"Little" is not the adjective that springs to mind

for that help file.

More important, I don't think that help file answers

much of the question here.

As 0 and 1 are attainable, logit in the strict sense is

out of the question.
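A small sketch of why attainable bounds rule out the strict logit. The variable p below is a made-up toy proportion, not data from this thread. Stata's logit() function is defined only for values strictly between 0 and 1, so the boundary values come back as missing and would drop out of any estimation.

```stata
* Toy data: p is a hypothetical proportion that attains both bounds.
clear
input double p
0
0.2
0.5
0.8
1
end

* logit(p) = ln(p/(1-p)); the result is missing when p is 0 or 1
generate double lp = logit(p)
list, noobs

* Number of observations lost to the transformation
count if missing(lp)
```

In this toy example, two of the five observations are lost.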

It seems to me that the main issue with a predictor that is

a proportion is what is the shape of the function relating

response | other predictors

to

proportional predictor | other predictors

and, setting aside the instrumental variable aspect here,

one handle on that might be given by added variable plots

after a plain multiple regression -- or graphical near

equivalents such as -mrunning- or -mlowess-. Use -findit-

to locate these user-written programs.
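A hedged sketch of the added variable plot idea, using the official -avplot- command after -regress-. The data are simulated, and p, z, and y are hypothetical names: p stands in for a skewed proportional predictor whose true effect here is deliberately nonlinear.

```stata
* Simulated stand-in data; p, z, and y are hypothetical names.
clear
set seed 2007
set obs 300
generate double p = runiform()^3
generate double z = rnormal()

* The true effect of p is through sqrt(p), not p itself
generate double y = 2*sqrt(p) + 0.5*z + rnormal(0, 0.3)

* Plain multiple regression, then the added variable plot for p
regress y p z
avplot p
```

Curvature in the plotted cloud would suggest that p should enter in some transformed form.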

My first stab at this would be to consider some power of

the predictor, say root or square. That way 0 and 1 stay

as they are but you can bend the scale in the middle.
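A short sketch of that point, with a made-up toy variable p (not data from this thread): the root and the square leave 0 and 1 unchanged and only bend the values in between.

```stata
* Toy data: p is a hypothetical proportion including both bounds.
clear
input double p
0
0.04
0.25
0.64
1
end

* Power transformations: no observations are lost at 0 or 1
generate double p_root = sqrt(p)
generate double p_sq   = p^2
list, noobs
```

The root stretches the scale near 0, while the square compresses it there.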

Nick

n.j.cox@durham.ac.uk

David Airey

Nick Cox has a little Stata help file on transformations.

ssc install transint

Marck Bulter

I have a question that is not entirely related to Stata. I do hope

that you will forgive me.

Assume the following model,

ivreg pstrmon price maturity age coupon pstrmonprev pstrprev intrest ivol compl (precmon = precmonprev)

Here pstrmon, pstrmonprev, precmon and precmonprev are all
proportions, in this case value of bond A / total value of bonds, etc.
Therefore, they can take any value between 0 and 1, with 0 and 1 included.
These last 4 variables are heavily left skewed. After estimation,
the residuals are heteroskedastic and not normally distributed.

On the Statalist server I have found several references to logistic

transformations, ln(y/(1-y)):

- http://www.stata.com/statalist/archive/2003-07/msg00285.html

- home.fsw.vu.nl/m.buis/presentations/UKsug06.pdf

- http://www.stata.com/statalist/archive/2006-02/msg00150.html

If I transform the 4 variables using the logistic transformation, the 4
variables are no longer skewed, and the residuals are almost
homoskedastic and almost normally distributed.

But my question is whether this transformation is allowed, as I have mostly
seen references to transforming only the dependent variable.

In addition, the transformation makes the coefficients hard to
interpret. Any comments on this?

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/

Dear Nick,

I have read the transint files; they are very informative. Thank you for
sharing. And thanks to David Airey for pointing me to transint. But
indeed, they do not answer my question entirely.

Strictly, 100% is possible, but the proportion data I have range from 0
to 0.8. The author of the following published article,

http://www.cepr.org/pubs/new-dps/dplist.asp?dpno=5153

converts 0 values to 0.001 and 1 to 0.999. Not the prettiest
solution, but strictly speaking the logistic transformation is then no
longer out of the question.

My master's thesis is an extension of previous research, in which the
author also used proportions as dependent and independent variables, but he
did not explain whether he transformed the variables and, if so, how.

As for your suggestion of root and square: sqrt does improve things a bit,
but of course the 0 values are problematic, and in addition the residual
assumptions remain problematic.

Do you think that the conversion to 0.001 is appropriate? And, more
importantly, is it appropriate to use logit-transformed variables both
as dependent and independent variables?

Sorry for not being entirely accurate the first time.

Regards,

Marck Bulter

mlowess is currently running; it is a bit computationally intensive.

Dear Austin,

Against your advice, I tried converting the 0s to 0.001, 0.0001, etc. But the residual scatter plot shows a clear straight bar running through it: a number of residuals line up, and the effect becomes more pronounced (the bar moves out of the cloud) as I decrease the replacement value. As a result, I will skip the conversion. I have no idea how the author of

http://www.cepr.org/pubs/new-dps/dplist.asp?dpno=5153

managed to publish this paper, since I would expect that (s)he would have seen a similar result (no residual scatter plot printed?).
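A small sketch that may help explain the bar: every replaced zero gets the same logit value, and that value moves further from the rest of the data as the replacement shrinks. The replacement values below are the ones mentioned in this thread; nothing here uses the actual bond data.

```stata
* Logit of each candidate replacement value for the zeros
foreach eps in 0.001 0.0001 1e-7 1e-30 {
    display "eps = `eps'   logit(eps) = " %9.3f logit(`eps')
}
```

The logits are roughly -6.9, -9.2, -16.1 and -69.1, so the choice of replacement value effectively decides how extreme the former zeros are.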

But anyway, to answer your question regarding sqrt: the residual cloud is oriented to the left and has a clear cutoff line running diagonally from zero to the x axis. (I wish I could attach the plot, but I don't want to jam others' mailboxes.) So I am inclined to look for other options.

Regarding your suggestion to replace the logit of a proportion with the logs of its components: I will test what log n and log d give for a proportion n/d. Thank you for pointing this out; it is an interesting suggestion.

To explain a bit about what I am trying to measure:

ivreg pstrmon price maturity age coupon l.pstrmonprev l.pstr interest ivolatily compl (precmon = l.precmon)

and

ivreg precmon price maturity age coupon l.precmon l.pstr intr ivol compl (pstrmon = l.pstrmon)

Here, pstrmon is the proportion (by $ value) of U.S. Treasury bond i, relative to the total value outstanding of bond type i, that is converted to zero coupon bonds per month; it is thus a measure of activity. precmon is the reverse: the value that is converted back into U.S. Treasury bonds of type i, relative to the total value of bond type i. pstr is the proportion (by $ value) of bond i that is held in converted form. So this is a dynamic process, going from a normal bond to zero coupon bonds and back. In fixed income terms, I would refer to this process as stripping and reconstitution activity. The other variables are bond properties such as price, coupon rate, maturity, and age.

To come back to the 0 values: in some months there is no activity. To make things even more interesting, this is a panel data study, with 103 bonds (i = 103) over the period 1997 to 2006.

regards,

Marck Bulter

