
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Dependent var is a proportion, with large spike in .95+
Date: Thu, 4 Sep 2008 13:12:53 +0100

My take differs from anybody else's! From what you say, this is not a spike. It is just strong skewness. A spike, in my book, is a big group of identical values. In this context that usually means lots of exact zeros or exact ones (or 100%s, naturally).

A good approximation is that if you take logits of a beta-distributed variable, the distribution looks bell-shaped. That's true even for highly skewed betas with modes near 0 or near 1. Here, as in many other places, the logit works wonders. So your proportion data are fit for a beta model to the extent that their logits look bell-shaped. Of course, you might end up fitting a mediocre model if you can't think of or fit a better one.

However, if you have any exact zeros or ones, you can't take logits, and equivalently you can't really fit a beta. You need either a fudge that denies that the zeros or ones really are that, or a mixture model such as others are referring to.

Nick [not Nic]
n.j.cox@durham.ac.uk

Dan Weitzenfeld

> I am trying to determine which testing factors drive a proportion dependent variable, PercentNoise. In searching the archives, I came across -betafit-, and a link to the FAQ: "How do you fit a model when the dependent variable is a proportion?"
>
> In that response, Allen McDowell and Nic Cox write, "In practice, it is often helpful to look at the frequency distribution: a marked spike at zero or one may well raise doubt about a single model fitted to all data."
>
> That describes my situation exactly: I have a marked spike in my histogram at the top bin, roughly .95 to 1. I am wondering how to account for this. Does -betafit- take such a possibility into account? Can someone briefly describe how I could use multiple models to fit all the data, as implied in the FAQ response?
>
> My fallback is setting a pass/fail bar and converting my proportions to a binary, then using probit/logit. But the obvious drawback is that I am throwing away information by collapsing the continuous (albeit bounded) proportion variable to a binary.
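A minimal Python sketch of the ideas in this reply, using only the standard library. It is an illustration under stated assumptions, not -betafit- and not code from this thread, and every function name in it is hypothetical. It (1) refuses to take logits of exact 0s or 1s, (2) compares sample skewness on the raw and logit scales for a simulated Beta(40, 4) variable, whose mode (about 0.93) sits near 1 as in the question, (3) shows one commonly used fudge for exact 0s and 1s, the squeeze (y(n - 1) + 0.5)/n suggested by Smithson and Verkuilen (2006), which is a swapped-in example rather than anything proposed here, and (4) writes down the log-likelihood of a simple mixture: a point mass at exactly 1 plus a beta distribution for everything strictly inside (0, 1).

```python
import math
import random


def logit(p: float) -> float:
    """Log-odds log(p / (1 - p)); undefined at exactly 0 or 1."""
    if not 0.0 < p < 1.0:
        raise ValueError(f"logit undefined for {p!r}: exact 0s/1s need a fudge or a mixture")
    return math.log(p / (1.0 - p))


def skewness(xs: list[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def squeeze(y: float, n: int) -> float:
    """Hypothetical helper: the (y*(n-1) + 0.5)/n compression, mapping [0, 1] into (0, 1)."""
    return (y * (n - 1) + 0.5) / n


def one_inflated_beta_loglik(ys: list[float], p_one: float, a: float, b: float) -> float:
    """Log-likelihood: y == 1 with probability p_one, else y ~ Beta(a, b) on (0, 1)."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)  # log B(a, b)
    total = 0.0
    for y in ys:
        if y == 1.0:
            # Point-mass component: the "spike" at exactly 1.
            total += math.log(p_one)
        elif 0.0 < y < 1.0:
            # Continuous component: beta log-density, weighted by 1 - p_one.
            total += (math.log1p(-p_one)
                      + (a - 1.0) * math.log(y)
                      + (b - 1.0) * math.log1p(-y)
                      - log_beta_fn)
        else:
            raise ValueError(f"{y!r} is outside (0, 1]; exact zeros need their own component")
    return total


if __name__ == "__main__":
    # Simulated proportions piled up near 1 (mode of Beta(40, 4) is 39/42, about 0.93).
    rng = random.Random(2008)
    props = [rng.betavariate(40.0, 4.0) for _ in range(20000)]
    print("skewness, raw scale:  ", round(skewness(props), 2))
    print("skewness, logit scale:", round(skewness([logit(y) for y in props]), 2))
    print("squeezed 0 and 1 (n=100):", squeeze(0.0, 100), squeeze(1.0, 100))
    print("loglik:", one_inflated_beta_loglik([0.97, 0.99, 1.0], 0.3, 40.0, 4.0))
```

For Beta(40, 4) the theoretical skewness is about -0.83 on the raw scale and about +0.46 on the logit scale, so the logits are much closer to symmetric, though not perfectly so. In the mixture log-likelihood the point-mass term depends only on p_one and the beta term only on a and b, so the two parts can be fitted separately: the maximum-likelihood estimate of p_one is simply the share of exact ones, and the beta is fitted to the remaining observations. That is one concrete reading of "multiple models".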
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups: Re: st: RE: Dependent var is a proportion, with large spike in .95+ (from Maarten buis <maartenbuis@yahoo.co.uk>)

References: st: Dependent var is a proportion, with large spike in .95+ (from "Dan Weitzenfeld" <dan.weitzenfeld@emsense.com>)

