
From: jverkuilen <jverkuilen@gc.cuny.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: Dependent var is a proportion, with large spike in .95+
Date: Thu, 4 Sep 2008 12:04:47 -0400

Nick Cox <n.j.cox@durham.ac.uk> wrote:

> My take differs from anybody else's! From what you say, this is not a spike. It is just strong skewness.

After the results my coauthor sent me last night, I am inclined to agree. He fit mixture models to some endpoint-skewed DVs, and the mixture always went to 0. We are planning some simulations to test this, but the big problem is that a mixture with a true endpoint spike is hard to distinguish from a bimodal beta.

> A spike in my book is a big group of identical values, in this context usually lots of exact zeros or exact ones (or 100%s, naturally).

Interior spikes seem to be the real trouble, e.g., one at 0.5.

> A good approximation is that if you take logits of a beta-distributed variable, the distribution looks bell-shaped. That's true even for highly skewed betas with modes near 0 or near 1.

Yes, so long as the distribution is not J- or L-shaped, which can happen with the beta. The beta itself can handle those shapes, and endpoint bimodality too.

> However, if you have any exact zeros or ones, you can't take logits, and equivalently you can't really fit a beta. You need either a fudge that denies that the zeros or ones really are that or a mixture model such as others are referring to.

Right. The beta likelihood is relatively insensitive to transformations that pull exact 0 or 1 observations into (0,1). I have gotten to the point where I just do it using Y_new = 1/(2n) + (1 - 1/n)*Y_old, where n is the sample size, so that 0 becomes 1/(2n) and 1 becomes 1 - 1/(2n). Thankfully, the choice of cheating factor is ultimately not very important.

Also, I should note that a histogram is a crummy tool for identifying spikes unless the sample size is very large and the spike is distinct. Try the ECDF or a frequency table instead.

> Nick [not Nic]
> n.j.cox@durham.ac.uk
>
> Dan Weitzenfeld
>
> > I am trying to determine which testing factors drive a proportion dependent variable, PercentNoise. In searching the archives, I came across -betafit-, and a link to the FAQ: "How do you fit a model when the dependent variable is a proportion?"
> >
> > In that response, Allen McDowell and Nic Cox write, "In practice, it is often helpful to look at the frequency distribution: a marked spike at zero or one may well raise doubt about a single model fitted to all data." That describes my situation exactly: I have a marked spike in my histogram at the top bin, roughly .95 - 1. I am wondering how to account for this. Does -betafit- take such a possibility into account? Can someone briefly describe how I could use multiple models to fit all the data, as implied in the FAQ response?
> >
> > My fallback is setting a pass/fail bar and converting my proportions to a binary variable, then using probit/logit. But the obvious drawback is that I am throwing away information by collapsing the continuous (albeit bounded) proportion variable to a binary one.

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
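The squeeze of exact 0s and 1s into (0,1) discussed in the reply can be illustrated with a short sketch. It is written in Python only for illustration (the thread itself is about Stata), and the helper names `squeeze` and `logit` are hypothetical, not part of -betafit- or any Stata command. It assumes n is the sample size and uses a slope of (1 - 1/n), equivalently (y*(n - 1) + 0.5)/n, so that 0 maps to 1/(2n) and 1 maps to 1 - 1/(2n); with a slope of (1 - 1/(2n)) an exact 1 would map back to 1 and its logit would still be undefined.

```python
import math


def squeeze(y, n):
    """Pull exact 0s and 1s into the open interval (0, 1).

    Applies y_new = 1/(2n) + (1 - 1/n) * y_old, which equals
    (y_old * (n - 1) + 0.5) / n: 0 maps to 1/(2n), 1 maps to 1 - 1/(2n),
    and interior values shrink slightly toward 0.5.
    """
    return [1.0 / (2 * n) + (1.0 - 1.0 / n) * v for v in y]


def logit(p):
    """Log-odds; finite only for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical proportions, including exact 0 and exact 1 values.
y = [0.0, 0.25, 0.5, 0.97, 1.0, 1.0]
n = len(y)  # assumed: n is the sample size

y_new = squeeze(y, n)
logits = [logit(v) for v in y_new]  # logit() would raise on the raw 0s and 1s

for old, new, z in zip(y, y_new, logits):
    print(f"{old:.2f} -> {new:.4f}  logit = {z:+.3f}")
```

Because the map is affine with fixed point 0.5, it preserves the ordering of the data and treats the two endpoints symmetrically; as the reply notes, the exact size of the cheating factor usually matters little to the beta likelihood.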

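The advice to check the ECDF or a frequency table rather than a histogram can be sketched as follows, again in Python purely as an illustration. The data and the helper names `frequency_table` and `ecdf` are hypothetical. A spike shows up as one exact value with a count far above the rest, and as a single large vertical jump in the ECDF; in a histogram, by contrast, values such as .96 to .99 share the top bin with the exact 1s, so strong skewness and a true spike can look alike.

```python
from collections import Counter


def frequency_table(y, top=5):
    """The `top` most frequent exact values with their counts.

    A spike shows up as one value with a count far above the rest.
    """
    return Counter(y).most_common(top)


def ecdf(y):
    """Empirical CDF as (value, proportion <= value), one pair per distinct value.

    A spike appears as a large vertical jump at a single value.
    """
    xs = sorted(y)
    n = len(xs)
    points = []
    for i, v in enumerate(xs):
        # Record a step only at the last copy of each value, so that
        # tied observations are all counted in that step.
        if i + 1 == n or xs[i + 1] != v:
            points.append((v, (i + 1) / n))
    return points


# Hypothetical data: skewed toward 1, with a block of exact 1s.
y = [0.62, 0.71, 0.80, 0.88, 0.93, 0.96, 0.97, 0.99, 1.0, 1.0, 1.0, 1.0]

print(frequency_table(y, top=3))
for value, prop in ecdf(y):
    print(f"{value:.2f}  {prop:.3f}")
```

Here the ECDF climbs gradually up to .99 and then jumps by 4/12 at exactly 1, which is the signature of a spike. In Stata itself, -tabulate- on the proportion and -cumul- (to generate ECDF values for plotting) give the same information.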

© Copyright 1996–2014 StataCorp LP