
st: RE: Dependent var is a proportion, with large spike in .95+

From   "Verkuilen, Jay" <[email protected]>
To   <[email protected]>
Subject   st: RE: Dependent var is a proportion, with large spike in .95+
Date   Wed, 3 Sep 2008 17:43:25 -0400

Dan Weitzenfeld wrote:

>>That describes my situation exactly:  I have a marked spike in my
histogram at the top bin, roughly .95 - 1.  I am wondering how to
account for this.>>

I am working on a model that combines zero inflation with a beta
regression: a beta regression for the continuous part and a logistic
(or probit) for the boundary. It's not done in Stata (yet... but don't
hold your breath). So far we've found it fairly tricky to implement, as
zero-inflation models tend to be, but it does work.
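
To make the two-part structure concrete, here is a rough Python sketch of
the log-likelihood such a model would maximize. It is an illustration
only, not the implementation described above. The function names, the
mean/precision parameterization (mu, phi), and the use of constant
parameters are all assumptions. In a real regression, p_boundary and mu
would each depend on covariates through a logit (or probit) link.

```python
import math

def beta_logpdf(y, mu, phi):
    """Log density of a beta distribution with mean mu and precision phi.

    Assumed parameterization: shape1 = mu*phi, shape2 = (1 - mu)*phi.
    Requires 0 < y < 1.
    """
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def inflated_beta_loglik(ys, p_boundary, mu, phi, boundary=0.0):
    """Log-likelihood of a hypothetical two-part model.

    An observation equals `boundary` with probability p_boundary;
    otherwise it is drawn from a beta density on (0, 1).
    """
    total = 0.0
    for y in ys:
        if y == boundary:
            # Discrete part: point mass at the boundary value.
            total += math.log(p_boundary)
        else:
            # Continuous part: beta density, weighted by 1 - p_boundary.
            total += math.log(1.0 - p_boundary) + beta_logpdf(y, mu, phi)
    return total
```

For a spike at 1 rather than 0, the same sketch applies with
boundary=1.0. Observations at the other boundary are not handled here
and would raise a math domain error.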

Also, depending on the nature of your DV, there is little harm in
"cheating" your observations away from 0 by using the transformation:

          Y_new = eps/2 + (1 - eps/2)*Y_old

where eps > 0 is a small constant, e.g., .001. The beta likelihood is
relatively insensitive to such perturbations (while other likelihoods
are not). 
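
As a quick check on what this transformation does, here is a small
Python sketch. The first function is the formula exactly as written
above: it maps 0 to eps/2 but leaves 1 at 1. The second function is NOT
from this post. It is an assumed symmetric variant, included only
because a spike at the top of the scale would need observations pulled
away from 1 as well. Both function names are made up for illustration.

```python
def nudge_from_zero(y, eps=0.001):
    """Transformation as given in the post: moves 0 to eps/2, keeps 1 at 1."""
    return eps / 2 + (1 - eps / 2) * y

def nudge_from_both(y, eps=0.001):
    """Assumed symmetric variant (not in the post).

    Compresses [0, 1] into [eps/2, 1 - eps/2], so both boundaries
    end up strictly inside the unit interval.
    """
    return eps / 2 + (1 - eps) * y

if __name__ == "__main__":
    for y in (0.0, 0.5, 1.0):
        print(y, nudge_from_zero(y), nudge_from_both(y))
```

Whether either version is acceptable still depends on what the boundary
values mean in your data.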

IMO, the real question is the nature of the zeros, as a recent post by
Nick Cox makes plain. If the zero is a "real" one and means something
qualitatively different from a value slightly greater than 0, then you
need an inflation model. If not, cheating often works.

Whoops, on rereading I see you have a sampling one. Well, same idea.

