
RE: zero inflated beta [was: st: Information request]

From	"Verkuilen, Jay" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: zero inflated beta [was: st: Information request]
Date	Mon, 17 Aug 2009 11:51:11 -0400

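The ZI-Beta model is very hard to identify, too. The problem is that the beta family includes J-shaped distributions that pile most of their mass near zero, so the data may not distinguish well between a spike at exactly zero and a beta with a lot of mass close to zero. It's hard to know in advance whether this will work.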

The main issue that the original poster should consider is whether having no bonus is qualitatively different from having some. If the same basic set of regressors predicts bonus size, just "cheat" the zeros away a little by, say, linearly transforming all bonuses towards .5 by a small amount, and then use beta regression as usual.
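
As a rough sketch of such a transformation: one commonly used version is y' = (y*(N-1) + 0.5)/N, with N the sample size, which pulls every value slightly towards .5 so that exact 0s and 1s land strictly inside (0,1). The toy data and the variable name prop_sq below are made up for illustration, and this particular formula is an assumption about what "a small amount" could mean, not necessarily the one intended above.

```stata
* Sketch only: toy data and the name prop_sq are illustrative assumptions
clear
input double prop
0
0
0.05
0.25
1
end

* N = number of non-missing proportions
quietly count if !missing(prop)
local N = r(N)

* shrink every value slightly towards 0.5, so 0 and 1 move inside (0,1)
generate double prop_sq = (prop*(`N' - 1) + 0.5)/`N'
list prop prop_sq
```

The transformed variable can then go into an ordinary beta regression, for example with the user-written -betafit- or, in Stata 14 and later, the official -betareg-. It is worth checking how sensitive the results are to the size of the shift.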

If not, some kind of system approach becomes necessary and, well, that's going to get ugly. 

JV

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Maarten buis
Sent: Thursday, August 13, 2009 4:34 AM
To: [email protected]
Subject: zero inflated beta [was: st: Information request]

--- On Wed, 12/8/09, Fabio Zona wrote:
> I am in the unfortunate situation of running a regression
> analysis in which:
> - my dependent variable is a proportion (the bonus as a
> percentage of total compensation of the top managers of 178
> corporations),
> - the majority (more than 50%) of my managers do not have
> any bonus, so the proportion is exactly zero; that is, my
> dependent variable has many exact zeros.
> 
> How can I estimate this model? Do you know which command I
> should use in Stata?
> I know that I cannot use the fractional logit because I
> have many zeros. I have not found any zero-inflated logistic
> regression for situations in which y is a proportion.

A zero-inflated fractional logit model is hard to identify. A
zero-inflated beta is probably better, but it obviously comes
at a price (there is no such thing as a free lunch...): it
relies on more restrictive assumptions.
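
Written out, the zero-inflated beta density that the example program further down appears to maximize looks as follows. The symbols are notation introduced here (assumptions, not from the original post): \pi_i is the probability of an exact zero, \mu_i the mean and \phi the precision of the beta part, matching `zero', `mu' and `phi' in the program, while x_i, z_i, \beta and \gamma stand for the covariates and coefficients of the two equations.

```latex
f(y_i) =
\begin{cases}
  \pi_i, & y_i = 0, \\[4pt]
  (1-\pi_i)\,
  \dfrac{\Gamma(\phi)}{\Gamma(\mu_i\phi)\,\Gamma\bigl((1-\mu_i)\phi\bigr)}\,
  y_i^{\mu_i\phi-1}\,(1-y_i)^{(1-\mu_i)\phi-1}, & 0 < y_i < 1,
\end{cases}
\qquad
\pi_i = \operatorname{logit}^{-1}(z_i\gamma),\quad
\mu_i = \operatorname{logit}^{-1}(x_i\beta),\quad
\phi = \exp(\ln\phi).
```

The more restrictive assumptions are visible here: the positive proportions must follow a beta distribution with a single precision \phi, and values of exactly 1 are not allowed.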

Below is a quick stab at implementing such a model. I haven't
done any checking or certification on it, so it is up to you
to determine whether this program actually does what it is
supposed to do. As a first step I would build a simulated
dataset where you know what the parameters should be and
check whether this program actually recovers them.

Hope this helps,
Maarten

*----------- begin example ---------------
clear
program drop _all
set more off
input      prop str1 site variety
 0.0005    A       1
 0.0000    A       2
 0.0000    A       3
 0.0010    A       4
 0.0025    A       5
 0.0005    A       6
 0.0050    A       7
 0.0130    A       8
 0.0150    A       9
 0.0150    A       10
 0.0000    B       1
 0.0005    B       2
 0.0005    B       3
 0.0030    B       4
 0.0075    B       5
 0.0030    B       6
 0.0300    B       7
 0.0750    B       8
 0.0100    B       9
 0.1270    B       10
 0.0125    C       1
 0.0125    C       2
 0.0250    C       3
 0.1660    C       4
 0.0250    C       5
 0.0250    C       6
 0.0000    C       7
 0.2000    C       8
 0.3750    C       9
 0.2625    C       10
 0.0250    D       1
 0.0050    D       2
 0.0001    D       3
 0.0300    D       4
 0.0250    D       5
 0.0001    D       6
 0.2500    D       7
 0.5500    D       8
 0.0500    D       9
 0.4000    D       10
 0.0550    E       1
 0.0100    E       2
 0.0600    E       3
 0.0110    E       4
 0.0250    E       5
 0.0800    E       6
 0.1650    E       7
 0.2950    E       8
 0.2000    E       9
 0.4350    E       10
 0.0100    F       1
 0.0500    F       2
 0.0500    F       3
 0.0500    F       4
 0.0500    F       5
 0.0500    F       6
 0.1000    F       7
 0.0500    F       8
 0.5000    F       9
 0.7500    F       10
 0.0500    G       1
 0.0010    G       2
 0.0500    G       3
 0.0500    G       4
 0.5000    G       5
 0.1000    G       6
 0.5000    G       7
 0.2500    G       8
 0.5000    G       9
 0.7500    G       10
 0.0500    H       1
 0.1000    H       2
 0.0500    H       3
 0.0500    H       4
 0.2500    H       5
 0.7500    H       6
 0.5000    H       7
 0.7500    H       8
 0.7500    H       9
 0.7500    H       10
 0.1750    I       1
 0.2500    I       2
 0.4250    I       3
 0.5000    I       4
 0.3750    I       5
 0.9500    I       6
 0.6250    I       7
 0.9500    I       8
 0.9500    I       9
 0.9500    I       10
end

encode site, gen(sitenum)
gen byte left = sitenum <= 4

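program define zibeta_lf
	*! MLB 0.0.1 13 Aug 2009
	version 8.2
	* ml passes the log-likelihood variable plus one linear predictor per equation
	args lnf logitmu lnphi zb
	tempvar zero nonzero mu phi

	* Pr(y == 0) and Pr(y > 0) from the zero-inflation equation
	quietly gen double `zero' = invlogit(`zb')
	quietly gen double `nonzero' = invlogit(-`zb')
	* mean (between 0 and 1) and precision (> 0) of the beta part
	quietly gen double `mu' = invlogit(`logitmu')
	quietly gen double `phi' = exp(`lnphi')

	* positive proportions: ln Pr(y > 0) + log beta density
	quietly replace `lnf' = ln(`nonzero') + ///
	                        lngamma(`phi') - ///
	                        lngamma(`mu'*`phi') - ///
	                        lngamma((1-`mu')*`phi') + ///
	                        (`mu'*`phi'-1)*ln($ML_y) + ///
	                        ((1-`mu')*`phi'-1)*ln(1-$ML_y) ///
	                        if $ML_y > 0

	* exact zeros: ln Pr(y == 0)
	quietly replace `lnf' = ln(`zero') if $ML_y == 0
end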
xi i.site i.variety
ml model lf zibeta_lf (logitmu: prop = _I*) /lnphi (zg:left), robust
ml check
ml search
ml maximize

exit
*--------------- end example ----------------------
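
The simulation check suggested above could be sketched roughly as follows. Everything in it is an illustrative assumption: the covariates x and z, the "true" parameter values, the sample size, and the evaluator name zibsim_lf (a compact copy of the likelihood in the example, renamed so the block runs on its own). It needs Stata 10.1 or later for rbeta() and runiform().

```stata
* Sketch only: names and parameter values are illustrative assumptions
clear
capture program drop zibsim_lf
set seed 20090813
set obs 5000

* two made-up covariates; x is bounded so mu stays away from 0 and 1
generate double x = 2*runiform() - 1
generate double z = rnormal()

* "true" values the estimator should recover:
*   logitmu: 0.8*x - 0.5    lnphi: ln(5)    zero: 1*z - 1
generate double mu    = invlogit(-0.5 + 0.8*x)
generate double pzero = invlogit(-1 + 1*z)
generate double y     = rbeta(mu*5, (1 - mu)*5)
replace y = 0 if runiform() < pzero

program define zibsim_lf
	version 10.1
	args lnf logitmu lnphi zb
	tempvar mu phi
	quietly generate double `mu'  = invlogit(`logitmu')
	quietly generate double `phi' = exp(`lnphi')
	* positive proportions: ln Pr(y > 0) + log beta density
	quietly replace `lnf' = ln(invlogit(-`zb')) + lngamma(`phi') ///
		- lngamma(`mu'*`phi') - lngamma((1 - `mu')*`phi') ///
		+ (`mu'*`phi' - 1)*ln($ML_y1) ///
		+ ((1 - `mu')*`phi' - 1)*ln(1 - $ML_y1) if $ML_y1 > 0
	* exact zeros: ln Pr(y == 0)
	quietly replace `lnf' = ln(invlogit(`zb')) if $ML_y1 == 0
end

ml model lf zibsim_lf (logitmu: y = x) /lnphi (zero: z)
ml search
ml maximize
```

If the program works as intended, the estimates should come out close to 0.8 and -0.5 in the logitmu equation, ln(5) (about 1.61) for lnphi, and 1 and -1 in the zero equation. Repeating this over many seeds, for instance with -simulate-, would give a better picture than a single draw.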



-----------------------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://home.fsw.vu.nl/m.buis/
-----------------------------------------





      

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: zero inflated beta [was: st: Information request]
  - From: Austin Nichols <[email protected]>

References:
- st: Information request
  - From: Fabio Zona <[email protected]>
- zero inflated beta [was: st: Information request]
  - From: Maarten buis <[email protected]>
