Statalist



st: Modeling an independent variable with a very high data density at x=0


From   Allan Garland <agar5858@shaw.ca>
To   statalist@hsphsun2.harvard.edu
Subject   st: Modeling an independent variable with a very high data density at x=0
Date   Fri, 05 Jun 2009 19:40:58 -0700

I'm running a logistic regression with a non-negative, continuous independent variable X, for which about 60% of cases have X=0.  It seems to me that simply including X in the model as a single term is problematic, since many cases with Y=0 and many others with Y=1 are likely to have X=0.  I can think of two possible approaches to modeling X, and would welcome feedback on them, as well as any other thoughts on how to handle this situation.
a) Divide X into m categories and represent it with m-1 dummy variables in the model.
b) Include X in the model, and also include a binary variable Z such that Z=1 when X=0 and Z=0 otherwise.  Then the effect of X=0 is given by the coefficient of Z, and the effect of X>0 is given purely by the coefficient of X itself (since then Z=0).
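
To make the two codings concrete, here is a minimal Python sketch of how the covariates in (a) and (b) could be built for a single observation. The function names, the cutpoints, and the coefficient names b0, b1, b2 are all hypothetical, and treating X=0 as its own reference category in (a) is an assumption rather than something stated above.

```python
from bisect import bisect_left


def zero_indicator(x):
    """Approach (b): Z = 1 when X == 0, and Z = 0 otherwise."""
    return 1 if x == 0 else 0


def category_dummies(x, cutpoints):
    """Approach (a): code X into m categories and return m - 1 dummies.

    Assumption: X == 0 is its own (reference) category, and positive
    values are split at the sorted `cutpoints`, each interval closed on
    the right. With k cutpoints this gives m = k + 2 categories.
    """
    if x == 0:
        category = 0
    else:
        category = 1 + bisect_left(cutpoints, x)
    m = len(cutpoints) + 2
    # One dummy per non-reference category; all zeros means X == 0.
    return [1 if category == k else 0 for k in range(1, m)]


def linear_predictor_b(x, b0, b1, b2):
    """Log odds under approach (b): b0 + b1*X + b2*Z.

    At X == 0 this equals b0 + b2; for X > 0 it equals b0 + b1*X.
    So b2 is the gap between the log odds at X == 0 and the value the
    line fitted to X > 0 would give when extrapolated to zero.
    """
    return b0 + b1 * x + b2 * zero_indicator(x)
```

Under approach (b) the cases with X=0 get their own level (b0 + b2), so they no longer pull on the slope b1, which is then estimated from the cases with X>0. In Stata the same indicator could be created with something like `generate byte Z = (X == 0) if !missing(X)` and the model fitted with `logit Y X Z`, assuming the variables are named Y and X.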

Allan
