
st: RE: Log transform and GLM


From   Jhilbe@aol.com
To   statalist@hsphsun2.harvard.edu
Subject   st: RE: Log transform and GLM
Date   Mon, 16 Jan 2006 13:37:54 EST

Original message:
From   Charles Goss <chudge@gmail.com>
Subject   st: Log Transformation and GLM
Date   Sun, 15 Jan 2006 23:28:31 -0500

Hello Statalist,

I am having some issues analyzing data with glm. I have tried several
methods to analyze my zero-inflated data set (zinb, hurdle and glm).
The best model fit that I get is when I log transform the response
variable prior to analysis with a glm model using a negative binomial
distribution. The negative binomial uses a log link function, so I
think that this analysis is essentially double log-transforming the
data: once initially, and again when the response is linked to the
predictors. I have not been able to find any literature regarding
this, so I was wondering if anyone knows whether this is an appropriate
way to analyze these data? Does it violate assumptions of the glm?
Thanks for your time.

Chuck

======================
Chuck:

Think about it this way. Poisson and negative binomial (NB) are count
response models. One can use them with decimals, e.g. 15.4, but essentially
the assumption upon which the models are based is that they model counts, or
integers. By log transforming the counts you have seriously compromised those
assumptions. Look at the range of your response.
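
A rough sketch of the point about the range of the response, in Python rather than Stata. The counts are invented for illustration, and the `log(y + 1)` offset is an assumption (the thread does not say how zeros were handled before logging):

```python
import math

# Hypothetical zero-heavy counts, invented for illustration only.
counts = [0, 0, 0, 0, 1, 2, 3, 5, 8, 15, 40]

# log(0) is undefined, so a logged response needs some offset;
# log(y + 1) is an assumption here, not taken from the thread.
logged = [math.log(y + 1) for y in counts]

raw_range = (min(counts), max(counts))
logged_range = (min(logged), max(logged))

# Count how many transformed values are still whole numbers.
n_integer = sum(1 for v in logged if v == int(v))

print("raw range:   ", raw_range)
print("logged range:", tuple(round(v, 3) for v in logged_range))
print("logged values that are still integers:", n_integer, "of", len(logged))
```

Under these assumptions the only whole numbers left after the transform are the zeros; every positive count becomes a decimal, and the range collapses to a few units, which is not what a count model expects.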

You are also correct in thinking that you have logged a response that has
already been logged internally from within the algorithm. It's a bit more
complicated than that, but you should not do it.
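
One way to see why it is "a bit more complicated" is to write the log link out. The symbols below ($y_i$ for the count, $x_i$ for the predictors, $\beta$ for the coefficients, $z_i$ for the logged response) are notation assumed for this sketch, not taken from the thread:

```latex
% NB GLM with log link on the raw count: the link acts on the mean, not on the data
\log \mathrm{E}[\,y_i \mid x_i\,] = x_i^{\top}\beta
\quad\Longleftrightarrow\quad
\mathrm{E}[\,y_i \mid x_i\,] = \exp(x_i^{\top}\beta)

% Same model fitted to a pre-logged response z_i = \log y_i
\log \mathrm{E}[\,z_i \mid x_i\,] = x_i^{\top}\beta
\quad\Longleftrightarrow\quad
\mathrm{E}[\,\log y_i \mid x_i\,] = \exp(x_i^{\top}\beta)
```

In the second case the coefficients describe the log of the mean of the logged response, which is no longer a statement about the expected count.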

It appears from your comments that there are excessive zeros in the
response. Either a hurdle or ZINB is probably the best approach -- if you are
still intending to model counts. It just may be that neither of these models
fits the data well. Do you know the reason why there are excessive 0's? Try a
2-parameter log-gamma or 2-parameter log-inverse Gaussian model. Compare the
AIC statistics. You can also try severing the data by excluding 0's and
modeling with a 0-truncated program -- but only if you know that the 0's have
been generated by an entirely different process than the positive count data.
This is not an ideal solution, but a possible one in certain circumstances.
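
A minimal sketch of the two ideas above, a zero-truncated likelihood and an AIC comparison, in Python rather than Stata. It uses a zero-truncated Poisson only for brevity (a truncated negative binomial would add a dispersion parameter). The function names, the counts, and the rate `lam` are all hypothetical, and `lam` is fixed by hand rather than estimated:

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * loglik + 2.0 * n_params

def zt_poisson_logpmf(y, lam):
    """Log pmf of a Poisson(lam) count conditional on y >= 1."""
    if y < 1:
        raise ValueError("zero-truncated model is defined for y >= 1 only")
    log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
    # Rescale by P(Y > 0) = 1 - exp(-lam) so the positive counts sum to 1.
    return log_poisson - math.log1p(-math.exp(-lam))

# Hypothetical positive counts (zeros already excluded), invented for illustration.
positive_counts = [1, 1, 2, 3, 5, 8]
lam = 3.0  # assumed rate, not an estimate; a real fit would maximise the likelihood

loglik = sum(zt_poisson_logpmf(y, lam) for y in positive_counts)
print("log-likelihood:", round(loglik, 3))
print("AIC (1 parameter):", round(aic(loglik, 1), 3))
```

When candidate models are compared this way, the lower AIC is preferred, provided the models are fitted to the same response on the same observations.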

Joe Hilbe
