# st: RE: Log transform and GLM

From: Jhilbe@aol.com; To: statalist@hsphsun2.harvard.edu; Subject: st: RE: Log transform and GLM; Date: Mon, 16 Jan 2006 13:37:54 EST

In reply to: Charles Goss; Subject: st: Log Transformation and GLM; Date: Sun, 15 Jan 2006 23:28:31 -0500

```
Hello Statalist,

I am having some issues analyzing data with glm. I have tried several
methods to analyze my zero-inflated data set (zinb, hurdle and glm).
The best model fit I get is when I log-transform the response
variable prior to analysis with a glm model using a negative binomial
distribution. The negative binomial uses a log link function, so I
think that this analysis is essentially double log-transforming the
data: once initially, and then again when the response is linked to the
predictors. I have not been able to find any literature regarding this,
so I was wondering if anyone knows whether this is an appropriate way
to analyze these data. Does it violate the assumptions of the glm?
Thanks for your time.

Chuck

======================
Chuck:

Think about it this way. Poisson and negative binomial (NB) are count
response models. One can use them with decimals, e.g. 15.4, but essentially
the assumption upon which the models are based is that they model counts, or
integers. By log-transforming the counts you have seriously compromised the
assumptions. Look at the range of your response.

You are also correct in thinking that you have logged a response that has
already been logged internally from within the algorithm. It's a bit more
complicated than that (the log link is applied to the expected count, not to
the observed response), but you should not do it.

It appears from your comments that there are excessive zeros in the
response. Either a hurdle or a ZINB model is probably the best approach, if you
are still intending to model counts. It may be that neither of these models fits
the data well. Do you know why there are excessive 0's? Try a
2-parameter log-gamma or 2-parameter log-inverse Gaussian model, and compare the
AIC statistics. You can also try severing the data by excluding the 0's and
modeling the rest with a zero-truncated program, but only if you know that the
0's have been generated by an entirely different process than the positive
counts. This is not an ideal solution, but a possible one in certain circumstances.

Joe Hilbe

```
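Hilbe's warning about pre-logging a count response can be seen numerically. The sketch below is in Python rather than Stata and uses a small made-up count vector (an assumption, not Chuck's data). It relies on the fact that an intercept-only Poisson or NB2 model with a log link estimates the mean count by the sample mean, so the log link acts on that mean, whereas pre-logging acts on each observation. Back-transforming the average of log(y + 1) therefore does not recover the mean count, and the transformed responses are no longer integers.

```python
import math

# Hypothetical zero-heavy counts, for illustration only.
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 12]
n = len(counts)

# Log link: for an intercept-only Poisson or NB2 GLM the MLE of the mean
# is the sample mean, so the fitted linear predictor is log(mean(y)).
mean_count = sum(counts) / n
b0_log_link = math.log(mean_count)

# Pre-logging: log(0) is undefined, so log(y + 1) is the usual workaround.
# The transformed response is no longer integer-valued.
logged = [math.log1p(y) for y in counts]
non_integer = [v for v in logged if v != int(v)]

# Back-transforming the mean of log(y + 1) gives a geometric-mean-type
# summary, which by Jensen's inequality lies below the arithmetic mean.
back_transformed = math.exp(sum(logged) / n) - 1.0

print(f"mean count (log-link fit):      {math.exp(b0_log_link):.3f}")
print(f"back-transformed from log(y+1): {back_transformed:.3f}")
print(f"non-integer responses after logging: {len(non_integer)} of {n}")
```

On these made-up counts the log-link fit returns the mean count of 2.4, while the back-transformed log(y + 1) summary is only about 1.28, and six of the ten responses become non-integers.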
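One way to check the impression of excessive zeros is to compare the observed share of zeros with the share implied by a count model that has the same mean. Under a Poisson, P(Y = 0) = exp(-mu); under NB2 with dispersion alpha (variance mu + alpha*mu^2), P(Y = 0) = (1 + alpha*mu)^(-1/alpha). The Python sketch below assumes hypothetical counts and a hypothetical alpha = 1 that is fixed for illustration rather than estimated.

```python
import math

def zero_share_poisson(mu):
    """P(Y = 0) for a Poisson distribution with mean mu."""
    return math.exp(-mu)

def zero_share_nb2(mu, alpha):
    """P(Y = 0) for NB2 with mean mu and dispersion alpha > 0."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)

# Hypothetical zero-heavy counts, for illustration only.
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 12]
mu = sum(counts) / len(counts)
alpha = 1.0  # hypothetical dispersion, chosen for illustration, not estimated

observed_zero_share = counts.count(0) / len(counts)
poisson_zero_share = zero_share_poisson(mu)
nb2_zero_share = zero_share_nb2(mu, alpha)

print(f"observed share of zeros: {observed_zero_share:.3f}")
print(f"Poisson-implied share:   {poisson_zero_share:.3f}")
print(f"NB2-implied share:       {nb2_zero_share:.3f}")
```

A large gap between the observed and implied shares is what motivates a ZINB or hurdle model; a real analysis would estimate alpha from the data and include the covariates rather than fix them.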
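The hurdle idea discussed in the reply (a binary model for zero versus positive, plus a zero-truncated count model fitted to the positives) can be compared with a plain count model by AIC = -2 ln L + 2k, where smaller is better. The Python sketch below is a simplified, assumed illustration: intercept-only, Poisson rather than negative binomial, made-up data, and hypothetical helper names that are not a Stata or library API.

```python
import math

def poisson_loglik(ys, mu):
    """Poisson log-likelihood of ys at mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)

def ztp_loglik(ys, lam):
    """Zero-truncated Poisson log-likelihood; every y must be >= 1."""
    log_norm = math.log(-math.expm1(-lam))  # log(1 - exp(-lam))
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) - log_norm for y in ys)

def ztp_mle(ys, tol=1e-12):
    """Solve mean(ys) = lam / (1 - exp(-lam)) for lam by bisection."""
    target = sum(ys) / len(ys)
    if target <= 1.0:
        raise ValueError("no finite MLE when every positive count is 1")
    lo, hi = 1e-12, target  # g(lo) is about 1 < target, g(hi) > target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

# Hypothetical zero-heavy counts; assumes both zeros and positives occur.
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 12]
n = len(counts)
positives = [y for y in counts if y > 0]
n_pos, n_zero = len(positives), n - len(positives)

# Model 1: plain Poisson, intercept only (k = 1).
aic_poisson = aic(poisson_loglik(counts, sum(counts) / n), k=1)

# Model 2: hurdle = Bernoulli(zero vs positive) + zero-truncated Poisson (k = 2).
p_pos = n_pos / n
ll_binary = n_zero * math.log(1.0 - p_pos) + n_pos * math.log(p_pos)
lam = ztp_mle(positives)
aic_hurdle = aic(ll_binary + ztp_loglik(positives, lam), k=2)

print(f"AIC Poisson:        {aic_poisson:.2f}")
print(f"AIC Poisson hurdle: {aic_hurdle:.2f} (lambda = {lam:.3f})")
```

On these made-up counts the hurdle fit has the lower AIC (roughly 53 versus 62.5). With real data the comparison would use the NB-based, log-gamma, or log-inverse Gaussian models named in the reply, with the covariates of interest.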