Re: st: Model for Poisson-shaped distribution but with non-count data

From: "William Gould, StataCorp LP"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Model for Poisson-shaped distribution but with non-count data
Date: Tue, 06 Dec 2011 11:34:13 -0600

```David Hoaglin <dchoaglin@gmail.com>, in reference to the blog entry
"Use Poisson Rather Than Regress, Tell a Friend" at

http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/

wrote,

> [..] I did not see any
> mention of the fact that the Poisson distribution is discrete.  In the
> limit (as the mean of the distribution becomes large), that matters
> less, but one would need to view the possible data values as discrete.
>
> Some of the equations in the blog are not quite correct.  For example,
> since Poisson regression is a form of generalized linear model, the
> linear predictor is fitted to log(E(y)), rather than to log(y).  The
> random component of the GLM is a Poisson distribution.

I'm concerned that someone might interpret what David wrote to mean

1.  There may be practical problems using -poisson- to run
log-linear regressions, depending on whether the LHS variable
contains noninteger values.

2.  There may be theoretical problems using -poisson- to run
log-linear regressions.

Neither would be true.  My short-and-quick response is,

1.  -poisson- can handle non-discrete (non-integer) data values.
Left-hand-side values do not have to be large to ameliorate any
problem.

2.  The formulas in the blog are as intended and are correct.

Let me explain.

Concerning #1, -poisson- does not round values when run on noninteger
data.  Instead, it gives the warning message "you are responsible for
interpretation of noncount dep. variable."

An implication of that is that the objective function with non-integer
data may not be a true likelihood function.  Actually, I suspect that
it is, but that's irrelevant because in the blog entry we are doing M
estimation, and I recommended that you obtain standard errors using
the -vce(robust)- option.

When -poisson- calculates the likelihood value associated with a
noninteger value, it does that using the standard formulas, but
substituting the Gamma function for the factorial function, so that
y! becomes Gamma(y+1).  That is appropriate for M estimation.

This generalization means that you can run -poisson- using a LHS
variable with noninteger values and there will be no problems.  All
the values, in fact, can even be less than 1!  Whether you run on y,
y/10, y/100, y/1000, ..., only the intercept will change (dividing y
by c lowers it by ln(c)).

Concerning #2, the formulas written in the blog entry imply that
log(E(y)) = a + b*X.  It is true that I did write

y = exp(a + b*X + e)

and that implies more than merely log(E(y)) = a + b*X.  I did that
because I was starting with the log-linear regression problem.
The purpose of the blog entry was to show that -poisson- can be
used as an alternative to linear regression on ln(y).

-- Bill
wgould@stata.com
```
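Two points in the post can be checked numerically: the Poisson log-likelihood stays well defined for noninteger y once the factorial is replaced by the Gamma function, and dividing y by a constant changes only the intercept. The following is a minimal Python sketch under stated assumptions, not Stata's implementation of -poisson-. The function names `poisson_loglik` and `fit_poisson` are hypothetical, the data values are made up for illustration, and the fit uses a plain Newton-Raphson iteration for a model with one covariate.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood term with y! replaced by Gamma(y + 1).

    For integer y this equals the usual Poisson log-probability;
    for noninteger y it is simply the M-estimation objective.
    """
    return -mu + y * math.log(mu) - math.lgamma(y + 1.0)

def fit_poisson(xs, ys, iters=50, tol=1e-12):
    """Fit log E(y) = a + b*x by Newton-Raphson (hypothetical helper)."""
    a = math.log(sum(ys) / len(ys))  # start at the log of the mean of y
    b = 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * x) for x in xs]
        # Score (gradient) of the log-likelihood with respect to a and b.
        g0 = sum(y - m for y, m in zip(ys, mu))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mu))
        # Entries of the negative Hessian.
        h00 = sum(mu)
        h01 = sum(m * x for x, m in zip(xs, mu))
        h11 = sum(m * x * x for x, m in zip(xs, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Made-up data with a noninteger left-hand-side variable.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [1.3, 1.1, 2.4, 2.9, 5.2, 6.1, 11.7]

a, b = fit_poisson(xs, ys)
a10, b10 = fit_poisson(xs, [y / 10.0 for y in ys])

print("y    : intercept", a, "slope", b)
print("y/10 : intercept", a10, "slope", b10)
print("intercept shift", a - a10, "vs ln(10)", math.log(10.0))
```

The slope should agree across the two fits up to floating-point error, and the intercepts should differ by ln(10). The sketch reports point estimates only; the robust standard errors recommended in the post are not computed here.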