
Re: st: Model for Poisson-shaped distribution but with non-count data


From   "William Gould, StataCorp LP" <wgould@stata.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Model for Poisson-shaped distribution but with non-count data
Date   Tue, 06 Dec 2011 11:34:13 -0600

David Hoaglin <dchoaglin@gmail.com>, in reference to the blog entry
"Use Poisson Rather Than Regress, Tell a Friend" at

    http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/

wrote:

> [..] I did not see any
> mention of the fact that the Poisson distribution is discrete.  In the
> limit (as the mean of the distribution becomes large), that matters
> less, but one would need to view the possible data values as discrete.
> 
> Some of the equations in the blog are not quite correct.  For example,
> since Poisson regression is a form of generalized linear model, the
> linear predictor is fitted to log(E(y)), rather than to log(y).  The
> random component of the GLM is a Poisson distribution.

I'm concerned that someone might interpret what David wrote to mean:

    1.  There may be practical problems using -poisson- to run 
        log-linear regressions, depending on whether the LHS variable 
        contains noninteger values.

    2.  There may be theoretical problems using -poisson- to run 
        log-linear regressions.

Neither would be true.  My short-and-quick response is, 

    1.  -poisson- can handle non-discrete (non-integer) data values.
        Left-hand-side values do not have to be large to ameliorate any
        problem.

    2.  The formulas in the blog are as intended and are correct. 

Let me explain.


Concerning #1, -poisson- does not round values when run on noninteger
data.  Instead, it gives the warning message "you are responsible for
interpretation of noncount dep. variable."

An implication of that is that the objective function with noninteger
data may not be a true likelihood function.  Actually, I suspect that
it is, but that's irrelevant because in the blog entry we are doing M
estimation, and I recommended that you obtain standard errors using
the -vce(robust)- option.

When -poisson- calculates the likelihood value associated with a
noninteger value, it does that using the standard formulas, but
substituting the Gamma function for the factorial function.  That is
appropriate for M estimation.

This generalization means that you can run -poisson- using a LHS
variable with noninteger values and there will be no problems.  All
the values, in fact, can even be less than 1!  Whether you run on y,
y/10, y/100, y/1000, ..., the only thing that changes is the
intercept, which shifts by the log of the divisor.
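
The scaling claim can be checked numerically.  The following is a
minimal Python sketch under stated assumptions: the data are invented
for illustration, and fit_poisson is a hypothetical bare-bones Newton
solver for the model E(y) = exp(a + b*x), not the algorithm -poisson-
uses.  The first-order conditions it solves involve only y - mu, so
noninteger y causes no difficulty.

```python
import math


def fit_poisson(x, y, iterations=50):
    """Fit E(y) = exp(a + b*x) by Newton's method; returns (a, b)."""
    # Start from the constant-only fit: a = log(mean of y), b = 0.
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score (gradient of the log-likelihood); the Gamma term drops out.
        ga = sum(yi - mi for yi, mi in zip(y, mu))
        gb = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian, a symmetric 2x2 matrix.
        haa = sum(mu)
        hab = sum(mi * xi for xi, mi in zip(x, mu))
        hbb = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = haa * hbb - hab * hab
        # Newton step: solve the 2x2 system explicitly.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.2, 1.9, 3.1, 4.4, 7.3, 11.8]  # made-up noninteger outcomes
    a1, b1 = fit_poisson(x, y)
    a10, b10 = fit_poisson(x, [yi / 10 for yi in y])
    print("slope on y:", b1, " slope on y/10:", b10)
    print("intercept shift:", a1 - a10, " log(10):", math.log(10))
```

The reason is that if (a, b) satisfies the first-order conditions for
y, then (a - log(c), b) satisfies them for y/c, so the slope is
untouched and only the intercept moves.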


Concerning #2, the formulas written in the blog entry imply that
log(E(y)) = a + b*X.  It is true that I did write

      y = exp(a + b*X + e)

and that implies more than merely log(E(y)) = a + b*X.  I did that
because I was starting with the log-linear regression problem.
The purpose of the blog entry is to show that -poisson- can be
used as an alternative to linear regression on ln(y).
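
To spell out how the two formulations connect, here is a short
derivation.  It rests on an assumption not stated in this message:
that E(exp(e) | X) does not depend on X (for example, e independent
of X).  The constant c is notation introduced only for this sketch.

```latex
\begin{aligned}
y &= \exp(a + bX + e) \\
E(y \mid X) &= \exp(a + bX)\, E\bigl(\exp(e) \mid X\bigr)
             = c \,\exp(a + bX), \qquad c = E\bigl(\exp(e)\bigr) \\
\log E(y \mid X) &= (a + \log c) + bX
\end{aligned}
```

Under that assumption the log-linear model implies a model for
log(E(y)) with the same slope b and an intercept shifted by log(c).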

-- Bill
wgould@stata.com