


Re: st: Choosing a family using glm

From   Phil Schumm
Subject   Re: st: Choosing a family using glm
Date   Tue, 24 Aug 2010 16:14:06 -0500

On Aug 24, 2010, at 12:10 PM, Laurie Molina wrote:
I'm trying to fit a GLM to get non-negative fitted values. I am thinking of using a GLM with a log link, but I am not sure which family to use. Is there any test I can perform to choose between the normal and gamma distributions?

Everything Nick said is correct, of course; I'll just expand a bit. With respect to the distributional family, what matters most is that the family's variance function (i.e., the way in which the variance changes with the mean) is consistent with your data. For example, the variance function for the Normal distribution is V(mu) = 1 (where mu = E(Y), the mean of Y), which corresponds to constant variance; this is why you look for homoscedasticity in residual plots after classical linear regression. In contrast, the variance function for the gamma distribution is V(mu) = mu^2, which means that the variance increases with the square of the mean (i.e., the coefficient of variation is constant). The easiest (and in any case indispensable) way to check whether your variance function is plausible is to plot the standardized residuals versus the fitted values and verify that the amount of variation appears constant across the range of fitted values. In some cases it also helps to plot the absolute residuals versus the fitted values and add a -lowess- smooth.
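To make the variance-function point concrete, here is a small simulation sketch in plain Python (standard library only). It is an illustration under stated assumptions, not a substitute for fitting the model: the data are simulated from a gamma distribution with a constant coefficient of variation of 0.5, the known simulated mean stands in for the fitted value, and the names pearson_residuals, VARIANCE_FUNCTIONS, and spread_by_mean are made up for this example.

```python
import random
import statistics

# Variance functions discussed above (dispersion factor omitted).
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,      # V(mu) = 1: constant variance
    "gamma": lambda mu: mu ** 2,   # V(mu) = mu^2: constant coefficient of variation
}


def pearson_residuals(y, mu, variance_function):
    """Return (y - mu) / sqrt(V(mu)) for each observation."""
    scale = variance_function(mu) ** 0.5
    return [(yi - mu) / scale for yi in y]


def spread_by_mean(means, shape=4.0, n=5000, seed=12345):
    """Simulate gamma data at several means and report the residual SD
    under each candidate variance function.

    Assumption: the true mean is used in place of a fitted value.
    A gamma with shape k and scale mu/k has mean mu and CV 1/sqrt(k).
    """
    rng = random.Random(seed)
    result = {}
    for mu in means:
        y = [rng.gammavariate(shape, mu / shape) for _ in range(n)]
        result[mu] = {
            name: statistics.pstdev(pearson_residuals(y, mu, v))
            for name, v in VARIANCE_FUNCTIONS.items()
        }
    return result


spread = spread_by_mean([10.0, 100.0, 1000.0])
for mu, sds in spread.items():
    print(f"mean={mu:7.1f}  SD with V=1: {sds['normal']:8.2f}  "
          f"SD with V=mu^2: {sds['gamma']:5.3f}")
```

With V(mu) = 1 the residual spread grows in proportion to the mean, which is the fan shape you would see in a residual-versus-fitted plot. With V(mu) = mu^2 the spread is roughly the same at every mean, which is what a plausible variance function should give you.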

My data is for the rent price of houses, so it is not count data, and therefore I think I should not use Poisson.

Again, what's important is that you select a family whose variance function is consistent with your data. For more information, see the book Generalized Linear Models by McCullagh and Nelder.

To my understanding, in a classical linear regression the assumption of normality is on the distribution of the error term, but in a GLM the assumption defined by the family selection is on the distribution of the dependent variable. Isn't that a huge cost for using a GLM instead of a classical linear regression model?

You are laboring under a misunderstanding. To say that the distribution of Y conditional on X is Normal with mean XB and variance sigma^2 is the same as saying that the distribution of the errors (i.e., Y - XB) is Normal with mean 0 and variance sigma^2. And to emphasize the GLM approach, what is most important (if you're fitting a linear regression) is that the mean is XB and the variance is constant (i.e., that your assumptions about the first and second moments are correct).
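The equivalence of the two descriptions can be checked with a short simulation sketch in plain Python. Everything here is hypothetical and chosen only for illustration: the function names, the intercept 2.0, slope 0.5, sigma 3.0, and the seed. One function draws Y directly from a Normal with mean XB; the other draws Normal(0, sigma^2) errors and adds them to XB.

```python
import random
import statistics


def draw_conditional(x, intercept, slope, sigma, seed):
    """Draw Y directly from Normal(mean = intercept + slope*x, sd = sigma)."""
    rng = random.Random(seed)
    return [rng.gauss(intercept + slope * xi, sigma) for xi in x]


def draw_with_errors(x, intercept, slope, sigma, seed):
    """Draw errors from Normal(0, sigma) and add them to the linear predictor."""
    rng = random.Random(seed)
    return [intercept + slope * xi + rng.gauss(0.0, sigma) for xi in x]


# Hypothetical design and parameters, for illustration only.
x = [i / 10 for i in range(2000)]
y_conditional = draw_conditional(x, 2.0, 0.5, 3.0, seed=2010)
y_errors = draw_with_errors(x, 2.0, 0.5, 3.0, seed=2010)

# With the same seed, both descriptions should produce the same data.
max_gap = max(abs(a - b) for a, b in zip(y_conditional, y_errors))

# The errors Y - XB recovered from the "conditional" draw.
errors = [yi - (2.0 + 0.5 * xi) for xi, yi in zip(x, y_conditional)]

print("largest difference between the two draws:", max_gap)
print("mean of Y - XB:", round(statistics.mean(errors), 3))
print("SD of Y - XB:", round(statistics.stdev(errors), 3))
```

The two draws agree, and the recovered errors have mean near 0 and standard deviation near sigma. The two statements describe the same model.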

-- Phil
