Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Laurie Molina <molinalaurie@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Choosing a family using glm

Date: Tue, 24 Aug 2010 17:28:22 -0500

Thank you very much, Phil, I will work on that!

On Tue, Aug 24, 2010 at 4:14 PM, Phil Schumm <pschumm@uchicago.edu> wrote:
> On Aug 24, 2010, at 12:10 PM, Laurie Molina wrote:
>>
>> I'm trying to fit a GLM to get non-negative fitted values. I am thinking
>> of using a GLM with a log link, but I am not sure which family to use.
>> Is there any test I can perform to choose between the normal and gamma
>> distributions?
>
> Everything Nick said is correct, of course. I'll just expand a bit. WRT
> the distributional family, what is most important is that the variance
> function of the family (i.e., the way in which the variance changes WRT the
> mean) is consistent with your data. For example, the variance function for
> the Normal distribution is V(mu) = 1 (where mu is E(Y), the mean of Y),
> which corresponds to constant variance (this is why you look for
> homoscedasticity in residual plots after classical linear regression). In
> contrast, the variance function for the gamma distribution is V(mu) = mu^2,
> which means that the variance increases with the square of the mean (i.e.,
> a constant coefficient of variation). The easiest (and in any case
> indispensable) way to check whether your variance function is plausible is
> to plot the standardized residuals versus the fitted values and verify that
> the amount of variation appears constant; in some cases it might also help
> to examine a plot of the absolute residuals versus the fitted values, with
> the aid of -lowess-.
>
>> My data are the rental prices of houses, so they are not count data, and
>> therefore I think I should not use Poisson.
>
> Again, what's important is that you select a family whose variance function
> is consistent with your data. For more information, see the book
> Generalized Linear Models by McCullagh and Nelder.
>
>> To my understanding, in a classical linear regression the assumption of
>> normality is on the distribution of the error term, but in a GLM the
>> assumption defined by the family selection is on the distribution of the
>> dependent variable. Isn't that a huge cost of using a GLM instead of a
>> classical linear regression model?
>
> You are laboring under a misunderstanding. To say that the distribution of
> Y conditional on X is Normal with mean XB and variance sigma^2 is the same
> as saying that the distribution of the errors (i.e., Y - XB) is Normal with
> mean 0 and variance sigma^2. And to emphasize the GLM approach, what is
> most important (if you're fitting a linear regression) is that the mean is
> XB and the variance is constant (i.e., that your assumptions about the first
> and second moments are correct).
>
> -- Phil
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
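Phil's advice to check the variance function by plotting standardized residuals against fitted values can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not Stata's -glm- implementation: it generates hypothetical gamma-distributed "rents" (shape 4, so a constant coefficient of variation of 0.5) with a log-linear mean in one made-up predictor `x`, fits a log-link model by iteratively reweighted least squares under each family's variance function, and reduces the residual plot to one crude number: the mean absolute standardized Pearson residual among the largest third of fitted values divided by that among the smallest third. A flat residual band gives a ratio near 1. The names `VARIANCE`, `fit_log_link`, and `spread_ratio` are hypothetical, and leverage is ignored when standardizing.

```python
import math
import random
import statistics

# Variance functions V(mu), where Var(Y | x) = phi * V(mu).
VARIANCE = {
    "gaussian": lambda mu: 1.0,   # constant variance
    "gamma": lambda mu: mu ** 2,  # variance grows with mu^2 (constant CV)
}


def fit_log_link(x, y, family, max_iter=100, tol=1e-10):
    """Fit E(Y | x) = exp(b0 + b1 * x) by IRLS under the family's variance function."""
    V = VARIANCE[family]
    b0, b1 = math.log(statistics.fmean(y)), 0.0  # flat starting values
    for _ in range(max_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            w = mu ** 2 / V(mu)       # (dmu/deta)^2 / V(mu), since dmu/deta = mu
            z = eta + (yi - mu) / mu  # working response
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Closed-form weighted least squares of z on (1, x).
        new_b1 = (sw * swxz - swx * swz) / (sw * swxx - swx ** 2)
        new_b0 = (swz - new_b1 * swx) / sw
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            break
    return b0, b1


def spread_ratio(x, y, family):
    """Mean |standardized Pearson residual| for the top third of fitted values
    divided by that for the bottom third; about 1 means a flat residual band."""
    V = VARIANCE[family]
    b0, b1 = fit_log_link(x, y, family)
    fitted = [math.exp(b0 + b1 * xi) for xi in x]
    pearson = [(yi - mu) / math.sqrt(V(mu)) for yi, mu in zip(y, fitted)]
    phi = sum(r * r for r in pearson) / (len(y) - 2)  # Pearson dispersion estimate
    std_res = [r / math.sqrt(phi) for r in pearson]   # leverage ignored
    order = sorted(range(len(y)), key=lambda i: fitted[i])
    third = len(y) // 3
    low = statistics.fmean([abs(std_res[i]) for i in order[:third]])
    high = statistics.fmean([abs(std_res[i]) for i in order[-third:]])
    return high / low


# Hypothetical data: gamma-distributed "rents" with shape 4 (CV = 0.5)
# and mean exp(1 + x); none of this comes from the original thread.
random.seed(2010)
x = [random.uniform(0.0, 2.0) for _ in range(3000)]
y = [random.gammavariate(4.0, math.exp(1.0 + xi) / 4.0) for xi in x]

for fam in ("gaussian", "gamma"):
    print(fam, round(spread_ratio(x, y, fam), 2))
```

With these simulated data the Gaussian ratio should come out well above 1 (raw residuals fan out as the fitted mean grows), while the gamma ratio should sit close to 1. A single ratio is no substitute for looking at the plot itself. In Stata the analogous check would be roughly `glm rent x, family(gamma) link(log)`, then `predict mu, mu` and `predict r, pearson`, then `generate absr = abs(r)` and `lowess absr mu` (variable names hypothetical).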
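Phil's answer to the last question can be written out in symbols. The notation below ($x_i$, $\beta$, $\varepsilon_i$, the link $g$, and the dispersion $\phi$) is chosen to match the usual GLM presentation and does not appear in the thread:

$$
\begin{aligned}
&Y_i \mid x_i \sim N(x_i^\top\beta,\ \sigma^2)
\iff \varepsilon_i \mid x_i \sim N(0,\ \sigma^2),
\qquad \varepsilon_i \equiv Y_i - x_i^\top\beta, \\
&\text{GLM:}\quad g(\mu_i) = x_i^\top\beta, \qquad \mu_i = E(Y_i \mid x_i), \qquad
\operatorname{Var}(Y_i \mid x_i) = \phi\, V(\mu_i), \\
&\text{Gaussian family, identity link:}\quad g(\mu) = \mu,\quad V(\mu) = 1,\quad \phi = \sigma^2.
\end{aligned}
$$

The first line holds because, for fixed $x_i$, subtracting the constant $x_i^\top\beta$ shifts the mean to 0 without changing the variance or the shape of the distribution. The last line shows that classical linear regression is the GLM with the Gaussian family and identity link, so stating the assumption on $Y$ given $X$ is not an extra cost; as Phil notes, the parts that matter most are the mean (first moment) and the variance function (second moment).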

**References**:
- **st: Choosing a family using glm** *From:* Laurie Molina <molinalaurie@gmail.com>
- **Re: st: Choosing a family using glm** *From:* Phil Schumm <pschumm@uchicago.edu>
