



st: RE: Choosing a family using glm


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Choosing a family using glm
Date   Tue, 24 Aug 2010 18:28:03 +0100

I don't think you should think in terms of a single test. That would be as naïve for this problem as for many others. 

0. Sometimes one or other approach just won't converge, which is itself a sign that the family suits the data poorly. Conversely, there are datasets in which model fits are very similar. 

1. Look at the -glm- output, including the z's, the p's and the log likelihood. 

2. Plot residuals vs fitted and observed vs fitted for each family. Plot fitted for normal versus fitted for gamma and see how much difference the choice of family makes. 

3. Examine whether predictions make scientific sense for cases of most interest. 

4. Why feel obliged to choose one model? Perhaps two models together tell you something. 
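
Steps 1 and 2 might be sketched as follows. This is an illustration only, not a recipe: it uses the auto data shipped with Stata, with -price- standing in for rent and -mpg- and -weight- as arbitrary predictors, and the new variable and graph names are made up for the example.

```stata
* Illustrative sketch: price stands in for rent; the predictors are arbitrary
sysuse auto, clear

* Step 1: fit each family with a log link and read the output
* (z's, p's, log likelihood) for each fit
glm price mpg weight, family(gaussian) link(log)
predict double mu_normal, mu
predict double dev_normal, deviance

glm price mpg weight, family(gamma) link(log)
predict double mu_gamma, mu
predict double dev_gamma, deviance

* Step 2a: residuals vs fitted, one graph per family
scatter dev_normal mu_normal, yline(0) name(res_normal, replace)
scatter dev_gamma mu_gamma, yline(0) name(res_gamma, replace)

* Step 2b: observed vs fitted, with a line of equality for reference
twoway scatter price mu_normal || function y = x, range(price) name(obs_normal, replace)
twoway scatter price mu_gamma || function y = x, range(price) name(obs_gamma, replace)

* Step 2c: fitted for normal versus fitted for gamma
twoway scatter mu_normal mu_gamma || function y = x, range(mu_gamma) name(fit_vs_fit, replace)
```

If the points in the last graph hug the line of equality, the choice of family is making little practical difference for these data.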

Your last question appears to be based on a confusion. -glm- is one kind of generalisation of linear regression. Like regression, the central focus is the distribution of the response variable conditional on the predictors, not the marginal distribution of the response. Also, using link functions removes much of the adhockery necessitated by transformations. So, the short answer is emphatically No.
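
To spell out the distinction in symbols (a sketch in standard notation, which is not from the original question): classical regression with normal errors is already a statement about the conditional distribution of the response, and a log link models the log of the conditional mean, whereas a log transformation models the conditional mean of the log.

\[
\begin{aligned}
\text{classical regression:} \quad & y = x\beta + \varepsilon,\ \varepsilon \sim N(0, \sigma^2) \iff y \mid x \sim N(x\beta, \sigma^2) \\
\text{glm with log link:} \quad & \log E(y \mid x) = x\beta \\
\text{regression on } \log y\text{:} \quad & E(\log y \mid x) = x\beta
\end{aligned}
\]

With the log link, fitted means exp(xb) are positive on the original scale and no back-transformation is needed.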

Nick 
n.j.cox@durham.ac.uk 

Laurie Molina

I'm trying to fit a glm to get non-negative fitted values.
I am thinking of using a glm with a log link.
But I am not sure which family to use.

Is there any test I can perform to choose between the normal and gamma
distribution?

My data are rent prices of houses, so they are not count data and
therefore I think I should not use Poisson.

And just one more question:

To my understanding, in a classical linear regression the assumption of
normality is on the distribution of the error term, but in glm the
assumption defined by the family selection is on the distribution of
the dependent variable. Isn't that a huge cost of using glm instead of
a classical linear regression model?
