



Re: st: OLS assumptions not met: transformation, gls, or glm as solutions?


From   David Hoaglin <dchoaglin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: OLS assumptions not met: transformation, gls, or glm as solutions?
Date   Tue, 18 Dec 2012 17:49:26 -0500

> @ Maarten & David:
> About linearity: as independent variables, I mainly have categorical
> variables. So - scatter y x- or -graph matrix y x x- does not help
> much, because the cases are only on the lines for 0 and 1. How can I
> see whether I have a linear relationship between y and x, if x is
> categorical?

If the predictors are categorical, the focus in a discussion of
transformations shifts to promoting additivity of the contributions of
those predictors.  Ideally, a model will have main effects for those
predictors and no interactions among them.

If the categorical predictors have more than 2 categories, it is
easier to derive information from the data that helps in choosing a
transformation that removes or reduces the nonadditivity.
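
One rough way to look at additivity in Stata is to fit the model with and without a transformation and compare how much interaction remains. The sketch below is only an illustration under assumptions: y is a hypothetical positive outcome (for example, minutes), and a and b are hypothetical categorical predictors; none of these names comes from the original data. The log is just one candidate transformation.

```stata
* Hypothetical variable names: y (outcome, > 0), a and b (categorical predictors)

* Raw scale: main effects plus the a-by-b interaction
regress y i.a##i.b
* Joint test of the interaction terms (evidence of nonadditivity)
testparm i.a#i.b

* Log scale: ln() is undefined for y <= 0, so those cases are left missing
generate double ln_y = ln(y) if y > 0
regress ln_y i.a##i.b
testparm i.a#i.b
```

If the interaction terms are clearly smaller or no longer needed on the log scale, the transformation has moved the data toward additivity. With only two categories per predictor there is a single interaction contrast, so the comparison carries little information about which transformation to prefer.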

> @ David:
> Yes, I think about transformation, and will read again about
> interpretation. Still, just having minutes to interpret would be
> easier, also for readers which are not so familiar with
> transformation. Also, I am not sure whether OLS with transformed
> dependent variable, or -glm- without transformed variable would be
> better.

As others have suggested, in a GLM a suitable choice of link function
may allow you to avoid transforming the dependent variable.  But the
link function simply relates the conditional expectation of the
dependent variable to the linear component of the model.  The random
component of the model handles the other features of the conditional
distribution.  If the random component of the data (which will show up
in the residuals) is skewed, a choice other than the normal
distribution is indicated.  The choice of link function would focus on
the structure in the data (perhaps additive).
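
As a minimal sketch of keeping those two choices separate in -glm-, the commands below hold the link fixed and change only the family. The variable names y, a, and b are hypothetical, and the log link and gamma family are assumptions chosen for illustration (a positive, right-skewed outcome with roughly multiplicative effects), not a recommendation for any particular data set.

```stata
* Hypothetical variable names: y (outcome, > 0), a and b (categorical predictors)

* Log link with a normal random component: E(y|x) = exp(xb),
* but the variation around the mean is treated as symmetric
glm y i.a i.b, family(gaussian) link(log)

* Same link, but a gamma random component to allow right skewness
glm y i.a i.b, family(gamma) link(log)

* Deviance residuals from the most recent fit, checked against normality
predict double dres, deviance
qnorm dres
```

Because the link is the same in both fits, the coefficients keep the same interpretation (differences in the log of the expected outcome); only the assumed distribution around that expectation changes.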

David Hoaglin

