Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: OLS assumptions not met: transformation, gls, or glm as solutions?

From: "Laura R."
To: statalist@hsphsun2.harvard.edu
Subject: st: OLS assumptions not met: transformation, gls, or glm as solutions?
Date: Mon, 17 Dec 2012 11:43:56 +0100

```Dear Stata users,

I estimated an OLS model with the number of minutes (1-1440) spent on
an activity on a day as the dependent variable. At first sight, the
model works fine. I obtain some interesting results which are robust across
model specifications. I would like to keep it as it is, but:

- The regression diagnostics show that the error terms are not
normally distributed, but right skewed.

- In addition, there is heteroskedasticity.

Excluding outliers and influential cases does not help. Now I can
think of four solutions, but I am not sure when it is justified to
decide on one of these:

1. Keep the model and the variables as they are (but maybe use robust
standard errors) - is this possible under certain conditions, even if
I have heteroskedasticity and non-normality of residuals, and when is
this justified?

2. Transform the dependent variable. If I take the ln of the dependent
variable, the residuals get closer to a normal distribution, and it
gets closer to homoskedasticity. But then there is the problem of
interpreting the results.

3. Generalised least squares model (gls): Use this instead. This is a
solution to heteroskedasticity, but do the residuals have to be
normally distributed in gls as well? What other new assumptions of gls
might cause new problems (pros/cons gls vs. OLS)? And how can I do
this in Stata? (Somehow with calculating a weight, I think...)

4. Generalised linear model (glm): In some sources I read that this
also accounts for heteroskedasticity, in other sources not. Again,
what about the normal distribution of residuals here? I heard that glm
is better than OLS for non-negative dependent variables, is that
correct? What are other assumptions of glm that could make me still
prefer OLS? If I used it, and if my dependent variable is
non-negative, and residuals are right skewed, do I have to "tell"
Stata that when estimating the model, or can I run it as it is?

(I quickly ran -glm- already, without any special specifications, and
the results are the same as from the OLS model.)

In sum, I need some decision-making support. What is the best thing to
do in this case?
One thing that would help is a comparison of assumptions of OLS, gls,
glm. I am aware of the assumptions of OLS models, but for gls and glm
I did not find comprehensive lists and explanations.

It would be great if you could give me hints on what would be a good
solution. Maybe you know a source explaining when to use which
solution if OLS assumptions of normality and homoskedasticity are not
met.

Laura

PS: I am aware of the fact that many have used Tobit for similar dependent
variables, including the zeros. My case is different, and for some
reason I do not want to do this, and I excluded the zeros.
```
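For readers comparing the four options listed in the post, the following is a minimal Stata sketch of how each might be specified. It is not taken from the original message: the variable names `minutes`, `x1`, and `x2` are hypothetical placeholders, the weighting scheme in option 3 is only one of several ways to do feasible GLS (it assumes the error variance is a function of the covariates), and the gamma family with log link in option 4 is just one plausible choice for a positive, right-skewed outcome. Note that `glm` with no options uses the Gaussian family with the identity link, which would explain why it reproduces the OLS estimates.

```stata
* Hypothetical names: minutes = outcome (strictly positive, zeros excluded),
* x1 x2 = covariates. Replace with the actual variables.

* Option 1: keep OLS, but use heteroskedasticity-robust standard errors
regress minutes x1 x2, vce(robust)

* Option 2: OLS on the log-transformed outcome
* (coefficients then describe ln(minutes), not minutes)
generate double ln_minutes = ln(minutes)
regress ln_minutes x1 x2, vce(robust)

* Option 3: feasible GLS through estimated weights
* (assumes the error variance depends on x1 and x2)
regress minutes x1 x2
predict double ehat, residuals
generate double ln_e2 = ln(ehat^2)
regress ln_e2 x1 x2
predict double ln_var, xb
generate double w = 1/exp(ln_var)
regress minutes x1 x2 [aweight = w]

* Option 4: GLM with an explicit family and link
* (gamma/log is an assumption here, not a recommendation from the post)
glm minutes x1 x2, family(gamma) link(log) vce(robust)
```

In options 2 and 4 the coefficients are on the log scale, so they are not directly comparable with those from options 1 and 3.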