Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


| Field | Value |
|---|---|
| From | David Hoaglin <dchoaglin@gmail.com> |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: OLS assumptions not met: transformation, gls, or glm as solutions? |
| Date | Mon, 17 Dec 2012 07:33:13 -0500 |

Laura,

When you plotted the dependent variable against the predictor variables, what patterns of curvature (if any) did you see? You didn't mention the number of observations. If it is large, you may want to use LOWESS to trace smooth curves through those plots. You can also look for curvature in the plots of the studentized residuals against the individual predictor variables, and a plot of those residuals against the predicted values will give you information on the pattern of heteroskedasticity.

Often, transforming the dependent variable helps to straighten the relations between the dependent variable and the predictors, AND it also stabilizes the variability in the dependent variable. It is likely that the variability in the number of minutes spent on the activity increases as the expected number of minutes increases. Two other transformations to consider are the square root and the reciprocal. (If your data were time to complete a task, the reciprocal would transform slowness into fastness.)

If the logarithm is the most reasonable choice, it is not necessary to make interpretation more difficult by using the natural log. Use logs base 10 instead. With either base, interpretation is in terms of ratios, which is often not difficult.

After a suitable transformation you may have fewer outliers (or none). You should be cautious in excluding outliers and, especially, influential observations. If you included the zeros and used a tobit model, you would still have to do something about curvature and heteroskedasticity.

David Hoaglin

On Mon, Dec 17, 2012 at 5:43 AM, Laura R. <laura.roh@googlemail.com> wrote:
> Dear Stata users,
>
> I estimated an OLS model with the number of minutes (1-1440) spent on an activity on a day as the dependent variable. At first sight, the model works fine. I receive some interesting results which are robust across model specifications.
> I would like to keep it as it is, but:
>
> - The regression diagnostics show that the error terms are not normally distributed, but right-skewed.
> - In addition, there is heteroskedasticity.
>
> Excluding outliers and influential cases does not help. Now I can think of four solutions, but I am not sure when it is justified to decide on one of these:
>
> 1. Keep the model and the variables as they are (but maybe use robust standard errors). Is this possible under certain conditions, even if I have heteroskedasticity and non-normality of residuals, and when is this justified?
>
> 2. Transform the dependent variable. If I take the ln of the dependent variable, the residuals get closer to a normal distribution, and the model gets closer to homoskedasticity. But then there is the problem of interpreting the results.
>
> 3. Generalised least squares (gls): Use this instead. This is a solution to heteroskedasticity, but do the residuals have to be normally distributed in gls as well? What other new assumptions of gls might cause new problems (pros/cons of gls vs. OLS)? And how can I do this in Stata? (Somehow by calculating a weight, I think...)
>
> 4. Generalised linear model (glm): Some sources say that this also accounts for heteroskedasticity; others say it does not. Again, what about the normal distribution of residuals here? I heard that glm is better than OLS for non-negative dependent variables; is that correct? What other assumptions of gls could make me still prefer OLS? If I used it, and if my dependent variable is non-negative and the residuals are right-skewed, do I have to "tell" Stata that when estimating the model, or can I run it as it is?
>
> (I quickly ran -glm- already, without any special specifications, and the results are the same as from the OLS model.)
>
> In sum, I need some decision-making support. What is the best thing to do in this case?
>
> One thing that would help is a comparison of the assumptions of OLS, gls, and glm. I am aware of the assumptions of OLS models, but for gls and glm I did not find comprehensive lists and explanations.
>
> It would be great if you could give me hints on what would be a good solution. Maybe you know a source explaining when to use which solution if the OLS assumptions of normality and homoskedasticity are not met.
>
> Laura
>
> PS: I am aware that many people have used Tobit for similar dependent variables, including the zeros. My case is different, and for some reasons I do not want to do this, so I excluded the zeros.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
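The suggestion above to plot studentized residuals against the predicted values can be illustrated with a small sketch. It is written in Python rather than Stata, handles a single predictor only, and assumes the textbook definition of externally studentized residuals: each residual is divided by an estimate of its standard deviation computed with that observation left out, t_i = e_i / (s_(i) * sqrt(1 - h_i)), where h_i is the leverage. The function name `studentized_residuals` and the toy data (whose scatter around the line grows with x) are hypothetical. In Stata itself, `predict` with the `rstudent` option after `regress` produces these residuals for a model with any number of predictors.

```python
import math


def studentized_residuals(x, y):
    """Externally studentized residuals for a one-predictor OLS fit.

    Returns (fitted, t). Plotting t against fitted (or against x) shows
    curvature and any spread that changes with the fitted values.
    """
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar

    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Leverage (hat value) of each observation in simple regression.
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    sse = sum(e ** 2 for e in resid)

    t = []
    for e, h in zip(resid, lev):
        # Error variance estimated with observation i left out.
        s2_del = (sse - e ** 2 / (1 - h)) / (n - p - 1)
        t.append(e / math.sqrt(s2_del * (1 - h)))
    return fitted, t


# Hypothetical data whose scatter around the line grows with x.
x = list(range(1, 11))
y = [10 * xi + (xi if xi % 2 == 0 else -xi) for xi in x]

fitted, t = studentized_residuals(x, y)
for f, ti in zip(fitted, t):
    print(f"fitted = {f:6.2f}   studentized residual = {ti:6.2f}")
```

In this toy data the absolute studentized residuals tend to grow with the fitted values, which is the fan shape that a residual-versus-predicted plot would reveal as heteroskedasticity.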

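The point above that either log base leads to an interpretation in terms of ratios can be checked with a small worked example. This sketch is in Python rather than Stata and uses made-up, noise-free data in which the minutes double with each one-unit step of a hypothetical predictor `x`; the helper `ols_fit` is likewise illustrative. Regressing log10(minutes) and ln(minutes) on `x` gives slopes of log10(2) ≈ 0.301 and ln(2) ≈ 0.693, which differ by the constant factor ln(10), yet both back-transform (10^b and e^b) to the same multiplicative effect: a ratio of 2 per unit of `x`.

```python
import math


def ols_fit(x, y):
    """Least-squares fit of y = a + b*x with one predictor; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical, noise-free data: minutes double with each unit of x.
x = [0, 1, 2, 3, 4, 5]
minutes = [10 * 2 ** xi for xi in x]  # 10, 20, 40, 80, 160, 320

_, b10 = ols_fit(x, [math.log10(m) for m in minutes])
_, b_ln = ols_fit(x, [math.log(m) for m in minutes])

# Back-transforming either slope gives the multiplicative effect per unit of x.
ratio_from_log10 = 10 ** b10
ratio_from_ln = math.exp(b_ln)

print(f"log10 slope = {b10:.5f}, ratio = {ratio_from_log10:.3f}")  # 0.30103, 2.000
print(f"ln slope    = {b_ln:.5f}, ratio = {ratio_from_ln:.3f}")    # 0.69315, 2.000
print(f"slope ratio = {b_ln / b10:.5f} (ln 10 = {math.log(10):.5f})")
```

With real data the fit will not be exact, but the reading is the same: a slope b on the log10 scale means the fitted value on the original scale is multiplied by 10^b for each one-unit increase in the predictor, so b = 0.1 corresponds to a ratio of about 1.26.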
**Follow-Ups**:
- **Re: st: OLS assumptions not met: transformation, gls, or glm as solutions?** *From:* "Laura R." <laura.roh@googlemail.com>
- **Re: st: OLS assumptions not met: transformation, gls, or glm as solutions?** *From:* "JVerkuilen (Gmail)" <jvverkuilen@gmail.com>

**References**:
- **st: OLS assumptions not met: transformation, gls, or glm as solutions?** *From:* "Laura R." <laura.roh@googlemail.com>
