Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: carlo.lazzaro@tiscalinet.it
To: <statalist@hsphsun2.harvard.edu>
Subject: st: R: OLS assumptions not met: transformation, gls, or glm as solutions?
Date: Mon, 17 Dec 2012 12:09:46 +0100

Laura's first query is:

1. Keep the model and the variables as they are (but maybe use robust standard errors) - is this possible under certain conditions, even if I have heteroskedasticity and non-normality of residuals, and when is this justified?

Using robust standard errors will not always shelter you from heteroskedasticity, as you can see from the following (misspecified) example:

------------------------------------------
sysuse auto.dta
* plain OLS, then the Breusch-Pagan/Cook-Weisberg test for heteroskedasticity
reg price mpg weight
estat hettest
* same model with robust standard errors
reg price mpg weight, robust
* residuals and their normal quantile plot
predict res, residuals
qnorm res, grid
---------------------------------------

Besides, after invoking -reg y x, robust- Stata rejects (for methodological reasons) -estat hettest-. Hence, a graphical test (always advisable, anyway) is a helpful way to go.

Best Regards,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Laura R.
Sent: Monday, 17 December 2012 11:44
To: statalist@hsphsun2.harvard.edu
Subject: st: OLS assumptions not met: transformation, gls, or glm as solutions?

Dear Stata users,

I estimated an OLS model with the number of minutes (1-1440) spent on an activity on a day as dependent variable. At first sight, the model works fine. I obtain some interesting results which are robust across model specifications. I would like to keep it as it is, but:

- The regression diagnostics show that the error terms are not normally distributed, but right skewed.
- In addition, there is heteroskedasticity. Excluding outliers and influential cases does not help.

Now I can think of 4 solutions, but I am not sure when it is justified to decide on one of these:

1. Keep the model and the variables as they are (but maybe use robust standard errors) - is this possible under certain conditions, even if I have heteroskedasticity and non-normality of residuals, and when is this justified?

2. Transform the dependent variable. If I take the ln of the dependent variable, the residuals get closer to a normal distribution, and it gets closer to homoskedasticity. But then there is the problem of interpreting the results.

3. Generalised least squares model (gls): Use this instead. This is a solution to heteroskedasticity, but do the residuals have to be normally distributed in gls as well? What other new assumptions of gls might cause new problems (pros/cons gls vs. OLS)? And how can I do this in Stata? (Somehow with calculating a weight, I think...)

4. Generalised linear model (glm): In some sources I read that this also accounts for heteroskedasticity, in other sources not. Again, what about the normal distribution of residuals here? I heard that glm is better than OLS for non-negative dependent variables, is that correct? What are other assumptions of gls that could make me still prefer OLS? If I used it, and if my dependent variable is non-negative, and residuals are right skewed, do I have to "tell" Stata that when estimating the model, or can I run it as it is? (I quickly ran -glm- already, without any special specifications, and the results are the same as from the OLS model.)

In sum, I need some decision-making support. What is the best thing to do in this case? One thing that would help is a comparison of the assumptions of OLS, gls, and glm. I am aware of the assumptions of OLS models, but for gls and glm I did not find comprehensive lists and explanations.

It would be great if you could give me hints on what would be a good solution. Maybe you know a source explaining when to use which solution if the OLS assumptions of normality and homoskedasticity are not met.

Laura

PS: I am aware of the fact that many have used Tobit for similar dependent variables, including the zeros. My case is different, and for some reason I do not want to do this, and I excluded the zeros.
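For readers who want to see how options 1, 2 and 4 from the question might be typed, here is a minimal sketch. It uses the auto data shipped with Stata rather than the time-use data described in the question, so the variables are stand-ins. The names ln_price, res and fit, and the choice of family(gamma) with link(log), are illustrative assumptions and were not specified by either poster. Option 3 (gls) is not shown.

```stata
sysuse auto, clear

* Option 1: OLS with heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)

* Graphical check: residuals against fitted values
predict double res, residuals
predict double fit, xb
scatter res fit, yline(0)

* Option 2: OLS on the log of the dependent variable
* (ln_price is a hypothetical variable name)
generate double ln_price = ln(price)
regress ln_price mpg weight, vce(robust)

* Option 4: glm for a positive, right-skewed outcome
* (gamma family with log link is an assumed, illustrative choice)
glm price mpg weight, family(gamma) link(log) vce(robust)

* -glm- without options defaults to family(gaussian) link(identity)
glm price mpg weight
```

Run without options, -glm- uses family(gaussian) and link(identity), which is why its coefficients match those from -regress-. With link(log) the model is for the log of the mean of the outcome, not the mean of the logged outcome, so predictions stay on the original scale.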
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

**References**: **st: OLS assumptions not met: transformation, gls, or glm as solutions?** *From:* "Laura R." <laura.roh@googlemail.com>
