
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



st: R: OLS assumptions not met: transformation, gls, or glm as solutions?


From   carlo.lazzaro@tiscalinet.it
To   <statalist@hsphsun2.harvard.edu>
Subject   st: R: OLS assumptions not met: transformation, gls, or glm as solutions?
Date   Mon, 17 Dec 2012 12:09:46 +0100

Laura's first query is:
1. Keep the model and the variables as they are (but maybe use robust
standard errors) - is this possible under certain conditions, even if I have
heteroskedasticity and non-normality of residuals, and when is this
justified?

Using robust standard errors will not always shelter you from
heteroskedasticity (they adjust the standard errors, not the residuals), as
you can see from the following (misspecified) example:
------------------------------------------
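sysuse auto.dta, clear
* OLS with conventional standard errors
reg price mpg weight
* Breusch-Pagan/Cook-Weisberg test for heteroskedasticity
estat hettest
* Same point estimates, heteroskedasticity-robust standard errors
reg price mpg weight, robust
* -robust- leaves the residuals unchanged: check their normality graphically
predict res, residuals
qnorm res, grid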
---------------------------------------
Besides, after invoking -reg y x, robust-, Stata refuses (for methodological
reasons) to run -estat hettest-. Hence, a graphical check (always advisable
anyway) is a helpful way to go.
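A minimal sketch of the graphical route, assuming Carlo's auto.dta setup. Because -robust- changes only the standard errors, the residuals (and hence -estat hettest-) can be taken from the conventional fit; -rvfplot- then plots those residuals against the fitted values. Using -rvfplot- here is an illustrative choice, not something stated in Carlo's message.

```stata
* Hypothetical continuation of Carlo's auto.dta example
sysuse auto, clear
* Conventional and robust fits share coefficients and residuals;
* only the standard errors differ, so test on the conventional fit
quietly regress price mpg weight
estat hettest
* Graphical check: a fan shape around the zero line suggests heteroskedasticity
rvfplot, yline(0)
```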

Best Regards,
Carlo
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Laura R.
Sent: Monday, 17 December 2012 11:44
To: statalist@hsphsun2.harvard.edu
Subject: st: OLS assumptions not met: transformation, gls, or glm as solutions?

Dear Stata users,

I estimated an OLS model with the number of minutes (1-1440) spent on an
activity in a day as the dependent variable. At first sight, the model works
fine: I get some interesting results that are robust across model
specifications. I would like to keep it as it is, but:

- The regression diagnostics show that the error terms are not normally
distributed, but right skewed.

- In addition, there is heteroskedasticity.
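The two diagnostic findings above can be reproduced along these lines. This is a sketch only: auto.dta and price stand in, hypothetically, for Laura's time-use data, and the particular tests (-sktest-, -swilk-, -estat hettest-, -estat imtest, white-) are one reasonable choice, not necessarily the ones she ran.

```stata
* Hypothetical stand-in data: price plays the role of minutes
sysuse auto, clear
quietly regress price mpg weight
predict double uhat, residuals
* Normality of residuals: skewness/kurtosis test and Shapiro-Wilk test
sktest uhat
swilk uhat
* Heteroskedasticity: Breusch-Pagan/Cook-Weisberg and White's general test
estat hettest
estat imtest, white
```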

Excluding outliers and influential cases does not help. I can think of four
solutions, but I am not sure when each of them is justified:

1. Keep the model and the variables as they are (but maybe use robust
standard errors) - is this possible under certain conditions, even if I have
heteroskedasticity and non-normality of residuals, and when is this
justified?

2. Transform the dependent variable. If I take the natural log (ln) of the
dependent variable, the residuals get closer to a normal distribution and
closer to homoskedasticity. But then the results are harder to interpret.
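One way to handle the interpretation problem of option 2, sketched under assumptions: auto.dta's price again stands in for minutes; in a log-level model a coefficient b corresponds to a 100*(exp(b)-1) percent change in the outcome per unit change in the regressor; and predictions on the original scale use Duan's smearing estimator, which presumes homoskedastic errors on the log scale.

```stata
* Hypothetical stand-in: price plays the role of the skewed minutes variable
sysuse auto, clear
generate double lnprice = ln(price)
regress lnprice mpg weight
* Percent change in price for a one-unit (one-pound) increase in weight
display 100*(exp(_b[weight]) - 1)
* Duan smearing: E[y|x] is estimated by exp(xb) times the mean of exp(residual)
predict double lnxb, xb
predict double uhat, residuals
generate double expu = exp(uhat)
summarize expu, meanonly
generate double price_hat = exp(lnxb) * r(mean)
```

Note that the naive back-transform exp(xb) alone underpredicts the mean, which is why the smearing factor is applied.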

3. Generalised least squares (gls): Use this instead. This is a solution to
heteroskedasticity, but do the residuals have to be normally distributed in
gls as well? What new assumptions of gls might cause new problems (pros/cons
of gls vs. OLS)? And how can I do this in Stata? (Somehow by calculating a
weight, I think...)
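On "how can I do this in Stata": one common textbook recipe for feasible GLS (weighted least squares with an estimated variance function) is sketched below. The variance model exp(x*delta) is an assumption, not something implied by Laura's data, and the robust standard errors in the last step hedge against getting it wrong. auto.dta is again a hypothetical stand-in.

```stata
sysuse auto, clear
* Step 1: OLS and its residuals
quietly regress price mpg weight
predict double uhat, residuals
* Step 2: model the log of the squared residuals as a linear function of x
generate double lnu2 = ln(uhat^2)
quietly regress lnu2 mpg weight
predict double lnh, xb
generate double h = exp(lnh)
* Step 3: weighted least squares, weights inversely proportional to the
* estimated error variance; robust SEs in case the variance model is wrong
regress price mpg weight [aweight = 1/h], vce(robust)
```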

4. Generalised linear model (glm): Some sources say that this also accounts
for heteroskedasticity, others do not. Again, what about the normal
distribution of residuals here? I heard that glm is better than OLS for
non-negative dependent variables; is that correct? What other assumptions of
glm could make me still prefer OLS? If I used it, and my dependent variable
is non-negative and the residuals are right skewed, do I have to "tell"
Stata that when estimating the model, or can I run it as it is?

(I quickly ran -glm- already, without any special options, and the results
are the same as from the OLS model.)
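That last observation has a simple explanation: -glm- defaults to family(gaussian) and link(identity), which is exactly the OLS model. A sketch of the comparison follows, plus a gamma family with a log link as one commonly suggested specification for a positive, right-skewed outcome. The gamma/log choice is illustrative, and auto.dta's price again stands in, hypothetically, for minutes.

```stata
sysuse auto, clear
* OLS benchmark
quietly regress price mpg weight
scalar b_ols = _b[weight]
* Default glm: gaussian family, identity link, i.e. the same model as OLS
quietly glm price mpg weight
scalar b_glm = _b[weight]
display "OLS: " b_ols "   default glm: " b_glm
* Gamma family, log link: variance proportional to the squared mean;
* robust SEs in case that variance assumption is wrong
glm price mpg weight, family(gamma) link(log) vce(robust)
* Average marginal effects on the original (untransformed) scale
margins, dydx(*)
```

The log link keeps predicted means positive and, unlike logging the outcome, needs no retransformation step.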

In sum, I need some decision-making support. What is the best thing to do in
this case?
One thing that would help is a comparison of the assumptions of OLS, gls and
glm. I am aware of the assumptions of OLS models, but for gls and glm I could
not find comprehensive lists and explanations.

It would be great if you could give me hints on what would be a good
solution. Maybe you know a source explaining which solution to use when the
OLS assumptions of normality and homoskedasticity are not met.

Laura



PS: I am aware that many people use Tobit for similar dependent variables,
including the zeros. My case is different: for some reason I do not want to
do this, so I excluded the zeros.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index