Re: st: OLS assumptions not met: transformation, gls, or glm as solutions?

From   Maarten Buis
Subject   Re: st: OLS assumptions not met: transformation, gls, or glm as solutions?
Date   Mon, 17 Dec 2012 16:08:22 +0100

On Mon, Dec 17, 2012 at 3:29 PM, Laura R. wrote:
> @ Carlo: I conducted your example and with my data it seems the same,
> the -robust- option does not seem to change the graphical pictures or
> the tests (-estat hettest-,  -iqr-) much. So the robust option has to
> be visible in the graphics and the tests, that it induced
> homoskedasticity?

Robust standard errors only change the standard errors; they do not
change the point estimates, and hence not the residuals. So most of
the graphical checks should remain unchanged.
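
A minimal sketch of this point, assuming the -auto- example data that
ships with Stata rather than Laura's data (the variables are only
stand-ins): the coefficients are identical in both columns, only the
standard errors differ.

```stata
* Illustration only: -auto- data and these variables are assumptions
sysuse auto, clear

* OLS with conventional standard errors
regress price weight i.foreign
estimates store ols

* Same model, robust (sandwich) standard errors
regress price weight i.foreign, vce(robust)
estimates store rob

* Coefficients side by side with their standard errors
estimates table ols rob, b(%9.3f) se(%9.3f)
```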

> @ Nick:
> As to the equality of variances between the cases from the 2 surveys,
> a referee seems concerned about inferences one can make from the
> descriptive statistics. Therefore, I would like to use -sdtest- to see
> whether variances are the same in the two samples.

Descriptive statistics are just that: descriptive. So I don't see a
problem there as long as you interpret everything correctly.

> And for the regression, I think that adding the year-dummy would be
> enough to account for it?

No, that just moves the conditional mean up or down, but it does not
change the conditional variance.
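
A rough sketch of what that means, again assuming the -auto- data,
with -foreign- standing in for the survey-year dummy: once the dummy
is in the model the residuals average zero in each group, but their
spread can still differ between the groups.

```stata
* Illustration only: -foreign- plays the role of the year dummy
sysuse auto, clear

regress price weight i.foreign
predict double res, residuals

* Group means of the residuals are zero by construction,
* the standard deviations need not be equal
tabstat res, by(foreign) statistics(mean sd n)

* Variance-ratio test on the residuals across the two groups
sdtest res, by(foreign)
```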

> @ Maarten:
> So you would not worry about heteroskedasticity or the distribution of
> errors. What would you write in the paper then? "There is
> heteroskedasticity and non-normal error distribution, but I still use
> OLS because ...?" I am very curious, because I would like to keep the

I'd make sure I modeled the conditional mean as well as possible.
Typically I prefer using different link functions over transforming
the dependent variable, as that way you can keep the interpretation in
terms of the original metric. After that, robust standard errors take
care of the rest.
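
As a sketch of that strategy, assuming the -auto- data and assuming a
log link is the appropriate one (both are my choices for illustration,
not something that follows from Laura's data):

```stata
* Illustration only: data, variables, family and link are assumptions
sysuse auto, clear

* Model the conditional mean of price on the log scale,
* without transforming price itself; robust standard errors
glm price weight i.foreign, family(gaussian) link(log) vce(robust)

* Average marginal effects, expressed in the original metric of price
margins, dydx(*)
```

The dependent variable stays untransformed, so the -margins- results
are in the units of the dependent variable itself.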

> @ Maarten & David:
> About linearity: as independent variables, I mainly have categorical
> variables. So - scatter y x- or -graph matrix y x x- does not help
> much, because the cases are only on the lines for 0 and 1. How can I
> see whether I have a linear relationship between y and x, if x is
> categorical?

With categorical variables you are just fine. In case of one
categorical variable you will just exactly reproduce the conditional
means, so your model is as good as it could possibly be. The same is
true with more categorical variables and all interactions. Too many
interactions means your model becomes hard to interpret, so it ceases
to do what it should be doing: simplify reality so we can understand
it. Linearity is still not an issue, but the absence of interactions
can be. That is a judgement call you will have to make.
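
A small sketch of the one-variable case, assuming the -auto- data: the
predictions from a regression on a single categorical variable are
just the group means.

```stata
* Illustration only: -auto- data and -foreign- are assumptions
sysuse auto, clear

* Regression on one categorical variable
regress price i.foreign

* Predicted mean price for each category of foreign
margins foreign

* Raw group means, for comparison with the -margins- output
tabstat price, by(foreign) statistics(mean)
```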

There are two potential mistakes however: 1) you have a continuous
variable and in order to avoid the linearity assumption you split it
up into a categorical variable. That way you typically throw away too
much information. It is better to use something like splines (see:
-help mkspline-) to model the effect of such a variable. 2) you have a
categorical variable but forgot to use the -i.- prefix, i.e. you
treated it as continuous, see: -help fvvarlist-.
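
Both points can be sketched as follows, assuming the -auto- data; the
number of knots and the stub name -wsp- are arbitrary choices for the
illustration.

```stata
* Illustration only: data, variables, stub name and knots are assumptions
sysuse auto, clear

* 1) Keep weight continuous and relax linearity with a restricted
*    cubic spline instead of cutting it into categories
mkspline wsp = weight, cubic nknots(4)
regress price wsp* i.foreign

* 2) A categorical variable entered without and with the i. prefix
regress price rep78      // treated as continuous: one slope
regress price i.rep78    // treated as categorical: one dummy per level
```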

> @ David:
> Yes, I think about transformation, and will read again about
> interpretation. Still, just having minutes to interpret would be
> easier, also for readers which are not so familiar with
> transformation. Also, I am not sure whether OLS with transformed
> dependent variable, or -glm- without transformed variable would be
> better.

It obviously depends on the concrete situation, but typically I tend
to prefer -glm- with different link functions as that way you keep the
metric of the original dependent variable (or in this case a linear
transformation of it).
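
To make the difference in metric concrete, here is a sketch that
assumes the -auto- data and a log transformation / log link (again
only for illustration). After OLS on the logged variable, exponentiating
the linear prediction does not in general give the predicted mean of
the original variable, while -glm- predicts that mean directly.

```stata
* Illustration only: data, variables and the log link are assumptions
sysuse auto, clear

* OLS on the transformed dependent variable
generate double lnprice = ln(price)
regress lnprice weight i.foreign
predict double xb_ln, xb
generate double price_ols = exp(xb_ln)   // not the conditional mean of price

* glm with a log link on the untransformed dependent variable
glm price weight i.foreign, family(gaussian) link(log) vce(robust)
predict double price_glm, mu             // predicted mean, in units of price

summarize price price_ols price_glm
```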

Hope this helps,

Maarten L. Buis
Reichpietschufer 50
10785 Berlin