


Re: st: glm model syntax


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: glm model syntax
Date   Sun, 12 Feb 2012 10:19:06 +0000

Unless you say otherwise, you are assumed to be using Stata 12.1,
in which case factor variable notation is recommended rather
than -xi:-. That is a small detail, however.
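
As a sketch of what that means, assuming Stata 11 or later and the
placeholder variable names used in this thread (dep_var, var1 to var4),
the -xi:- prefix can simply be dropped, since i. is understood directly:

```stata
* factor variable notation: no -xi:- prefix needed (Stata 11 and later)
glm dep_var i.var1 i.var2 var3 var4
glm dep_var i.var1 i.var2 var3 var4, link(power -1)
```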

You seem confused about GLMs. You need to do more than read the help:
work your way through some thorough introductory account.

Any assumptions about the distribution of the dependent variable (I
much prefer "response" or "outcome" for reasons often discussed) are
about its conditional distribution given the predictors. -ladder-
cannot help with this unless you are working with residuals. It examines
the marginal distribution. That is of some relevance but only
indirectly.
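
One way to look at the conditional rather than the marginal distribution
is to examine residuals after a fit. What follows is a minimal sketch
using the auto data shipped with Stata; the variable name res is an
arbitrary choice, and a plain linear fit is assumed for illustration.

```stata
sysuse auto, clear
* marginal distribution of the response: what -ladder- looks at
ladder mpg
* conditional distribution: residuals after fitting the predictor
regress mpg weight
predict double res, residuals
* normal quantile plot of the residuals
qnorm res
```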

It is unlikely to be the case that both a variable and its reciprocal
are approximately normally distributed. The reciprocal is a strong
transformation and although it can conceivably return the same (kind
of) distribution as the original this is unusual. Quantitative
thinking is essential here; "it is/is not normal" is qualitative
thinking. What are the skewness and kurtosis (or similar shape
measures)?
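
Those shape measures are easy to obtain. A minimal sketch with the auto
data, in which recip_mpg is a made-up name for the reciprocal of the
response:

```stata
sysuse auto, clear
generate double recip_mpg = 1/mpg
* detail reports skewness and kurtosis along with percentiles
* (Stata's kurtosis is 3 for a normal distribution, not 0)
summarize mpg recip_mpg, detail
* the same measures side by side
tabstat mpg recip_mpg, statistics(mean sd skewness kurtosis)
```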

Much of the point of GLMs is that the link function replaces
transformation of the response. You do not use a link function _and_
the same transformation.

Model 3 uses no transformation while model 2 uses a reciprocal
transformation and the reciprocal link. But the reciprocal link
(approximately) reverses the transformation as the reciprocal of a
reciprocal is the original. Thus what you did is similar to showing
that 1/(1/y) is just y. Similar, but not identical, as using a
particular function as link and the same function as response
transformation are not exactly equivalent, but often the results will
be close, as you indicate.
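
The parallel with models 1 to 3 in the original question can be tried on
the auto data. This is only an illustrative sketch: gpm is a made-up
variable name, a single predictor stands in for the four in the question,
and results on other data may differ.

```stata
sysuse auto, clear
generate double gpm = 1/mpg

* like model 3: identity link, untransformed response; E(mpg) = xb
glm mpg weight
* like model 2: reciprocal response and reciprocal link; 1/E(1/mpg) = xb
glm gpm weight, link(power -1)
* like model 1: reciprocal link only; 1/E(mpg) = xb
glm mpg weight, link(power -1)
* transformation only, for comparison with model 1; E(1/mpg) = xb
glm gpm weight
```

The link acts on the mean, 1/E(y), whereas a transformation models the
mean of the transformed response, E(1/y). These are different quantities,
which is why such fits can be close without being identical.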

Despite all this, the major part of assessing a GLM concerns not the
marginal distribution but how closely the response is modelled
by the predictions and whether the model makes sense. Judgements here
often need to be subtle and to mix statistical and scientific
judgement. For example, with

sysuse auto
glm mpg weight
glm mpg weight, link(power -1)

the linear model looks good by most standards, but a reciprocal link
function makes scientific sense (the implicit scale is gallons per
mile). A transformation would also make sense, but -glm- offers the
possibility of considering other distribution families too.
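
As a sketch of what considering another family looks like in practice,
the reciprocal link can be kept while the family is changed. The gamma
family here is just one candidate chosen for illustration, not a
recommendation for any particular data:

```stata
sysuse auto, clear
* normal (Gaussian) family is the default
glm mpg weight, link(power -1)
* gamma family with the same reciprocal link
glm mpg weight, family(gamma) link(power -1)
```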

See also

SJ-4-4  gr0009  . . . . . . . . . . Speaking Stata: Graphing model diagnostics
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        (help anovaplot, indexplot, modeldiag, ofrtplot, ovfplot,
        qfrplot, racplot, rdplot, regplot, rhetplot, rvfplot2,
        rvlrplot, rvpplot2 if installed)
        Q4/04   SJ 4(4):449--475
        plotting diagnostic information calculated from residuals
        and fitted values from regression models with continuous
        responses

for a detailed example in which -glm- fit was also assessed
graphically. (The package was updated in SJ 10-1.)
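
A rough version of such a graphical check can be put together with
official commands alone, without the package above. This is a sketch
under that assumption; the names mu_hat and dres are arbitrary.

```stata
sysuse auto, clear
glm mpg weight, link(power -1)
* fitted means and deviance residuals
predict double mu_hat, mu
predict double dres, deviance
* observed versus fitted, with the line of equality as reference
scatter mpg mu_hat || function y = x, range(mu_hat)
* residuals versus fitted
scatter dres mu_hat, yline(0)
```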

In terms of your model, you can't rely on -ladder- results to indicate
the best link. You need to consider various possibilities and also
whether a distribution family other than normal makes
sense. But your comparisons should start with

xi:glm dep_var i.var1 i.var2 var3 var4

xi:glm dep_var i.var1 i.var2 var3 var4,link(power -1)

Nick

On Sun, Feb 12, 2012 at 9:01 AM, Nikolaos Pandis <npandis@yahoo.com> wrote:

> I have a continuous dependent variable and 4 predictors.
>
> The dependent variable is approximately normally distributed; however, the -ladder- command indicates that the inverse (1/dep_var) would approximate a normal distribution better than the untransformed data.
>
> If I fit a glm model is the syntax below correct?
>
> xi:glm dep_var i.var1 i.var2 var3 var4,link(power -1)
>
> The other question is whether for the dep_var I would need to use the inverse of the dep_var or the untransformed dep_var.
> From my experiment below it seems to me that I should use the inverse of the dep_var, unless my model (link function) is not correctly specified. I thought that I should use the dependent variable untransformed and the link function would take care of the rest?
>
> If I compare:
>
> 1. xi:glm dep_var i.var1 i.var2 var3 var4,link(power -1)
>
> 2. xi:glm invesre_dep_var i.var1 i.var2 var3 var4,link(power -1)
>
> 3. xi:glm dep_var i.var1 i.var2 var3 var4
>
> Models 2 and 3 give very similar results but model 1 is very different.
> The difference between models 1 and 2 is the untransformed or the transformed dependent variable.
> Perhaps models 2 and 3 are the same. Are the results from 2 and 3 similar because the data are close to normal anyway, or is it because the specified models are equivalent?
> If the models are equivalent then the very large difference in the coefficients between 1 and 2 does not make sense to me.
>
> I looked at the help glm file but I was not able to figure this out.
>
