

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: glm model syntax
Date: Sun, 12 Feb 2012 10:19:06 +0000

Unless you specify otherwise, you are assumed to be using Stata 12.1, in which case factor variable notation is recommended rather than -xi:-. That is a small detail, however.

You seem confused about GLMs. You need to do more than read the help: work your way through some thorough introductory account.

Any assumptions about the distribution of the dependent variable (I much prefer "response" or "outcome" for reasons often discussed) are about its conditional distribution given the predictors. -ladder- cannot help with this unless you are working with residuals. It examines the marginal distribution. That is of some relevance but only indirectly.

It is unlikely that both a variable and its reciprocal are approximately normally distributed. The reciprocal is a strong transformation and although it can conceivably return the same (kind of) distribution as the original this is unusual. Quantitative thinking is essential here: "it is/is not normal" is qualitative thinking. What are the skewness and kurtosis (or similar shape measures)?

Much of the point of GLMs is that the link function replaces transformation of the response. You do not use a link function _and_ the same transformation. Model 3 uses no transformation while model 2 uses a reciprocal transformation and the reciprocal link. But the reciprocal link (approximately) reverses the transformation, as the reciprocal of a reciprocal is the original. Thus what you did is similar to showing that 1/(1/y) is just y. Similar, but not identical: using a particular function as link and using the same function as response transformation are not exactly equivalent, but often the results will be close, as you indicate.

Despite all this, the major part of assessing a GLM is not the marginal distribution but how closely the response is modelled by predictions and whether the model makes sense. Judgements here often need to be subtle and to mix statistical and scientific judgement.
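The difference between a link and a transformation can be sketched on the auto data shipped with Stata. This is an illustration under assumptions, not the poster's analysis: mpg and weight stand in for dep_var and its predictors, and inv_mpg, mu1, mu2 and mu3 are names made up here.

```stata
* Sketch only: auto data stand in for the poster's variables.
sysuse auto, clear
generate double inv_mpg = 1/mpg

* Marginal shape in numbers: skewness and kurtosis of each version
summarize mpg inv_mpg, detail

* Analogue of model 1: reciprocal link, untransformed response
glm mpg weight, link(power -1)
predict double mu1, mu

* Analogue of model 2: reciprocal link AND reciprocal response
glm inv_mpg weight, link(power -1)
predict double mu2, mu

* Analogue of model 3: identity link, untransformed response
glm mpg weight
predict double mu3, mu

* mu2 is a fitted mean of 1/mpg, so invert it before comparing on the mpg scale
replace mu2 = 1/mu2
correlate mpg mu1 mu2 mu3
```

In the first fit the linear predictor is on the reciprocal (gallons per mile) scale, whereas in the second and third it is on the mpg scale. That alone is enough to make the coefficients of the first fit look very different from the other two.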
For example, with

    sysuse auto
    glm mpg weight
    glm mpg weight, link(power -1)

the linear model looks good by most standards but a reciprocal link function makes scientific sense (the implicit scale is gallons per mile). A transformation would also make sense but -glm- offers the possibility of considering other distribution families too.

See also

SJ-4-4  gr0009  . . . . . . . . . .  Speaking Stata: Graphing model diagnostics
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        (help anovaplot, indexplot, modeldiag, ofrtplot, ovfplot,
        qfrplot, racplot, rdplot, regplot, rhetplot, rvfplot2,
        rvlrplot, rvpplot2 if installed)
        Q4/04   SJ 4(4):449--475
        plotting diagnostic information calculated from residuals
        and fitted values from regression models with continuous
        responses

for a detailed example in which -glm- fit was also assessed graphically. (The package was updated in SJ 10-1.)

In terms of your model, you can't rely on -ladder- results to indicate the best link. You need to consider various possibilities and also whether using a distribution family other than normal makes sense. But your comparisons should start with

    xi:glm dep_var i.var1 i.var2 var3 var4
    xi:glm dep_var i.var1 i.var2 var3 var4,link(power -1)

Nick

On Sun, Feb 12, 2012 at 9:01 AM, Nikolaos Pandis <npandis@yahoo.com> wrote:
> I have a continuous dependent variable and 4 predictors.
>
> The dependent variable is approximately normally distributed, however, using the -ladder- command it is indicated that if I use the inverse (1/dep_var) my data will approximate normal distribution better compared to using the untransformed data.
>
> If I fit a glm model is the syntax below correct?
>
> xi:glm dep_var i.var1 i.var2 var3 var4,link(power -1)
>
> The other question is whether for the dep_var I would need to use the inverse of the dep_var or the untransformed dep_var?
> From my experiment below it seems to me that I should use the inverse of the dep_var unless my model (link function) is not correctly specified? I thought that I should use the dependent variable untransformed and the link function will take care of the rest?
>
> If I compare:
>
> 1. xi:glm dep_var i.var1 i.var2 var3 var4,link(power -1)
>
> 2. xi:glm invesre_dep_var i.var1 i.var2 var3 var4,link(power -1)
>
> 3. xi:glm dep_var i.var1 i.var2 var3 var4
>
> models 2 and 3 give very similar results but model 1 very different.
> The difference between model 1 and 2 is the untransformed or the transformed dependent variable.
> Perhaps models 2-3 are the same. Are the results from 2 & 3 similar because the data is close to normal anyway or is it because the specified models are equivalent?
> If models are equivalent then the very large difference in the coefficients between 1-2 does not make sense to me.
>
> I looked at the help glm file but I was not able to figure this out.
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**: **Re: st: glm model syntax** *From:* Nikolaos Pandis <npandis@yahoo.com>

**References**: **st: glm model syntax** *From:* Nikolaos Pandis <npandis@yahoo.com>
