On Aug 5, 2006, at 7:50 PM, Daniel Schneider wrote:
I have been using the linktest command after GLM models quite
frequently, and I came to wonder whether it is implemented
correctly or not.
<snip>
Is it possible that the revision of GLM models for the more recent
stata version has changed some of the meaning of predictions and is
not correctly implemented in the linktest command that seems to be
based on an older Stata version?
On Aug 6, 2006, at 12:33 PM, Daniel Schneider wrote:
This is the replication of the linktest. Note that the results are
similar but not identical because linktest seems to rely on an
older version of GLM.
The test implemented by -linktest- (known as Tukey's one-degree-of-
freedom test for non-additivity) is a test of the null hypothesis
that the effects of the covariates are additive against alternatives
in which they are not (the introduction of the square of the linear
predictor to the model may be thought of as capturing a range of such
alternatives). Since one of the purposes of the link function in a
GLM is to linearize the effect of the covariates, in this context, it
may also be thought of as a test of the adequacy of the link
function. This is discussed in detail by Pregibon (1980), who
proposed a slightly different approach; the entry under [R] -
linktest- provides the full references to Pregibon (1980) and to
Tukey's original 1949 paper, and explains why Stata uses Tukey's
original formulation. I strongly suggest reading Pregibon's paper,
which provides a nice way to think about what is going on in the GLM
context.
Your posting also raised another (minor) issue worth commenting on.
In particular, you demonstrated the following
. sysuse auto
(1978 Automobile Data)
. glm price trunk foreign weight, fam(gam)
<output omitted>
. predict xb, xb
. gen xb2 = xb^2
. linktest, fam(gam) nolog
Residual df = 71 No. of obs
= 74
Pearson X2 = 5.113241 Deviance =
4.375989
Dispersion = .0720175 Dispersion
= .0616336
Gamma distribution, power link (power = -1)
------------------------------------------------------------------------
------
price | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------
+----------------------------------------------------------------
_hat | 1.333585 .3680821 3.62 0.000 .
6121571 2.055012
_hatsq | -1117.429 1200.049 -0.93 0.352
-3469.481 1234.624
_cons | -.0000214 .0000261 -0.82 0.412 -.
0000726 .0000298
------------------------------------------------------------------------
------
. glm price xb xb2, fam(gam) nohead nolog
------------------------------------------------------------------------
------
| OIM
price | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------
+----------------------------------------------------------------
xb | 1.333624 .3679106 3.62 0.000 .
6125322 2.054715
xb2 | -1117.528 1199.493 -0.93 0.352
-3468.491 1233.436
_cons | -.0000214 .0000261 -0.82 0.412 -.
0000726 .0000297
------------------------------------------------------------------------
------
noting that the results obtained by manually regressing price on xb
and xb2 are slightly different from those obtained via -linktest-. A
quick look at the code for -linktest- (type -viewsource
linktest.ado-) reveals why; namely, there is a -version 6- at the
top, which means that when the model including _hatsq is fit, Stata
is running under version control as if it were version 6. Under
version 6, -glm- used Iteratively Reweighted Least Squares (IRLS) and
reported standard errors based on the expected information matrix.
One of the changes made in version 7 was to make use of Stata's -ml-
the default, as well as reporting standard errors based on the
observed information matrix. Thus, to replicate the results of -
linktest-, you must use the -irls- option:
. glm price xb xb2, fam(gam) nohead nolog irls
------------------------------------------------------------------------
------
| EIM
price | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------
+----------------------------------------------------------------
xb | 1.333585 .3680821 3.62 0.000 .
6121571 2.055012
xb2 | -1117.429 1200.049 -0.93 0.352
-3469.481 1234.624
_cons | -.0000214 .0000261 -0.82 0.412 -.
0000726 .0000298
------------------------------------------------------------------------
------
Given that -glm- now uses -ml- by default, this behavior seems a bit
inconsistent (though, admittedly, it is a minor issue).
-- Phil
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/