Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: L-statistic


From   Joseph Coveney <[email protected]>
To   Statalist <[email protected]>
Subject   Re: st: L-statistic
Date   Thu, 24 Apr 2003 22:55:22 +0900

Responding to a question posted by Scott Millis, Nick Cox wrote:

---------------------------begin posting----------------------------------------

Scott Millis asked

 > Collett discusses the L-statistic in "Modelling Binary Data." Is it
 > possible to calculate this in Stata?

These statistics, which occur as a set, appear to be the differences 
between the squares of the likelihood residuals calculated from
two logit models, one of whose terms is a subset of the other's.

So one recipe is something like

glm <model 1>, link(logit) family(binomial)
predict r1, likelihood
glm <model 2>, link(logit) family(binomial)
predict r2, likelihood
gen li = r1^2 - r2^2

where for <model 1> and <model 2> you
must plug in the varlists of the two models.

However, in trying to reproduce the worked
example in Collett's book (1/e; the 2/e I do not have
access to), I find one anomaly. In one case
a likelihood residual is reported by Stata
as missing (I guess wildly that the problem
is a request to take the square root of a negative
number). Collett's results for that case imply
that he replaced such a residual by zero.

Perhaps someone can illuminate this. For
all I know it may be standard practice or justifiable.
The data can be downloaded from David Collett's website
(he works at the University of Reading in the
UK).

-----------------------------end posting----------------------------------------

I don't know how much illumination the following provides, but I have the second edition 
and it has two examples of the L-statistic, one in using "Artificial data-1" (Table 5.10 on 
Page 162) and the other using "Determination of the ESR" (Table 5.15 on Page 181).  
Analysis of the latter dataset does produce an anomaly that Nick describes.

According to the book (Page 132), likelihood residuals are weighted averages of the 
deviance and Pearson residuals, and it seems that the reason for the missing value 
from Stata in this example is that the Pearson residual is missing for this observation 
for the full (saturated) model, which includes both linear and quadratic terms.  (See the 
do-file and and log file below.)  The Pearson residual is missing because the predicted 
proportion is exactly 1 for this observation and that causes a division-by-zero error 
when calculating the Pearson residual.

The author, David Collett, plots Pearson residuals for the reduced model (which doesn't 
produce a missing residual or a predicted proportion of 1) in Figure 5.20 (Page 175), 
but he switches to deviance residuals when discussing the full, quadratic model (the 
one that produces a missing Pearson residual) in Figure 5.21 on the next page.  In that 
figure, the deviance residual appears to be zero, which would indicate that the author 
got similar results to what we get for the Pearson residual.

Like Nick, I couldn't find any discussion of what standard practice is when such 
residuals arise, but it does look like the author replaced the missing Pearson residual by 
zero which, with the zero value for the corresponding deviance residual, would give 
zero for the likelihood residual, their weighted average.

In comparing the Pearson deviance reported in the header of -glm-'s output and the 
sum of the squared Pearson residuals, treating the missing-value residual as effectively 
zero seems to be what Stata does here, too.

Joseph Coveney

--------------------------------------------------------------------------------

clear
set more off
tempfile tmp
insheet using "Determination of the ESR.dat"
replace v1 = subinstr(subinstr(v1, "  ", " ", .), " ", ",", .)
outsheet using `tmp', comma nonames noquote
insheet using `tmp', comma names clear
erase `tmp'
// Creating squared fibrinogen levels
generate float fib2 = fib * fib
// Reduced model (intercept and fibrinogen)
glm y fib, family(binomial) link(logit) nolog
predict r1, likelihood
// Quadratic model (intercept, fibrinogen and fibrinogen squared)
glm y fib fib2, family(binomial) link(logit) nolog
scalar pearson_deviance = e(deviance_p)
predict r2, likelihood
// Creating L-statistics for each observation
generate float L = r1^2 - r2^2
display r1[13]^2 - 0^2
// Creating predicted proportion for full quadratic model
predict pi2, mu
predict d2, deviance
predict p2, pearson
display pi2[13]  // is exactly 1
display pi2[29]  // is not exactly 1
// Comparing Pearson Deviance with sum of squared Pearson residuals
generate double p22 = p2 * p2
summarize p22, meanonly
display r(sum) - scalar(pearson_deviance)
format r* L d* p* %6.3f
format fib %4.2f
format y %1.0f
capture log close
capture erase table5_15.log
log using table5_15
clist fib y r1-p2
log close
exit

--------------------------------------------------------------------------------
Edited log is below
--------------------------------------------------------------------------------

      fib  y      r1      r2       L     pi2      d2      p2
  1. 2.52  0  -0.455  -0.181   0.174   0.016  -0.179  -0.127
  2. 2.56  0  -0.471  -0.166   0.194   0.013  -0.164  -0.116
  3. 2.19  0  -0.340  -0.687  -0.356   0.197  -0.663  -0.496
  4. 2.18  0  -0.337  -0.727  -0.415   0.216  -0.699  -0.526
  5. 3.41  0  -0.957  -1.179  -0.473   0.426  -1.055  -0.862
  6. 2.46  0  -0.432  -0.213   0.141   0.022  -0.211  -0.150
  7. 3.22  0  -0.819  -0.411   0.502   0.077  -0.401  -0.289
  8. 2.21  0  -0.346  -0.615  -0.258   0.164  -0.598  -0.442
  9. 3.15  0  -0.773  -0.303   0.507   0.043  -0.297  -0.213
 10. 2.60  0  -0.487  -0.154   0.214   0.012  -0.153  -0.108
 11. 2.29  0  -0.372  -0.411  -0.030   0.078  -0.404  -0.292
 12. 2.35  0  -0.392  -0.316   0.054   0.047  -0.312  -0.223
 13. 5.06  1   0.457       .       .   1.000   0.000       .
 14. 3.34  1   1.560   1.878  -1.092   0.234   1.705   1.810
 15. 2.38  1   2.384   2.797  -2.138   0.038   2.562   5.059
 16. 3.15  0  -0.773  -0.303   0.507   0.043  -0.297  -0.213
 17. 3.53  1   1.418   0.759   1.434   0.812   0.645   0.481
 18. 2.68  0  -0.522  -0.140   0.253   0.010  -0.139  -0.099
 19. 2.60  0  -0.487  -0.154   0.214   0.012  -0.153  -0.108
 20. 2.23  0  -0.353  -0.552  -0.181   0.136  -0.540  -0.396
 21. 2.88  0  -0.618  -0.147   0.360   0.011  -0.146  -0.104
 22. 2.65  0  -0.509  -0.144   0.238   0.010  -0.143  -0.101
 23. 2.09  1   2.658   1.484   4.865   0.466   1.236   1.071
 24. 2.28  0  -0.368  -0.430  -0.050   0.086  -0.423  -0.306
 25. 2.67  0  -0.517  -0.141   0.248   0.010  -0.140  -0.099
 26. 2.29  0  -0.372  -0.411  -0.030   0.078  -0.404  -0.292
 27. 2.15  0  -0.328  -0.872  -0.652   0.285  -0.818  -0.631
 28. 2.54  0  -0.463  -0.173   0.184   0.015  -0.171  -0.122
 29. 3.93  1   1.135   0.012   1.288   1.000   0.012   0.009
 30. 3.34  0  -0.904  -0.768   0.227   0.234  -0.730  -0.552
 31. 2.99  0  -0.678  -0.181   0.426   0.016  -0.179  -0.127
 32. 3.32  0  -0.889  -0.686   0.320   0.194  -0.657  -0.491


--------------------------------------------------------------------------------



*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index