[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
Joseph Coveney <jcoveney@bigplanet.com> |

To |
Statalist <statalist@hsphsun2.harvard.edu> |

Subject |
Re: st: L-statistic |

Date |
Thu, 24 Apr 2003 22:55:22 +0900 |

Responding to a question posted by Scott Millis, Nick Cox wrote: ---------------------------begin posting---------------------------------------- Scott Millis asked > Collett discusses the L-statistic in "Modelling Binary Data." Is it > possible to calculate this in Stata? These statistics, which occur as a set, appear to be the differences between the squares of the likelihood residuals calculated from two logit models, one of whose terms is a subset of the other's. So one recipe is something like glm <model 1>, link(logit) family(binomial) predict r1, likelihood glm <model 2>, link(logit) family(binomial) predict r2, likelihood gen li = r1^2 - r2^2 where for <model 1> and <model 2> you must plug in the varlists of the two models. However, in trying to reproduce the worked example in Collett's book (1/e; the 2/e I do not have access to), I find one anomaly. In one case a likelihood residual is reported by Stata as missing (I guess wildly that the problem is a request to take the square root of a negative number). Collett's results for that case imply that he replaced such a residual by zero. Perhaps someone can illuminate this. For all I know it may be standard practice or justifiable. The data can be downloaded from David Collett's website (he works at the University of Reading in the UK). -----------------------------end posting---------------------------------------- I don't know how much illumination the following provides, but I have the second edition and it has two examples of the L-statistic, one in using "Artificial data-1" (Table 5.10 on Page 162) and the other using "Determination of the ESR" (Table 5.15 on Page 181). Analysis of the latter dataset does produce an anomaly that Nick describes. According to the book (Page 132), likelihood residuals are weighted averages of the deviance and Pearson residuals, and it seems that the reason for the missing value from Stata in this example is that the Pearson residual is missing for this observation for the full (saturated) model, which includes both linear and quadratic terms. (See the do-file and and log file below.) The Pearson residual is missing because the predicted proportion is exactly 1 for this observation and that causes a division-by-zero error when calculating the Pearson residual. The author, David Collett, plots Pearson residuals for the reduced model (which doesn't produce a missing residual or a predicted proportion of 1) in Figure 5.20 (Page 175), but he switches to deviance residuals when discussing the full, quadratic model (the one that produces a missing Pearson residual) in Figure 5.21 on the next page. In that figure, the deviance residual appears to be zero, which would indicate that the author got similar results to what we get for the Pearson residual. Like Nick, I couldn't find any discussion of what standard practice is when such residuals arise, but it does look like the author replaced the missing Pearson residual by zero which, with the zero value for the corresponding deviance residual, would give zero for the likelihood residual, their weighted average. In comparing the Pearson deviance reported in the header of -glm-'s output and the sum of the squared Pearson residuals, treating the missing-value residual as effectively zero seems to be what Stata does here, too. Joseph Coveney -------------------------------------------------------------------------------- clear set more off tempfile tmp insheet using "Determination of the ESR.dat" replace v1 = subinstr(subinstr(v1, " ", " ", .), " ", ",", .) outsheet using `tmp', comma nonames noquote insheet using `tmp', comma names clear erase `tmp' // Creating squared fibrinogen levels generate float fib2 = fib * fib // Reduced model (intercept and fibrinogen) glm y fib, family(binomial) link(logit) nolog predict r1, likelihood // Quadratic model (intercept, fibrinogen and fibrinogen squared) glm y fib fib2, family(binomial) link(logit) nolog scalar pearson_deviance = e(deviance_p) predict r2, likelihood // Creating L-statistics for each observation generate float L = r1^2 - r2^2 display r1[13]^2 - 0^2 // Creating predicted proportion for full quadratic model predict pi2, mu predict d2, deviance predict p2, pearson display pi2[13] // is exactly 1 display pi2[29] // is not exactly 1 // Comparing Pearson Deviance with sum of squared Pearson residuals generate double p22 = p2 * p2 summarize p22, meanonly display r(sum) - scalar(pearson_deviance) format r* L d* p* %6.3f format fib %4.2f format y %1.0f capture log close capture erase table5_15.log log using table5_15 clist fib y r1-p2 log close exit -------------------------------------------------------------------------------- Edited log is below -------------------------------------------------------------------------------- fib y r1 r2 L pi2 d2 p2 1. 2.52 0 -0.455 -0.181 0.174 0.016 -0.179 -0.127 2. 2.56 0 -0.471 -0.166 0.194 0.013 -0.164 -0.116 3. 2.19 0 -0.340 -0.687 -0.356 0.197 -0.663 -0.496 4. 2.18 0 -0.337 -0.727 -0.415 0.216 -0.699 -0.526 5. 3.41 0 -0.957 -1.179 -0.473 0.426 -1.055 -0.862 6. 2.46 0 -0.432 -0.213 0.141 0.022 -0.211 -0.150 7. 3.22 0 -0.819 -0.411 0.502 0.077 -0.401 -0.289 8. 2.21 0 -0.346 -0.615 -0.258 0.164 -0.598 -0.442 9. 3.15 0 -0.773 -0.303 0.507 0.043 -0.297 -0.213 10. 2.60 0 -0.487 -0.154 0.214 0.012 -0.153 -0.108 11. 2.29 0 -0.372 -0.411 -0.030 0.078 -0.404 -0.292 12. 2.35 0 -0.392 -0.316 0.054 0.047 -0.312 -0.223 13. 5.06 1 0.457 . . 1.000 0.000 . 14. 3.34 1 1.560 1.878 -1.092 0.234 1.705 1.810 15. 2.38 1 2.384 2.797 -2.138 0.038 2.562 5.059 16. 3.15 0 -0.773 -0.303 0.507 0.043 -0.297 -0.213 17. 3.53 1 1.418 0.759 1.434 0.812 0.645 0.481 18. 2.68 0 -0.522 -0.140 0.253 0.010 -0.139 -0.099 19. 2.60 0 -0.487 -0.154 0.214 0.012 -0.153 -0.108 20. 2.23 0 -0.353 -0.552 -0.181 0.136 -0.540 -0.396 21. 2.88 0 -0.618 -0.147 0.360 0.011 -0.146 -0.104 22. 2.65 0 -0.509 -0.144 0.238 0.010 -0.143 -0.101 23. 2.09 1 2.658 1.484 4.865 0.466 1.236 1.071 24. 2.28 0 -0.368 -0.430 -0.050 0.086 -0.423 -0.306 25. 2.67 0 -0.517 -0.141 0.248 0.010 -0.140 -0.099 26. 2.29 0 -0.372 -0.411 -0.030 0.078 -0.404 -0.292 27. 2.15 0 -0.328 -0.872 -0.652 0.285 -0.818 -0.631 28. 2.54 0 -0.463 -0.173 0.184 0.015 -0.171 -0.122 29. 3.93 1 1.135 0.012 1.288 1.000 0.012 0.009 30. 3.34 0 -0.904 -0.768 0.227 0.234 -0.730 -0.552 31. 2.99 0 -0.678 -0.181 0.426 0.016 -0.179 -0.127 32. 3.32 0 -0.889 -0.686 0.320 0.194 -0.657 -0.491 -------------------------------------------------------------------------------- * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: L-statistic***From:*n j cox <n.j.cox@durham.ac.uk>

- Prev by Date:
**Re: st: Help with ARIMA** - Next by Date:
**st: do2htm command for making web pages from .do files, with graphs!** - Previous by thread:
**Re: st: L-statistic** - Next by thread:
**Re: st: L-statistic** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |