[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
Joseph Coveney <jcoveney@bigplanet.com> |

To |
Statalist <statalist@hsphsun2.harvard.edu> |

Subject |
Re: st: "predicts failure perfectly" in binary logit |

Date |
Tue, 22 Nov 2005 16:31:25 +0900 |

Kai Arzheimer wrote: as usually, the short answer to your question would be 'it depends'. I suggest you have a look a recent article by Christopher Zorn that discusses the problem of separation which is behind the error message. Reading that paper will probably help you to make an informed decision. The reference is: @article{, Author = {Zorn, Christopher}, Title = {A Solution to Separation in Binary Response Models}, Journal = {Political Analysis}, Volume = {13}, Pages = {157-170}, Year = {2005} } -------------------------------------------------------------------------------- Zorn ( manuscript/reprint at www.cas.sc.edu/poli/psrw/Zorn_PA_Final.pdf ) describes the application of Firth's approach to binomial regression. Firth recommended using the Jeffreys prior in a penalized maximum likelihood like approach to ameliorate the problem of upward bias in generalized linear model fits. Application of Firth's method to "separation" in logistic regression is described by Heinze and Schemper (manuscript/reprint downloadable from www.meduniwien.ac.at/msi/biometrie/publikationen/abstracts_en.htm#a25 ). One implementation described in the Heinze and Schemper article, one that's easy to do in Stata, is "splitting each original observation i into two new observations having response values Y_i and 1?Y_i with iteratively updated weights 1+h_i/2 and h_i/2, respectively." (The h_i's are elements of the hat matrix, obtained with -predict , hat- after -glm-: predict them, turn around and use them as weights in the next iteration of -glm-, re-predict the next batch, etc., until a convergence criterion is reached.) This is illustrated in the do-file below, which uses the worked example from the Heinze and Ploner follow-up article (manuscript/reprint downloadable from www.meduniwien.ac.at/msi/biometrie/publikationen/abstracts_en.htm#a30 ). The example uses a dataset that Cytel uses (or used to use) to illustrate its exact logistic regression program, LogXact. It should be easy, too, to construct a general-use ado-file on the same basis . . . I started on it more than a month ago, when Kai e-mailed privately about whether anything exists in Stata as it does in SAS, S-Plus/R, etc., but just haven't had the time to get back to it in order to work it up into something distributable. The ado below, which needs a prettier printout and a help-file, uses a change in log-likelihood as the convergence criterion. Someone could suggest a better choice, I'm sure, but that convergence criterion--meagerly set at 0.0001--is sufficient to illustrate the approach (coefficients and their standard errors reasonably agree with those shown in the worked example's source). Joseph Coveney clear set more off input byte uti byte GrpSize byte age byte oc byte vic byte /// vicl byte vis byte dia 1 2 0 0 0 0 1 0 14 16 0 0 1 0 0 0 3 4 0 0 1 0 1 0 10 18 0 0 1 1 0 0 12 30 0 0 1 1 1 0 1 1 0 0 1 1 1 1 44 86 0 1 0 0 0 0 3 3 0 1 0 0 0 1 0 1 0 1 0 0 1 0 15 16 0 1 1 0 0 0 1 1 0 1 1 0 0 1 2 2 0 1 1 0 1 0 7 12 0 1 1 1 0 0 3 9 0 1 1 1 1 0 2 2 1 0 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 0 1 1 1 3 1 0 1 1 0 0 0 4 1 0 1 1 1 0 1 1 1 0 1 1 1 1 5 19 1 1 0 0 0 0 0 1 1 1 0 0 1 0 3 4 1 1 1 0 0 0 0 2 1 1 1 1 1 0 end rename uti count1 generate byte count0 = GrpSize - count1 drop GrpSize reshape long count, i(age-dia) j(positive) drop if count == 0 expand count drop count rename oc oral rename vic condom rename vicl lubri rename vis sperm rename dia diaphrag replace age = !age // Heinze & Ploner flipped this * capture program drop firthlogit program define firthlogit, eclass sortpreserve byable(recall) syntax varlist(min = 1) [if] [in], [or] tempvar hat key split false pseudo_response weight preserve quietly { glm `varlist' , family(binomial) link(logit) `nolog' local depvar `e(depvar)' local indepvarlist = /// subinstr("`varlist'", "`depvar'", "", 1) predict `hat', hat * Splitting generate long `key' = _n generate byte `split' = 2 expand `split' drop `split' bysort `key': generate byte `false' = _n - 1 generate byte `pseudo_response' = `depvar' bysort `key' (`false'): replace `pseudo_response' = /// 1 - `depvar' if _n == _N generate float `weight' = 1 + `hat' / 2 replace `weight' = `hat' / 2 if `false' } * Loop to convergence local last_ll `e(ll)' local criterion = 1 local i = 1 while `criterion' > 10^-4 { quietly { glm `pseudo_response' `indepvarlist' /// [iweight = `weight'], family(binomial) link(logit) local criterion = abs((`last_ll' - e(ll) ) / `last_ll') drop `hat' predict `hat', hat replace `weight' = 1 + `hat' / 2 replace `weight' = `hat' / 2 if `false' } display in smcl as text "Firth", "Last Log-", /// "This Log-", "Fractional" display in smcl as text "Iteration", "likelihood", /// "likelihood", "Change" display in smcl as result "`++i'", `last_ll', /// `e(ll)', `criterion' local last_ll = e(ll) } if "`or'" != "" { glm , eform } else { glm } end * firthlogit positive age-diaphrag exit * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**st: (another) Stata 9/se windows problem** - Next by Date:
**Re: st: RE: labels** - Previous by thread:
**Re: st: "predicts failure perfectly" in binary logit** - Next by thread:
**st: using Stata to generate results for an internet app** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |