
From: rgutierrez@stata.com (Roberto G. Gutierrez, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Binomial regression
Date: Thu, 02 Aug 2007 15:59:33 -0500

Constantine Daskalakis <C_Daskalakis@mail.jci.tju.edu> makes several
observations concerning convergence properties and predicted probabilities
based on binomial models with identity links, namely models of the form

    . glm y x1 x2 ..., fam(bin) link(identity)

His general observation is that these models, when fit in Stata, often have
difficulty converging and often warn of predicted probabilities outside the
admissible range.

These two problems are really one and the same. They are born of the use of
the identity link, which has a range encompassing the entire real line, in
contrast to a response probability constrained to the range [0,1].

Constantine also compares these behaviors to those of SAS, and observes SAS
to be more cooperative in models where Stata fails to converge.

In what follows I provide more details, but I'll begin with a summary: In
such situations, regardless of software used, you have a non-convergent
model unworthy of serious interpretation. Different software packages have
different ways of telling you this, with Stata taking the most direct
approach.

Constantine writes:

> Here's what I've found:
>
> (1) Convergence
> Stata often gets bogged down ("backed up") after a few iterations and does
> not converge.
> Specifying Fisher scoring for some iterations in the beginning helps. After
> Newton-Raphson takes over from Fisher scoring, it occasionally does
> converge. Most often, I have to use Fisher scoring throughout to get
> convergence. But see point #3 below.

What is happening here is that the maximum-likelihood algorithm is producing
parameter estimates that produce linear predictors in one or more
observations that bump up against the boundaries of [0,1]. Since a value
outside of [0,1] is an inadmissible probability, a constant probability just
above zero or just below one is used instead. If this occurs on a few
observations, this isn't much of a problem.
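
To see what substituting a constant probability does to the likelihood, here is a minimal sketch in Python (not Stata, and not a description of how glm is implemented). The clamp value EPS, the function names, and the numbers are all assumptions made up for illustration. It checks that once an observation's linear predictor falls outside [0,1] and is replaced by a constant, that observation's log-likelihood contribution stops responding to changes in a coefficient.

```python
import math

EPS = 1e-8  # hypothetical clamp; the constant a real package uses is not stated here


def clamp(p, eps=EPS):
    """Replace an inadmissible probability with a constant just inside (0, 1)."""
    return min(max(p, eps), 1.0 - eps)


def contribution(b0, b1, x, y):
    """Log-likelihood contribution of one 0/1 outcome under an identity link."""
    p = clamp(b0 + b1 * x)  # identity link: the linear predictor is the probability
    return y * math.log(p) + (1 - y) * math.log(1.0 - p)


# Made-up observation whose linear predictor exceeds 1 for both slopes (1.3, 1.6).
flat = contribution(-0.2, 0.5, 3.0, 1) == contribution(-0.2, 0.6, 3.0, 1)

# Made-up observation whose linear predictor stays inside (0, 1) (0.3, 0.4).
moves = contribution(-0.2, 0.5, 1.0, 0) != contribution(-0.2, 0.6, 1.0, 0)

print(flat, moves)
```

Both comparisons hold in this toy case: the clamped observation is unaffected by the change in slope, while the interior observation is not. The more observations that end up clamped, the less information about the coefficients the likelihood retains.
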
If it occurs too much, however, using constants will produce a ridge in the
likelihood, making convergence of Hessian-based ML difficult. Such behavior
and the resulting non-convergent model should serve as a signal that your
data are not appropriate for an identity link.

Convergence could be forced by using any number of alternate methods,
including

    (a) deleting the offending observations from the analysis

    (b) relaxing the convergence criterion

    (c) gerrymandering regression coefficients so as to not produce
        inadmissible predicted probabilities

just to name a few. You can mimic these behaviors in Stata through the
appropriate options or through some creative link-function programming, but
we do not recommend that.

Any one of the above methods would help convergence, but the price is one of
model interpretability. The resulting estimates would not have the
properties of standard MLEs, since they don't really maximize the model
likelihood.

> SAS does seem to often converge (on the basis of parameter vector
> convergence), but also warns that the "relative Hessian convergence
> criterion" has not been achieved and that "convergence is questionable"
> (indicating that the likelihood has not really converged sufficiently).

Both SAS and Stata are telling you the same thing: you have a non-convergent
model. A table of parameter estimates does not change that.

> (2) Likelihood of final model
> The log-likelihood of the final Stata model is often somewhat better than
> that of the final SAS model. This might suggest that the Stata results are
> "better". However, see the drawback in point #4 below.

This pretty much confirms that the SAS results are not MAXIMUM likelihood:
estimates that truly maximize the likelihood cannot be beaten on log
likelihood by another fit of the same model.

> (3) Estimated coefficients and standard errors
> Naturally, when Stata and SAS give different final models, their estimated
> coefficients are different.
> But beware using Fisher's scoring throughout to get convergence and a final
> model.
> Sometimes, this final model will have absurdly small standard errors
> (with p < 0.001 for all variables). If something like this happens, it might
> be useful to compute standard errors using the option "OPG":
> - glm y x1 x2 ..., fam(bin) link(i) search fisher(#) opg
> [There are special complications when there are covariate levels that have
> observed probability of 0 or 1 (ie, all observations are "0s" or "1s"), but
> I'll leave this issue aside.]

Using Fisher scoring can help convergence, but standard errors based on
Fisher scoring require the additional assumption that your mean function is
specified correctly. If these standard errors are absurdly small, then this
demonstrates a violation of this assumption, providing further evidence that
your data are poorly suited to this model.

> (4) Estimated probabilities
> When Stata has convergence trouble (and sometimes when it does not), it
> warns that some "parameter estimates produce inadmissible mean estimates in
> one or more observations."
> SAS gives no such warnings.

We don't know why SAS sometimes keeps predictions in [0,1], but don't read
too much into that. Either you have a non-convergent model in both SAS and
Stata, in which case nothing is interpretable, or perhaps SAS has used one
of the methods (a), (b), (c), or another ad hoc adjustment. Even if we knew
the exact adjustment being made, it would be almost impossible to measure
its impact on model interpretability.

--Bobby                  --Vince
rgutierrez@stata.com     vwiggins@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

