[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: Binomial regression

From	Maarten buis <[email protected]>
To	[email protected]
Subject	Re: st: Binomial regression
Date	Thu, 2 Aug 2007 20:07:47 +0100 (BST)
This may be a silly question, but why are you using the identity link?
It is not very appropriate for a binary dependent variable since it
will eventually lead to prediction outside the allowable range, and
apperently there are also problems with getting the model to converge.
Logit and probit links will converge in no time in Stata (probably also
in SAS), and are more appropriate for binary dependent variables.

Maarten

--- Constantine Daskalakis <[email protected]> wrote:

> Good day to all.
> 
> I would like to share some observations on binomial regression using
> Stata (and SAS).
> 
> Essentially, the problem is that we have a binary outcome Y (0/1) and
> 
> want to model it as a function of covariates (X1, X2, etc) via the 
> generalized linear model
> 
> p = b0 + b1*X1 + b2*X2 + ...
> 
> So far, I've done such projects in Stata and I've noted that binomial
> 
> regression sometimes fails to converge and/or gives ML estimates of
> the 
> coefficients that yield estimated probabilities that are outside the 
> [0,1] range (typically, for some modest number of observations).
> 
> A colleague suggested that SAS might give better results and I was 
> skeptical at first. But I had some time in my hands (it's summer
> after 
> all!), so I took a few datasets and run them in Stata (-glm-) and SAS
> 
> (Proc Genmod).
> 
> The Stata (8.2) implementation I used is the default ML
> (Newton-Raphson) 
> algorithm
> 
> - glm y x1 x2 ..., fam(bin) link(i) search
> 
> or sometimes
> 
> - glm y x1 x2 ..., fam(bin) link(i) search fisher(#)
> 
> where # is some number of iterations with the Fisher scoring
> algorithm.
> 
> The SAS (9.1) implementation I used is
> 
> proc genmod;
> model y = x1 x2 ... / dist=bin link=id type3 wald itprint;
> run;
> 
> [I also played with specification of starting values, but I will
> leave 
> this piece out, and confine my posting to results obtained using the 
> default starting values in both packages.]
> 
> 
> Here's what I've found:
> 
> (1) Convergence
> 
> Stata often gets bogged down ("backed up") after a few iterations and
> does not converge.
> 
> Specifying Fisher scoring for some iterations in the beginning helps.
> After Newton-Raphson takes over from Fisher scoring, it occasionally 
> does converge. Most often, I have to use Fisher scoring throughout to
> 
> get convergence. But see point #3 below.
> 
> SAS does seem to often converge (on the basis of parameter vector 
> convergence), but also warns that the "relative Hessian convergence 
> criterion" has not been achieved and that "convergence is
> questionable" 
> (indicating that the likelihood has not really converged
> sufficiently).
> 
> (2) Likelihood of final model
> 
> The log-likelihood of the final Stata model is often somewhat better 
> than that of the final SAS model. This might suggest that the Stata 
> results are "better". However, see the drawback in point #4 below.
> 
> (3) Estimated coefficients and standard errors
> 
> Naturally, when Stata and SAS give different final models, their
> estimated coefficients are different.
> 
> But beware using Fisher's scoring throughout to get convergence and a
> 
> final model. Sometimes, this final model will have absurdly small 
> standard errors (with p < 0.001 for all variables). If something like
> 
> this happens, it might be useful to compute standard errors using the
> 
> option "OPG":
> 
> - glm y x1 x2 ..., fam(bin) link(i) search fisher(#) opg
> 
> [There are special complications when there are covariate levels that
> 
> have observed probability of 0 or 1 (ie, all observations are "0s" or
> 
> "1s"), but I'll leave this issue aside.]
> 
> (4) Estimated probabilities
> 
> When Stata has convergence trouble (and sometimes when it does not),
> it
> warns that some "parameter estimates produce inadmissible mean
> estimates
> in one or more observations."
> 
> SAS gives no such warnings.
> 
> When I compute the predicted probabilities from both the SAS and the
> Stata final models (which often are not the same, as I explained
> above),
> SAS almost always has them in the [0,1] interval but Stata has quite
> a
> few of them outside the interval.
> 
> I most recently ran 7 different models in a dataset (9 different 
> outcomes, each with the same 4 predictor variables). My total N was
> 278.
> 
> For 2 outcomes, SAS and Stata gave identical results (with all
> predicted 
> probabilities in the [0,1] range).
> 
> For the remaining 5 outcomes, SAS always gave a final model (although
> 
> with the warning regarding the Hessian convergence).
> 
> Stata w/ NR did not converge after 50 iterations.
> 
> Stata w/ FS did converge in 4 of the 5 cases (after 6-20 iterations).
> 
> In 4 of the 5 cases, the final model of Stata had a somewhat better 
> log-likelihood than the final model of SAS.
> 
> But:
> 
> (i) Although SAS's estimated standard errors appeared sensible,
> Stata's 
> (w/ FS) were clearly too small in 4 of the 5 cases.
> 
> (ii) The final SAS models never yielded a predicted probability less 
> than 0 or greater than 1. In contrast, Stata's models yielded 
> probabilities outside the [0,1] interval for 5-15% of the
> observations.
> 
> 
> I've done this sort of comparison with a couple of other datasets
> with 
> similar results. I know it is not proof positive (as opposed to a 
> carefully constructed simulation), but I conclude that Stata's
> algorithm 
> is not as robust as SAS's and would advise the use of SAS for
> binomial 
> regression problems.
> 
> Perhaps I have missed something or perhaps other people have
> different 
> experiences. So, please take this with the usual caution.
> 
> 
> Best,
> Constantine
> 
> 
> 
> -- 
> 
> 
> The documents accompanying this transmission may contain confidential
> health or business information. This information is intended for the
> use of the individual or entity named above. If you have received
> this information in error, please notify the sender immediately and
> arrange for the return or destruction of these documents.
> 
> Constantine Daskalakis, ScD
> Assistant Professor,
> Thomas Jefferson University, Division of Biostatistics
>     1015 Chestnut St., Suite M100, Philadelphia, PA 19107
>     Tel: 215-955-5695
>     Fax: 215-503-3804
>     Email: [email protected]
>     Webpage: http://www.jefferson.edu/clinpharm/biostatistics/
> 
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------


      ___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/ 
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Follow-Ups:
- Re: st: Binomial regression
  - From: Constantine Daskalakis <[email protected]>
References:
- st: Binomial regression
  - From: Constantine Daskalakis <[email protected]>
Prev by Date: st: countfit and exposure
Next by Date: st: missing F statistic [was: ""]
Previous by thread: st: Binomial regression
Next by thread: Re: st: Binomial regression
Index(es):
- Date
- Thread