Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Francesco <cariboupad@gmx.fr> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: SAS vs STATA : why is xtlogit SO slow ? |

Date |
Sat, 25 Feb 2012 12:43:33 +0100 |

Dear Statalist ... As promised I have contacted the tech support, and here is their last answer : ****************************************** "Dear Francesco, The developer finished his evaluation and here is the result. - The problem with your estimation is certainly caused by an overflow due to the large number of individuals for one or more groups. - At this point the code is going to be modified to produce a more informative error message. - The current alternative would to use the Breslow approximation. The code below reproduces an example: cscript clogit_cox local 0, `0' syntax, [ t(integer 1028) seed(string) lambda(real 0) ] qui set obs 25 if ("`seed'"!="") set seed `seed' di "RNG state: `c(seed)'" gen int id = _n gen double constant = rnormal() qui expand `t' gen int row = _n gen double u = -log(-log(runiform())) gen byte predictor = runiform() < 0.5 gen double y = constant + predictor + u by id (row), sort : egen int r = rank(y) if (`lambda'<=0) local lambda = `t'/2 qui by id (row) : gen int k = rpoisson(`lambda') if _n==1 qui by id (row) : replace k = k[_n-1] if _n>1 gen byte response = r <= k tabulate id response gen newid = group(id) gen time = newid + .5 stset time, enter(newid) fail(response) cap noi stcox c.predictor, nohr cap noi stcox c.predictor, exactp nohr cap noi clogit response c.predictor, group(id) The problem would be reviewed again but at this time there is still not certainty on whether and/or when an alternative would be incorporated to produce results in those extreme cases. Sincerely, Gustavo" ******************************************* Therefore despite their great reactivity, professionalism and support...I am afraid the problem is not completely fixed for now... I hope that they will find a solution...sooner or later I am sure they will. Anyway, thanks again for your help and suggestions, Best Regards On 11 February 2012 09:10, Joseph Coveney <jcoveney@bigplanet.com> wrote: > > Francesco wrote (excerpted): > > clogit seems not to behave well when some individuals show a very > (very very) high number of points. > > [Example omitted] > > What do you think ? Are you convinced ? > > I might just say that I am not telling that one package is better than > the other : they have all their strenghts and weaknesses. > I confess also that I personally prefer to use Stata, so if my > hypothesis is right and it is possible to fix this rare problem, I > will be very very glad. > > I wait for your eventual comments before sending the code and > explanation to the support team > > > -------------------------------------------------------------------------------- > > Good job. It looks like some kind of underflow problem in computing the > conditioning factor when T exceeds some threshold for a given N-and-T > combination. I suppose that with such long Ts, it's a matter of a log of > a tiny > probability subtracted from another one, or something analogous. I don't > know > what SAS does; perhaps it implements some critical parts of the algorithm > in > quad-precision and so can go out further in T before it, too, breaks down. > > You can find edge cases where -technique(bhhh)- saves the day (see example > below, in the do-file and the first manual run after do-file terminates), > and > that might have been what Yu Xue saw. Yours are way over the threshold at > T = > 2000+, and the log-likelihood underflows upon the zeroeth iteration (see > example > below, the second manual run after do-file terminates), and things go > downhill > from there, regardless of what you try. In the empty model (the third > manual > run after the do-file terminates), it appears to converge, but with a > log-likelihood of -8.99e+307 in the iteration log and missing in the > regression > table, and so whatever it is seems to involve the conditioning factor and > not > necessarily something in the linear predictor and its derivatives. > > Go ahead and send your findings to StataCorp, and see what they think. > > Joseph Coveney > > > . do "F:\1R.do" > > . version 11.2 > > .. clear * > > . set more off > > . quietly set obs 25 > > . set seed `=date("2012-02-11", "YMD")' > > . generate int id = _n > > . generate double constant = rnormal() > > . generate int T = 1075 > > . expand T > (26850 observations created) > > . generate int row = _n > > . generate byte predictor = runiform() < 0.5 > > . generate byte response = runiform() < 0.5 > > . bysort id (row): generate int t = _n > > . forvalues t = 1028/1029 { > 2. display in smcl _newline(2) as text "T = " as result `t' > 3. capture clogit response c.predictor if t <= `t', group(id) > 4. if _rc == 430 { > 5. display in smcl as error "Fails" > 6. local T = `t' > 7. continue, break > 8. } > 9. display in smcl as result "OK" > 10. } > > > T = 1028 > OK > > > T = 1029 > Fails > > . clogit response c.predictor if t <= `T', group(id) /// > > technique(bhhh) trace gradient > note: multiple positive outcomes within groups encountered. > > > -------------------------------------------------------------------------------- > --------- > Iteration 0: > Parameter vector: > response: > predictor > r1 .0147443 > > log likelihood = > -17721.933 > Gradient vector (length = 1.112278): > response: > predictor > r1 -1.112278 > > -------------------------------------------------------------------------------- > --------- > Iteration 1: > Parameter vector: > response: > predictor > r1 .0140944 > > log likelihood = > -17721.932 > Gradient vector (length = .0693783): > response: > predictor > r1 -.0693783 > > -------------------------------------------------------------------------------- > --------- > > Conditional (fixed-effects) logistic regression Number of obs = > 25725 > LR chi2(1) = > 0.32 > Prob > chi2 = > 0.5735 > Log likelihood = -17721.932 Pseudo R2 = > 0.0000 > > (Std. Err. adjusted for clustering on > id) > > ------------------------------------------------------------------------------ > | OPG > response | Coef. Std. Err. z P>|z| [95% Conf. > Interval] > > -------------+---------------------------------------------------------------- > predictor | .0140944 .0203746 0.69 0.489 -.025839 > .0540279 > > ------------------------------------------------------------------------------ > > . exit > > end of do-file > > > . clogit response c.predictor if t <= 1029, group(id) > note: multiple positive outcomes within groups encountered. > > Iteration 0: log likelihood = -17721.933 > Iteration 1: log likelihood = -1.#IND > Hessian is not negative semidefinite > r(430); > > . clogit response c.predictor, group(id) > note: multiple positive outcomes within groups encountered. > > Iteration 0: log likelihood = -1.#INF > Iteration 1: log likelihood = -1.#IND > Hessian is not negative semidefinite > r(430); > > . clogit response , group(id) > note: multiple positive outcomes within groups encountered. > > Iteration 0: log likelihood = -8.99e+307 > > Conditional (fixed-effects) logistic regression Number of obs = > 26875 > LR chi2(0) = > . > Log likelihood = . Prob > chi2 = > . > > > ------------------------------------------------------------------------------ > response | Coef. Std. Err. z P>|z| [95% Conf. > Interval] > > -------------+---------------------------------------------------------------- > > ------------------------------------------------------------------------------ > > . > > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*Clyde B Schechter <clyde.schechter@einstein.yu.edu>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*Klaus Pforr <kpforr@googlemail.com>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*k7br@gmx.fr

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*Klaus Pforr <kpforr@googlemail.com>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*k7br@gmx.fr

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*"Joseph Coveney" <jcoveney@bigplanet.com>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*k7br@gmx.fr

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*"Joseph Coveney" <jcoveney@bigplanet.com>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*Francesco <cariboupad@gmx.fr>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*"Joseph Coveney" <jcoveney@bigplanet.com>

- Prev by Date:
**Re: st: Probability threshold for inclusion in ZINB count model** - Next by Date:
**st: Dfbeta in Cox regression post-estimation with Stata 10.1** - Previous by thread:
**Re: st: SAS vs STATA : why is xtlogit SO slow ?** - Next by thread:
**st: Program error after installing Stata 12.1 for the MAC.** - Index(es):