Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: SAS vs STATA : why is xtlogit SO slow ?


From   Francesco <cariboupad@gmx.fr>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: SAS vs STATA : why is xtlogit SO slow ?
Date   Sat, 25 Feb 2012 12:43:33 +0100

Dear Statalist ...


As promised I have contacted the tech support, and here is their last answer :
******************************************
"Dear Francesco,

The developer finished his evaluation and here is the result.

- The problem with your estimation is certainly caused by an overflow due to
 the large number of individuals for one or more groups.

- At this point the code is going to be modified to produce a more informative
 error message.

- The current alternative would to use the Breslow approximation. The code
 below reproduces an example:

    cscript clogit_cox

    local 0, `0'
    syntax, [ t(integer 1028) seed(string) lambda(real 0) ]

    qui set obs 25
    if ("`seed'"!="") set seed `seed'

    di "RNG state: `c(seed)'"

    gen int id = _n
    gen double constant = rnormal()
    qui expand `t'
    gen int row = _n
    gen double u = -log(-log(runiform()))
    gen byte predictor = runiform() < 0.5
    gen double y = constant + predictor + u

    by id (row), sort : egen int r = rank(y)

    if (`lambda'<=0) local lambda = `t'/2

    qui by id (row) : gen int k = rpoisson(`lambda') if _n==1
    qui by id (row) : replace k = k[_n-1] if _n>1

    gen byte response = r <= k
    tabulate id response

    gen newid = group(id)
    gen time = newid + .5
    stset time, enter(newid) fail(response)
    cap noi stcox c.predictor, nohr
    cap noi stcox c.predictor, exactp nohr
    cap noi clogit response c.predictor, group(id)


The problem would be reviewed again but at this time there is still
not certainty
on whether and/or when an alternative would be incorporated to produce results
in those extreme cases.

Sincerely,

Gustavo"
*******************************************
Therefore despite their great reactivity, professionalism and
support...I am afraid the problem is not completely fixed for now...
I hope that they will find a solution...sooner or later I am sure they will.

Anyway, thanks again for your help and suggestions,

Best Regards


On 11 February 2012 09:10, Joseph Coveney <jcoveney@bigplanet.com> wrote:
>
> Francesco wrote (excerpted):
>
> clogit seems not to behave well when some individuals show a very
> (very very) high number of points.
>
> [Example omitted]
>
> What do you think ? Are you convinced ?
>
> I might just say that I am not telling that one package is better than
> the other : they have all their strenghts and weaknesses.
> I confess also that I personally prefer to use Stata, so if my
> hypothesis is right and it is possible to fix this rare problem, I
> will be very very glad.
>
> I wait for your eventual comments before sending the code and
> explanation to the support team
>
>
> --------------------------------------------------------------------------------
>
> Good job.  It looks like some kind of underflow problem in computing the
> conditioning factor when T exceeds some threshold for a given N-and-T
> combination.  I suppose that with such long Ts, it's a matter of a log of
> a tiny
> probability subtracted from another one, or something analogous.  I don't
> know
> what SAS does; perhaps it implements some critical parts of the algorithm
> in
> quad-precision and so can go out further in T before it, too, breaks down.
>
> You can find edge cases where -technique(bhhh)- saves the day (see example
> below, in the do-file and the first manual run after do-file terminates),
> and
> that might have been what Yu Xue saw.  Yours are way over the threshold at
> T =
> 2000+, and the log-likelihood underflows upon the zeroeth iteration (see
> example
> below, the second manual run after do-file terminates), and things go
> downhill
> from there, regardless of what you try.  In the empty model (the third
> manual
> run after the do-file terminates), it appears to converge, but with a
> log-likelihood of -8.99e+307 in the iteration log and missing in the
> regression
> table, and so whatever it is seems to involve the conditioning factor and
> not
> necessarily something in the linear predictor and its derivatives.
>
> Go ahead and send your findings to StataCorp, and see what they think.
>
> Joseph Coveney
>
>
> . do "F:\1R.do"
>
> . version 11.2
>
> .. clear *
>
> . set more off
>
> . quietly set obs 25
>
> . set seed `=date("2012-02-11", "YMD")'
>
> . generate int id = _n
>
> . generate double constant = rnormal()
>
> . generate int T = 1075
>
> . expand T
> (26850 observations created)
>
> . generate int row = _n
>
> . generate byte predictor = runiform() < 0.5
>
> . generate byte response = runiform() < 0.5
>
> . bysort id (row): generate int t = _n
>
> . forvalues t = 1028/1029 {
>  2.     display in smcl _newline(2) as text "T = " as result `t'
>  3.     capture clogit response c.predictor if t <= `t', group(id)
>  4.     if _rc == 430 {
>  5.         display in smcl as error "Fails"
>  6.         local T = `t'
>  7.         continue, break
>  8.     }
>  9.     display in smcl as result "OK"
>  10. }
>
>
> T = 1028
> OK
>
>
> T = 1029
> Fails
>
> . clogit response c.predictor if t <= `T', group(id) ///
> >     technique(bhhh) trace gradient
> note: multiple positive outcomes within groups encountered.
>
>
> --------------------------------------------------------------------------------
> ---------
> Iteration 0:
> Parameter vector:
>     response:
>    predictor
> r1   .0147443
>
>                                                   log likelihood =
> -17721.933
> Gradient vector (length = 1.112278):
>     response:
>    predictor
> r1  -1.112278
>
> --------------------------------------------------------------------------------
> ---------
> Iteration 1:
> Parameter vector:
>     response:
>    predictor
> r1   .0140944
>
>                                                   log likelihood =
> -17721.932
> Gradient vector (length = .0693783):
>     response:
>    predictor
> r1  -.0693783
>
> --------------------------------------------------------------------------------
> ---------
>
> Conditional (fixed-effects) logistic regression   Number of obs   =
>  25725
>                                                  LR chi2(1)      =
> 0.32
>                                                  Prob > chi2     =
> 0.5735
> Log likelihood = -17721.932                       Pseudo R2       =
> 0.0000
>
>                                     (Std. Err. adjusted for clustering on
> id)
>
> ------------------------------------------------------------------------------
>             |                 OPG
>    response |      Coef.   Std. Err.      z    P>|z|     [95% Conf.
> Interval]
>
> -------------+----------------------------------------------------------------
>   predictor |   .0140944   .0203746     0.69   0.489     -.025839
>  .0540279
>
> ------------------------------------------------------------------------------
>
> . exit
>
> end of do-file
>
>
> . clogit response c.predictor if t <= 1029, group(id)
> note: multiple positive outcomes within groups encountered.
>
> Iteration 0:   log likelihood = -17721.933
> Iteration 1:   log likelihood =    -1.#IND
> Hessian is not negative semidefinite
> r(430);
>
> . clogit response c.predictor, group(id)
> note: multiple positive outcomes within groups encountered.
>
> Iteration 0:   log likelihood =    -1.#INF
> Iteration 1:   log likelihood =    -1.#IND
> Hessian is not negative semidefinite
> r(430);
>
> . clogit response , group(id)
> note: multiple positive outcomes within groups encountered.
>
> Iteration 0:   log likelihood = -8.99e+307
>
> Conditional (fixed-effects) logistic regression   Number of obs   =
>  26875
>                                                  LR chi2(0)      =
>  .
> Log likelihood =          .                       Prob > chi2     =
>    .
>
>
> ------------------------------------------------------------------------------
>    response |      Coef.   Std. Err.      z    P>|z|     [95% Conf.
> Interval]
>
> -------------+----------------------------------------------------------------
>
> ------------------------------------------------------------------------------
>
> .
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index