
Re: st: SAS vs STATA : why is xtlogit SO slow ?

Subject   Re: st: SAS vs STATA : why is xtlogit SO slow ?
Date   Tue, 7 Feb 2012 19:19:44 +0100

Dear Joseph, Dear Klaus,

My name is Francesco, by the way ;-)
Thank you again, Joseph, for your very precise mail; here are my answers
to each point you mentioned:

1) Stata and SAS do indeed see the same dataset. I use Stattransfer, and I
confirm that exactly the same number of uninformative observations is dropped
by both packages (the -2 log-likelihood check is sketched after the clogit
output in 4)d below).
2) Unfortunately, neither the -difficult- option nor the tech(bhhh) algorithm
selection helped; the computation is quite long in either case (the exact
calls are sketched just after point 3).
3) Unfortunately I obtain:
Iteration 0:   log likelihood = -8.99e+307
flat region resulting in a missing likelihood
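For concreteness, the two calls I refer to in point 2 look like this (a sketch
only; the model is the same one shown under 4)d below):

* Richard's suggestion: extra work on difficult, nonconcave regions
clogit Y DUM CONT1 CONT2, group(ID) difficult
* Joseph's suggestion: BHHH updates, which side-step the analytic Hessian
clogit Y DUM CONT1 CONT2, group(ID) technique(bhhh)
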
4)a) Joseph, the last line got dropped when I wrote the mail: SAS does indeed
show the line
Convergence criterion (GCONV=1E-8) satisfied.
4)b&c) Everything seems correct, I think... here are the results (I have SAS
in... French ;-) so the labels below are translated):

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion          Without Covariates    With Covariates
AIC                         3930972.1          3927734.2
SC                          3930972.1          3927775.4
-2 Log L                    3930972.1          3927728.2

Testing Global Null Hypothesis: BETA=0
Test                  Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       3243.8205     3        <.0001
Score                  3223.0277     3        <.0001
Wald                   3214.1748     3        <.0001

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
CONT2         1      0.4467            0.0462            93.4407        <.0001
CONT1         1      2.4950            0.0570          1918.5537        <.0001
DUM           1      0.2208           0.00608          1316.6314        <.0001

4)d) When I feed SAS's results into clogit, I get no result at all (after
what seems like an infinite amount of time).
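For reference, Beta holds SAS's estimates from the table above, in the order
of the clogit varlist; a sketch of how it was filled:

* starting values from SAS: DUM, CONT1, CONT2
matrix input Beta = (0.2208 2.4950 0.4467)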

clogit Y DUM CONT1 CONT2, group(ID) from(Beta, copy) ///
iterate(0) gradient hessian

Iteration 0:   log likelihood =    -1.#INF
Gradient vector (length =        .):
           lo:        lo:        lo:
       trader     P_risk  daily_vol
r1    -1.#IND    -1.#IND    -1.#IND

Hessian matrix:
                     lo:        lo:        lo:
                 trader     P_risk  daily_vol
   lo:trader    1.#QNAN
   lo:P_risk    1.#QNAN    1.#QNAN
lo:daily_vol    1.#QNAN    1.#QNAN    1.#QNAN
convergence not achieved
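
As for point 1, the -2 log-likelihood check can be written out like this (a
sketch; 3930972.1 is the value SAS reports above for the model without
covariates):

* null conditional-logit model, as suggested in Joseph's point 1
clogit Y, group(ID)
* twice the negative log-likelihood, to compare with SAS's 3930972.1
display %12.1f -2*e(ll)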

Conclusion: Is there a bug in clogit for some very special, highly unbalanced
panel datasets?
I should probably ask Stata technical support... By the way, am I eligible
for tech support with a Stata network licence?
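
If I do write to them, I imagine something along these lines would help
document how unbalanced the panels are (a sketch; plen and first are just
scratch variable names):

* panel length attached to every observation
bysort ID: generate long plen = _N
* flag one observation per panel
egen byte first = tag(ID)
* distribution of panel lengths across panels
summarize plen if first, detail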

Many thanks again,
I will keep the list posted with any further information...


On 6 February 2012 15:35, Joseph Coveney <> wrote:
> I didn't doubt your good intentions, but rather was trying to say that more
> information is better when there's a puzzle to solve.
> From your excerpt of the SAS output, it seems that SAS used a Marquardt
> Newton-Raphson algorithm, just as Stata does by default.  Possibly SAS nudges
> the diagonal elements more than Stata does, or its singularity-test threshold is
> higher.  Perhaps Stata's recursive algorithm for computing the conditioning term
> is sensitive to situations where panel lengths range from 2 to 2000.
> If Richard's suggestion of using option -difficult- doesn't solve your problem,
> then consider the following.
> 1.  Confirm that Stata's -xtlogit- and SAS's PROC LOGISTIC are seeing the same
> dataset.  One way to verify that the data each sees are the same is to compare
> the log-likelihoods.  Because you cannot get convergence with Stata when the
> predictors are included, fit a model with no predictors (-clogit Y, group(ID)-).
> Then compare twice the negative log-likelihood value from Stata with the
> null-model value shown by SAS (you mention that it's 3930972).  Are they
> identical?  If
> not, then there's probably a data-management error causing a difference in the
> two datasets.  With the size of your dataset, this might not be particularly
> sensitive to occasional differences, but it will detect systematic differences.
> (It might even not be very specific; the log-likelihoods might differ despite an
> identical dataset, which would be helpful to know, too:  see the footnote to 2.
> below.)  You've already compared a subset of your dataset, and the results
> match, but there could be some systematic difference in the longer panels.
> Also, verify that the number of singletons (and other cases with a constant
> response) that is being thrown out by both packages is the same.  Stata gives
> you a message before the iteration begins ("note: XXX groups (YYY obs) dropped
> because of all positive or all negative outcomes.").  SAS says "Number of
> Uninformative Strata" and "Frequency Uninformative".  Both numbers should agree
> between packages.
> 2.  If Stata's Marquardt algorithm isn't so aggressive as SAS's (or Stata's
> singularity threshold is too sensitive), then you can try to side-step the
> Hessian altogether.  Try -clogit . . . , . . . technique(bhhh)-.  (See
> for a thread by
> someone with the same problem as you report.*)
> 3.  If that fails, then you can go the route that SAS used to use for
> conditional/fixed-effects logistic regression prior to the STRATA statement,
> namely, Cox regression.
> generate byte time = 2 - Y
> stset time, failure(Y == 1)
> stcox DUM CONT1 CONT2, strata(ID) exactp nohr
> 4.  If everything fails, then you might need to use SAS's answer, as Klaus
> suggests.  In light of the warnings from Stata, you might want to check on a
> couple of things in SAS's model-fit before relying extensively on it.
> a. You mention two lines in your SAS output.
> "I obtain:
> Newton-Raphson Ridge Optimization
> Without Parameter Scaling"
> You didn't mention the very next line in the output, the one just after that
> last line above.  Does it say, "Convergence criterion (GCONV=1E-8)
> satisfied."?  The same claim should be repeated in the SAS .LOG file.
> b. Is everything else agreeable in the SAS .LOG file?
> c. You mentioned that the omnibus tests are all P < 0.0001.  What do the
> regression coefficients and their covariance matrix look like?  Are they
> sensible?
> d. You probably didn't ask for an iteration trace in the SAS run, but it would
> be good to see how things look at convergence.  I haven't tried the following
> for a run that blows up, but I believe that you can get an idea of SAS's
> gradient and Hessian at-convergence by feeding its regression coefficients to
> Stata and then not iterating at all.  If it works, then it avoids re-running the
> model-fit in SAS.  Try the steps below.
> Enter SAS's logit (untransformed) regression coefficients at full displayed
> precision into a Stata matrix.
> matrix input Beta = (<DUM's coefficient> <CONT1's coefficient> ///
>    <CONT2's coefficient>)
> Then,
> clogit Y DUM CONT1 CONT2, group(ID) from(Beta, copy) ///
>    iterate(0) gradient hessian
> Are you satisfied that the gradient's length is reasonably close to zero, that
> SAS's GCONV was tight enough?  Which predictor is Stata complaining about in the
> Hessian?  (Probably DUM, from your description of the dataset.)  Look back at
> 4.c. above, again, asking how sensible that predictor's coefficient and standard
> error are.
> Joseph Coveney
> *That the same problem arose twice in independent situations, combined with your
> observation that another software package has no trouble, raises the distinct
> possibility that there's a bug in Stata's -clogit-.  If so, then it's a rare bug
> that's difficult for StataCorp to replicate and fix without help from users.
> I'm obviously just guessing here, but from the user's manual, the objective
> function that -clogit- maximizes resembles a penalized log-likelihood, and
> something like a problem in the recursive algorithm to compute the conditioning
> factor looks as if it could give rise to the kind of behavior you and Yu Xue
> describe.  Regardless, if you're satisfied with the items in 4. above, then you
> might be doing everyone a favor by contacting StataCorp for follow-up.
