



st: Re: SAS vs STATA : why is xtlogit SO slow ?


From   "Joseph Coveney" <jcoveney@bigplanet.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: SAS vs STATA : why is xtlogit SO slow ?
Date   Thu, 2 Feb 2012 21:03:26 +0900

Francesco wrote:

I have a very large panel dataset (about 7 million observations, 70 000
individuals, 50 points on average per individual) and I tried
desperately to estimate a fixed-effects logit using -xtlogit, fe- in
Stata...
Unfortunately the log likelihood never converges: estimation stops at
iteration 1 or 2 with a message that the matrix is not positive semidefinite...
(Iteration 0:   log likelihood =    -1.#INF , etc)

However when I try with SAS (using proc Logistic with the strata
option) the computation is quite long (more than one hour) but at
least I can see the results...

My question is: how is that possible? I believed that holding the
dataset in RAM (as Stata does) would speed up computations... and I
would love to use xtlogit (or clogit) instead of SAS...

Any ideas?

--------------------------------------------------------------------------------

A couple of suggestions, if you haven't considered them already:

1. It seems as if speed isn't really the problem--it's a symptom of the major
problem:   PROC LOGISTIC is telling you a hunky-dory convergence story and
Stata's notifying you that your model isn't legit after a single iteration or
two.  Are you sure that you're fitting the same model in SAS and Stata?  (You
didn't show any code, and so it's impossible to tell from what you posted on the
list.)  Try taking the first 200 or 250 individuals' data and running just those
in your SAS and Stata models (with just 200-250 panels, both SAS and Stata
should converge relatively quickly if they're ever going to).  Do the
log-likelihood values, regression coefficients and their standard errors look
the same?  (Ignore flipping of coefficient sign--PROC LOGISTIC reverses the
order of the response variable, or at least it used to.)
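
The subset check in point 1 could be sketched in Stata roughly as follows. The variable names id, y, x1 and x2 are placeholders, since the original post showed no code; substitute the real panel identifier, binary response and predictors.

```stata
* Placeholder names (assumptions, not from the original post):
*   id    = panel identifier
*   y     = 0/1 response
*   x1 x2 = predictors

* Number the panels 1, 2, 3, ... in order of id
egen long panel = group(id)

preserve
* Keep only the first 200 individuals
keep if panel <= 200

* Conditional (fixed-effects) logit, stated two equivalent ways
clogit y x1 x2, group(id)
xtset id
xtlogit y x1 x2, fe
restore
```

Comparing the log likelihood, coefficients and standard errors from this run with PROC LOGISTIC on the same 200 individuals would show whether the two packages are fitting the same model.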

2. I don't understand your limited description of the dataset (70 000 × 50 =
3.5e6, not 7e6), but if you have only categorical predictors, then try using -contract
<response predictors id_variable>, freq(weight)- on your *wide* dataset, then
-reshape long- the result, and then finally use -clogit <response predictors>
[fweight=weight], group(<id_variable>)-.
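
A minimal sketch of the contract-then-reshape idea in point 2, assuming a wide layout with one row per individual and only three time points. The names y1-y3, x1-x3, pattern and time are all hypothetical. This sketch leaves the id variable out of the -contract- variable list so that individuals with identical response and predictor patterns collapse into a single row; that is an interpretation of the suggestion, not something the post spells out.

```stata
* Hypothetical wide layout: one row per individual,
* responses y1 y2 y3 and categorical predictors x1 x2 x3

* Collapse identical patterns; weight counts the individuals sharing each one
contract y1 y2 y3 x1 x2 x3, freq(weight)

* Each distinct pattern becomes one group
generate long pattern = _n

* Back to one row per pattern and time point; weight stays constant within pattern
reshape long y x, i(pattern) j(time)

* Frequency-weighted conditional logit, grouped by pattern
clogit y x [fweight = weight], group(pattern)
```

-clogit- interprets frequency weights as applying to whole groups rather than to individual observations, so the weight has to be constant within each group. That is why the collapse is done on the wide layout before reshaping.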

As far as I know, SAS's procedures are written in compiled programming languages
(as I recall, it's been predominantly, if not exclusively, C since the Big Rewrite).
Stata has some of the code for -xtlogit- in interpreted ado-files, but I suspect
that the critical-path stuff is in compiled C, too.  Run times in SAS are more
I/O dependent, as you note, but a few-hundred-variable-wide,
seven-million-observation dataset isn't going to be any kind of problem for either SAS or
Stata.  So, I cannot think of a reason why there should be a major difference in
run times between the two packages for the same conditional logistic regression
model of the size you're talking about.  Maybe someone else on the list with
more insight into the computational logistics can help you more here.

Joseph Coveney


