Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: "Joseph Coveney" <jcoveney@bigplanet.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Re: SAS vs STATA : why is xtlogit SO slow ?
Date: Thu, 2 Feb 2012 21:03:26 +0900

Francesco wrote:

I have a very large panel dataset (about 7 million observations, 70 000 individuals, 50 points on average per individual) and I tried desperately to estimate a fixed-effects logit using: xtlogit, fe with Stata... Unfortunately the log likelihood never converges and stops at iteration 1 or 2, saying that the matrix is not positive semidefinite... (Iteration 0: log likelihood = -1.#INF, etc.)

However, when I try with SAS (using PROC LOGISTIC with the strata option) the computation is quite long (more than one hour) but at least I can see the results...

My question is: how is that possible? I believed that storing the database in RAM (as Stata does) would speed up computations... and I would love to use xtlogit (or clogit) instead of using SAS... any idea?

--------------------------------------------------------------------------------

A couple of suggestions, if you haven't considered them already:

1. It seems as if speed isn't really the problem--it's a symptom of the major problem: PROC LOGISTIC is telling you a hunky-dory convergence story and Stata's notifying you that your model isn't legit after a single iteration or two. Are you sure that you're fitting the same model in SAS and Stata? (You didn't show any code, and so it's impossible to tell from what you posted on the list.) Try taking the first 200 or 250 individuals' data and running just those in your SAS and Stata models (with just 200-250 panels, both SAS and Stata should converge relatively quickly if they're ever going to). Do the log-likelihood values, regression coefficients and their standard errors look the same? (Ignore flipping of coefficient sign--PROC LOGISTIC reverses the order of the response variable, or at least it used to.)

2. I don't understand your limited description of the dataset (70 000 × 50 != 7e6), but if you have only categorical predictors, then try using -contract <response predictors id_variable>, freq(weight)- on your *wide* dataset, then -reshape long- the result, and then finally use -clogit <response predictors> [fweight=weight], group(<id_variable>)-.

As far as I know, SAS's procedures are written in compiled programming languages (as I recall, it's been predominantly, if not exclusively, C since the Big Rewrite). Stata has some of the code for -xtlogit- in interpreted ado-files, but I suspect that the critical-path stuff is in compiled C, too. Run times in SAS are more I/O dependent, as you note, but a few-hundred-variable-wide, seven-million-observation dataset isn't going to be any kind of problem for either SAS or Stata. So, I cannot think of a reason why there should be a major difference in run times between the two packages for the same conditional logistic regression model of the size you're talking about. Maybe someone else on the list with more insight into the computational logistics can help you more here.

Joseph Coveney

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
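
The subsample check suggested in point 1 might be sketched in Stata roughly as follows. This is only an illustration under assumptions: the variable names y (binary response), x1 and x2 (predictors) and id (individual identifier) are hypothetical, since the original post showed no code, and the cutoff of 200 panels is simply the lower figure mentioned above.

```stata
* Sketch only: y, x1, x2 and id are hypothetical variable names.

* Number the individuals 1, 2, 3, ... in order of id
egen long panel = group(id)

* Work on a temporary copy of the data so the full dataset is restored afterwards
preserve

* Keep only the first 200 individuals
keep if panel <= 200

* Conditional (fixed-effects) logit via -clogit-
clogit y x1 x2, group(id)

* The same model via -xtlogit, fe-, after declaring the panel variable
xtset id
xtlogit y x1 x2, fe

restore
```

If both commands converge on the subsample, their log likelihoods, coefficients and standard errors can then be compared with the PROC LOGISTIC output for the same 200 individuals.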

References:
- st: SAS vs STATA : why is xtlogit SO slow ? From: k7br@gmx.fr
