Re: st: SAS vs STATA : why is xtlogit SO slow ?
Sun, 5 Feb 2012 20:06:08 +0100
Dear Klaus, Joseph,
Many thanks again for your time and help. I use the latest version of
Stata 12 for 64-bit systems.
I will try to be as precise as possible on the data.
I would like to estimate a fixed effect logit.
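To fix notation (the symbols below are my own and do not come from any Stata or SAS output), the model I have in mind, as I understand it, is a logit with one intercept $\alpha_i$ per individual. The fixed effect estimator then works with the likelihood conditional on the number of ones observed for each individual, in which $\alpha_i$ cancels:

```latex
% Assumed notation: i = individual, t = 1..T_i observations of individual i,
% x_{it} = (DUM, CONT1, CONT2) for that observation, \beta = coefficient vector.
\[
  \Pr(Y_{it}=1 \mid x_{it}, \alpha_i)
    = \frac{\exp(\alpha_i + x_{it}'\beta)}{1 + \exp(\alpha_i + x_{it}'\beta)}
\]
\[
  L_i(\beta)
    = \frac{\exp\!\bigl(\sum_{t=1}^{T_i} y_{it}\, x_{it}'\beta\bigr)}
           {\sum_{d \in D_i} \exp\!\bigl(\sum_{t=1}^{T_i} d_t\, x_{it}'\beta\bigr)},
  \qquad
  D_i = \Bigl\{ d \in \{0,1\}^{T_i} : \sum_{t} d_t = \sum_{t} y_{it} \Bigr\}
\]
```

Individuals whose Y is always 0 or always 1 have $L_i(\beta) = 1$ and therefore do not contribute to the estimation.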
The dataset is a panel dataset. I cannot, however, use the time
specification of the panel dataset in Stata because I could have
several observations at the same date for some individuals. In essence,
I can only declare the panel structure of my dataset by using
- xtset ID
The dataset contains about 6.5 million observations.
The output variable Y, is a dummy variable (0/1) which takes value 0
about 30% of the time (no missing values)
I have 3 predictors : one categorical dummy variable DUM and two
continuous CONT1 and CONT2
If I use
- xttab DUM
I obtain that the within percentage is 97% for DUM=0 and 50% for DUM=1.
Almost every individual in the database has DUM=0 at least once,
and only approximately 5% of all the individuals show DUM=1 at least
once.
The continuous variable CONT1 is always positive. CONT2 could be
negative also. Its range is [-1.5, +1.5] approximately.
The mean number of points per individual is 70. However, the median is
quite low at 10 points per individual.
The minimum number of points is 1, and the maximum is over 20000.
I tried to exclude individuals with fewer than 10 points, for example,
but the estimation still does not converge.
Here is the simple code I use in Stata in order to get my fixed effect logit
- xtlogit Y DUM CONT1 CONT2, fe
I also tried
- xtlogit Y DUM CONT1 CONT2, fe from(Dum=* /CONT1=** / CONT2=***)
where *, ** and *** stand for the estimates from the SAS optimization
process. Even in that case convergence is not attained.
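For reference, a minimal sketch of another way to pass starting values, through a row vector and the copy suboption of from(). The three numbers are hypothetical placeholders, not my SAS estimates, and the matrix name b0 is arbitrary:

```stata
* Hypothetical placeholders: replace 0.1, 0.2, 0.3 with the SAS estimates,
* in the same order as the regressors (DUM, CONT1, CONT2)
matrix b0 = (0.1, 0.2, 0.3)

* copy tells Stata to use the values by position rather than by name
xtlogit Y DUM CONT1 CONT2, fe from(b0, copy)
```

Using copy avoids any mismatch between the column names of the vector and the parameter names that the estimation command expects.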
What kind of information could I provide so that you can help me? I
cannot send the dataset, unfortunately...
Dear Joseph, don't get me wrong: I know that Stata is excellent
software and I am very glad to use such a powerful statistical tool.
Probably there is some strange pattern in my data that makes the
convergence very difficult ? How can I find out?
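A minimal sketch of checks that might help locate such a pattern, assuming the variable names used above (ID, Y, DUM, CONT1, CONT2). The new variables n_obs, ybar and first, and the cutoff of 1000, are hypothetical choices for illustration. The idea that the very large groups are involved is only a guess, not something I have verified:

```stata
* Declare the panel structure (no time variable, as explained above)
xtset ID

* Number of observations and mean of Y within each individual
bysort ID: generate long n_obs = _N
bysort ID: egen double ybar = mean(Y)

* Flag one observation per individual so that counts are per individual
egen byte first = tag(ID)

* Individuals with Y always 0 or always 1 carry no information
* for the fixed effect logit
count if first & (ybar == 0 | ybar == 1)

* Distribution of the number of points per individual
summarize n_obs if first, detail

* Refit without the largest groups; 1000 is an arbitrary cutoff
xtlogit Y DUM CONT1 CONT2 if n_obs <= 1000, fe
```

If the last command converges while the full sample does not, that would point towards the few very large individuals (over 20000 points) rather than towards the predictors.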
However, SAS 9.3 converged (I don't know why) when I used
PROC LOGISTIC DATA=mydata ;
MODEL Y (EVENT='1') = DUM CONT1 CONT2;
I obtain:
Newton-Raphson Ridge Optimization
Without Parameter Scaling
Model fit statistics:
Criterion    Without Covariates    With Covariates
AIC                     3930972            3927734
SC                      3930972            3927775
-2 Log L                3930972            3927728
The Wald, score and likelihood ratio tests all have p-values <.0001
On 5 February 2012 15:37, Klaus Pforr <firstname.lastname@example.org> wrote:
> On 04.02.2012 13:33, email@example.com wrote:
>> Hello everyone,
>> Sorry for the delay.. I had to try your very interesting suggestions
>> before anything else...
>> Richard, Clyde, thank you for your interesting comments, but the from()
>> option doesn't help... Stata cannot converge:
>> Iteration 0: log likelihood = -1.#INF
>> Iteration 1: log likelihood = -1.#IND
>> Hessian is not negative semidefinite
>> Klaus, indeed I am trying to estimate a fixed effect logit, not a random
>> effect one. However, are you sure that Stata uses the pooled coefficients
>> from the plain logit estimation?
>> Indeed, if I run the Stata command : logit Y DUM CONT, the computation
>> takes only a few seconds to converge, but the results are quite
>> different from the logit fixed effect SAS estimation... One parameter
>> has the opposite sign, for example, which probably means that including
>> dummies by individual is important.. ;-)
> xtlogit.ado with fe-option refers to clogit.ado. You find this in lines
> 208-248 (in version 2.12.3 11may2010). In line 246 you find the actual
> reference to clogit. In the clogit.ado (version 1.6.15 15jul2011) you find
> the management of the starting values in lines 269-304. Depending on the
> from-options and other stuff, the default is a binary logit (you see it in
> line 281) to get the starting values.
> But don't get me wrong. Don't use the logit to estimate your results when
> you have reasons to estimate a fixed effects model.
> Another thing that raises doubts for me is your mention of including
> dummies by individual. You cannot use this approach for any ml-estimated
> fixed effects models because of the incidental parameters problem. The
> conditions for the consistency of the ml estimators are not met if the
> coefficient vector depends on N, which it does, as you have a constant for
> almost every case. That is why you use the complicated conditional logit
> approach in the first place. Please also give us the command line that you
> gave to Stata, and maybe a glimpse of your data.
>> By the way I have checked that there is indeed enough variation in the
>> DUM categorical variable so I do not think the problems are coming
>> from the variables...
>> MORE IMPORTANTLY : when I compare SAS's results with STATA on a MUCH
>> (really much) smaller sample (less than 2000 observations, 146
>> individuals, 11 points on average per individual) then the results are
>> exactly the same between the two systems (same point estimates + standard
>> errors + p-values)... thus suggesting that something bad is going on
>> when STATA tries to fit the fixed effect logit model on a larger dataset
>> So I am puzzled ...
>> What do you think ?
>> Thanks again for your help
> Klaus Pforr
> MZES AB - A
> Universität Mannheim
> D - 68131 Mannheim
> Tel: +49-621-181 2797
> fax: +49-621-181 2803
> URL: http://www.mzes.uni-mannheim.de
> Visitor address: A5, Room A309
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/