Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Francesco <cariboupad@gmx.fr> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: SAS vs STATA : why is xtlogit SO slow ? |

Date |
Sat, 11 Feb 2012 00:39:07 +0100 |

Dear all, Eureka! After having navigated through many statistical helplists over the internet I think that we are about to understand what did not work well with my dataset. clogit seems not to behave well when some individuals show a very (very very) high number of points. This idea was actually suggested to me in a slighly different context and for another stat package (R) when I had some problems to estimate a fixed effect linear model with my dataset and I asked for some help on the R helplist I asked the Stata support and they kindly replied very quickly but, as we can imagine, they needed my database in order to verify what went eventually wrong during the computation. I could not send the database however, so I had to find out another solution... The magic trick is that The hypothesis that clogit does not converge well when used with a dataset that includes individuals with a very unusual and large number of points could indeed be tested using fake data. That is exactly what I tried to do. I used a very simple code in order to create a fake panel dataset that truly reflects a fixed effects logit structure : P(Yit=1)=F(Ai+4*Zit) where F is the logistic function, Ai is an individual fixed effect and Y and X are dummy variables that vary over i and t. So now we know exactly what is the true value of the parameter that we want to estimate: 4 Here is the code (I was inspired by the code used here https://en.wikibooks.org/wiki/Stata/Linear_Models#Dynamic_Linear_Panel_Data) ___________________________________________________________________________ clear all set obs 1000 set seed 123456 gen id = _n gen large= _n>950 gen constant= invnorm(uniform()) *here we create normal individuals with a common number of points forvalues t=1/10{ gen dum`t' = rbinomial(1, 0.3) if large==0 gen y`t' = rbinomial(1, invlogit(constant+4*dum`t')) if large==0 } * ... and here we have our highly active friends! forvalues t=11/5000{ gen dum`t' = rbinomial(1, 0.8) if large==1 gen y`t' = rbinomial(1, invlogit(constant+4*dum`t')) if large==1 } reshape long y dum, i(id) j(year) drop if missing(dum) xtset id xtlogit y dum, fe __________________________________________________________________________ as guessed, Stata tells me rather quickly that : note: multiple positive outcomes within groups encountered. note: 50 groups (500 obs) dropped because of all positive or all negative outcomes. Iteration 0: log likelihood = -1.#INF Iteration 1: log likelihood = -1.#IND Hessian is not negative semidefinite r(430); Let's see what SAS 9.3 says when I use the same dataset transferred with Stattransfer : PROC LOGISTIC DATA=mydata; MODEL y (EVENT='1') = dum; STRATA id; RUN; Variable de réponse y Nombre de niveaux de réponse 2 Nombre de niveaux de discrétisation 1000 Nombre de niveaux de discrétisation sans informations 50 Fréquence sans informations 500 Modèle logit binaire Technique d'optimisation Ridge de Newton-Raphson Convergence criterion (GCONV=1E-8) satisfied. Statistiques d'ajustement du modèle Critère Sans covariables Avec covariables AIC 183458.81 111608.55 SC 183458.81 111619.01 -2 Log 183458.81 111606.55 Test de l'hypothèse nulle globale : BETA=0 Test Khi-2 DDL Pr > Khi-2 Rapport de vrais 71852.2579 1 <.0001 Score 83945.0829 1 <.0001 Wald 44690.4549 1 <.0001 Estimations par l'analyse du maximum de vraisemblance Paramètre DDL Valeur estimée Erreur type Khi-2 de Wald Pr > Khi-2 dum 1 3.9961 0.0189 44690.4549 <.0001 3.9961!! The right value! What do you think ? Are you convinced ? I might just say that I am not telling that one package is better than the other : they have all their strenghts and weaknesses. I confess also that I personally prefer to use Stata, so if my hypothesis is right and it is possible to fix this rare problem, I will be very very glad. I wait for your eventual comments before sending the code and explanation to the support team Many thanks again Best, * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*"Joseph Coveney" <jcoveney@bigplanet.com>

**References**:**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*Clyde B Schechter <clyde.schechter@einstein.yu.edu>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*Klaus Pforr <kpforr@googlemail.com>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*k7br@gmx.fr

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*Klaus Pforr <kpforr@googlemail.com>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*k7br@gmx.fr

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*"Joseph Coveney" <jcoveney@bigplanet.com>

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*k7br@gmx.fr

**Re: st: SAS vs STATA : why is xtlogit SO slow ?***From:*"Joseph Coveney" <jcoveney@bigplanet.com>

- Prev by Date:
**st: how to construct change in earnings since last election** - Next by Date:
**st: stset multiple-spell(recurrent) delayed entry** - Previous by thread:
**Re: st: SAS vs STATA : why is xtlogit SO slow ?** - Next by thread:
**Re: st: SAS vs STATA : why is xtlogit SO slow ?** - Index(es):