Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: SAS vs STATA : why is xtlogit SO slow ?


From   Francesco <cariboupad@gmx.fr>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: SAS vs STATA : why is xtlogit SO slow ?
Date   Sat, 11 Feb 2012 00:39:07 +0100

Dear all,

Eureka!

After having navigated through many statistical helplists over the
internet I think that we are about to understand what did not work
well with my dataset.
clogit seems not to behave well when some individuals show a very
(very very) high number of points. This idea was actually suggested to
me in a slighly different context and for another stat package (R)
when I had some problems to estimate a fixed effect linear model with
my dataset and I asked for some help on the R helplist

I asked the Stata support and they kindly replied very quickly but, as
we can imagine, they needed my database in order to verify what went
eventually wrong during the computation.
I could not send the database however, so I had to find out another solution...

The magic trick is that The hypothesis that clogit does not converge
well when used with a dataset that includes individuals with a very
unusual and large number of points could indeed be tested using fake
data.

That is exactly what I tried to do.
I used a very simple code in order to create a fake panel dataset that
truly reflects a fixed effects logit structure :

P(Yit=1)=F(Ai+4*Zit)

where F is the logistic function, Ai is an individual fixed effect and
Y and X are dummy variables that vary over i and t.
So now we know exactly what is the true value of the parameter that we
want to estimate: 4

Here is the code (I was inspired by the code used here
https://en.wikibooks.org/wiki/Stata/Linear_Models#Dynamic_Linear_Panel_Data)
___________________________________________________________________________
clear all
       set obs 1000
       set seed 123456
       gen id = _n
               gen large= _n>950
       gen constant= invnorm(uniform())
*here we create normal individuals with a common number of points
       forvalues t=1/10{
                               gen dum`t' =  rbinomial(1, 0.3) if large==0
                               gen y`t' =  rbinomial(1,
invlogit(constant+4*dum`t')) if large==0
               }

* ... and here we have our highly active friends!
            forvalues t=11/5000{
                               gen dum`t' =  rbinomial(1, 0.8) if large==1
                               gen y`t' =  rbinomial(1,
invlogit(constant+4*dum`t')) if large==1
               }



reshape long y dum, i(id) j(year)
drop if missing(dum)
xtset id
xtlogit y dum, fe
__________________________________________________________________________

as guessed, Stata tells me rather quickly that :


note: multiple positive outcomes within groups encountered.
note: 50 groups (500 obs) dropped because of all positive or
     all negative outcomes.

Iteration 0:   log likelihood =    -1.#INF
Iteration 1:   log likelihood =    -1.#IND
Hessian is not negative semidefinite
r(430);


Let's see what SAS 9.3 says when I use the same dataset transferred
with Stattransfer :

PROC LOGISTIC DATA=mydata;
MODEL y (EVENT='1') = dum;
STRATA id;
RUN;

Variable de réponse y
Nombre de niveaux de réponse 2
Nombre de niveaux de discrétisation 1000
Nombre de niveaux de discrétisation sans informations 50
Fréquence sans informations 500
Modèle logit binaire
Technique d'optimisation Ridge de Newton-Raphson


Convergence criterion (GCONV=1E-8) satisfied.

Statistiques d'ajustement du modèle
Critère Sans
covariables Avec
covariables
AIC 183458.81 111608.55
SC 183458.81 111619.01
-2 Log 183458.81 111606.55

Test de l'hypothèse nulle globale : BETA=0
Test Khi-2 DDL Pr > Khi-2
Rapport de vrais 71852.2579 1 <.0001
Score 83945.0829 1 <.0001
Wald 44690.4549 1 <.0001

Estimations par l'analyse du maximum de vraisemblance
Paramètre DDL Valeur estimée Erreur
type Khi-2
de Wald Pr > Khi-2
dum 1    3.9961     0.0189 44690.4549 <.0001

3.9961!!
The right value!

What do you think ? Are you convinced ?

I might just say that I am not telling that one package is better than
the other : they have all their strenghts and weaknesses.
I confess also that I personally prefer to use Stata, so if my
hypothesis is right and it is possible to fix this rare problem, I
will be very very glad.

I wait for your eventual comments before sending the code and
explanation to the support team

Many thanks again
Best,

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index