From: k7br@gmx.fr
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: SAS vs STATA : why is xtlogit SO slow ?
Date: Sat, 4 Feb 2012 13:33:40 +0100

Hello everyone,

Sorry for the delay. I wanted to try your very interesting suggestions before anything else.

*********************************

1) Richard, Clyde, thank you for your interesting comments, but the from() option doesn't help. Stata cannot converge:

    Iteration 0: log likelihood = -1.#INF
    Iteration 1: log likelihood = -1.#IND
    Hessian is not negative semidefinite

**********************

2) Klaus, indeed I am trying to estimate a fixed-effects logit, not a random-effects one. However, are you sure that Stata uses the pooled coefficients from the plain logit estimation? If I run the Stata command

    logit Y DUM CONT

the computation takes only a few seconds to converge, but the results are quite different from the fixed-effects logit estimates from SAS. One parameter has the opposite sign, for example, which probably means that including dummies by individual is important. ;-)

By the way, I have checked that there is indeed enough variation in the DUM categorical variable, so I do not think the problems come from the variables.

********************

MORE IMPORTANTLY: when I compare SAS's results with Stata's on a MUCH (really much) smaller sample (fewer than 2000 observations, 146 individuals, 11 points on average per individual), the results are exactly the same in the two systems (same point estimates, standard errors, and p-values). This suggests that something goes wrong when Stata tries to fit the fixed-effects logit model on a larger dataset.

So I am puzzled. What do you think?

Thanks again for your help.

On 3 February 2012 18:14, Klaus Pforr <kpforr@googlemail.com> wrote:
> <>
>
> Just some comments on this, although I hope that the person who posted this
> problem originally will eventually tell us more about the data and the
> output.
>
> On 03.02.2012 17:34, Clyde B Schechter wrote:
>
>> I don't really know much about how xtlogit (or any of the other xt
>> estimators) work "under the hood" [that's "under the bonnet" to Nick Cox]
>> but I have used these estimators a fair amount and have some pragmatic tips
>> for dealing with non-convergence of random effects models that have served
>> me well.
>
> I think that he/she wants to estimate a fixed-effects model (although I'm
> not sure whether this is generally easier or more difficult to estimate
> than RE).
>
>>
>> 1. Check all of your categorical predictors. If any of them have any
>> level that is only instantiated in a small number of cases in the estimation
>> sample, the coefficient for that level can be very difficult to estimate.
>> Try combining some levels in that variable (or, if it is a dichotomous
>> variable, drop it from the model).
>>
>> 2. Similarly, check your continuous variables to be sure they have some
>> reasonable amount of variability in the estimation sample.
>>
>> 3. Check the scales of your continuous variables to see that they are all
>> in the same "ballpark." If two variables differ by several orders of
>> magnitude, Stata will often thrash around trying to fit coefficients and
>> ultimately fail.
>>
>> 4. Try providing Stata with starting values of your own using the from()
>> option. Other responders have already suggested this. I have a couple of
>> specific suggestions for selecting starting values:
>>
>> a. Try the non-xt version of the same model, in this case logit. See if
>> those values will get Stata over the hump.
>> b. Try the population-averaged version of the same model. The
>> population-averaged estimator is calculated using a different approach that
>> seems to be more robust to quirks in the data, and those estimates often
>> work well as starting values for the random effects model. [Which surprises
>> me, because the population-averaged parameters are actually different
>> conceptually and often distant numerically from the corresponding
>> parameters of a random effects model. But my experience is that they almost
>> always work as a starting point nonetheless.]
>
> At least for FE, the implemented estimator uses the pooled coefficients of
> the logit model by default. Another possibility is random starting values,
> either by turning on the search option among the ml options, or by
> computing them beforehand and passing them via the from() option.
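
For readers following the thread, the from() suggestion discussed above could be tried roughly as follows. This is only a sketch under stated assumptions: Y, DUM and CONT are the variable names used in this thread, while id (the panel identifier) and b0 (the matrix holding the starting values) are hypothetical names introduced for illustration.

```stata
* Sketch only: id is a hypothetical panel identifier, b0 a hypothetical matrix name
xtset id

* Pooled logit, ignoring the panel structure, to obtain starting values
logit Y DUM CONT
matrix b0 = e(b)

* Fixed-effects (conditional) logit started from the pooled coefficients;
* skip ignores the pooled constant, which has no counterpart in the FE model
xtlogit Y DUM CONT, fe from(b0, skip)
```

Whether this helps remains open: the original poster reports that from() did not cure the non-convergence on the full sample.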

Hope this helps.

Clyde Schechter
Department of Family & Social Medicine

Albert Einstein College of Medicine
Bronx, New York, USA



best

Klaus

