Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: "Can Your Results be Replicated?" (Stata error?)


From   [email protected] (Jeff Pitblado, StataCorp LP)
To   [email protected]
Subject   Re: st: "Can Your Results be Replicated?" (Stata error?)
Date   Fri, 13 Sep 2013 15:12:04 -0500

This post is rather long.  For those interested in cutting to the action items
we will take as a result of this thread, please scroll to the end.

Philip Jones <[email protected]> writes about a Twitter feed he
received today:

> I found a link on my Twitter feed this AM, purporting to show how
> Stata "made a mistake" that R did not make:
> 
> http://www.r-bloggers.com/can-your-results-be-replicated/
> 
> which actually points to:
> 
> http://politicalsciencereplication.wordpress.com/2013/09/11/guest-blog-how-to-persuade-journals-to-accept-your-replication-paper/
> 
> I realize that "r-bloggers" is likely not the most bias-free site when
> it comes to reviewing/rating stats packages, but has anyone got an
> idea as to what is actually going on here? Is Stata really at fault?

Joerg Luedicke <[email protected]> looked at the paper:

> After having a quick glance at their paper
> (http://jcr.sagepub.com/content/early/2013/08/19/0022002713499718.abstract?papetoc)
> it seems that they are talking about a problem with Stata's -xtgee-
> command which, in the case of separation in a logit model, provides
> nonsense results as opposed to omitting predictors or the like. Below
> is a toy example showing what seems to be the problem. However,
> finding an effect of something like "x is 3 million times less likely
> than y" and not getting suspicious rather looks like sloppy research
> to me in the first place.

Anders Alexandersson <[email protected]> followed up with one of the
authors and also found out the "mistake" was related to using the -xtgee-
command in fitting a logit model with a separation problem (perfect
predictors).

Joerg's example simulates a logit model with random effects, but ensures that
the simulated data perfectly predicts a zero response for one level of a
factor variable covariate.  Here is Joerg's example:

*------------------------
clear
set obs 100
set seed 123

gen id = _n
gen ui = rnormal(0,0.5)

expand 10
bys id : gen year = _n
gen x = cond(mod(_n-1, 3) == 1, 1, cond(mod(_n-1, 3) == 0, 2, 3))
tab x, gen(x_)

gen xb = 1 / (1 + exp(-(0.3*x_2 + 0.3*x_3 + ui)))
gen y = rbinomial(1,xb)
replace y = 0 if x_1 == 1
tab y x

xtset id year
xtgee y i.x, fam(binomial) link(logit)
melogit y i.x || id:
logit y i.x
*------------------------

Here are some comments:

1. -logit- has some code that specifically checks for perfect predictors.
   When a perfect predictor is found, the associated covariate is typically
   omitted or a range of observations are marked out of the estimation sample
   (usually both actions are taken).

   -logit-'s -asis- option turns off this feature.  If there are prefect
   predictors, -logit- with the -asis- option usually fails to converge
   because the model is not identified without the above actions.

2. -melogit- assumes -asis- by default.  There is a NOT documented -noasis-
   option that will cause -melogit- to use the perfect predictor code
   described in the previous comment.  Here is the result of the model fit
   using Joerg's simulated data:

***** BEGIN:
. melogit y i.x, noasis || id:
note: 1.x != 0 predicts failure perfectly
      1.x dropped and 333 obs not used

note: 3.x omitted because of collinearity

Fitting fixed-effects model:

Iteration 0:   log likelihood = -450.55022  
Iteration 1:   log likelihood = -450.08693  
Iteration 2:   log likelihood = -450.08688  
Iteration 3:   log likelihood = -450.08688  

Refining starting values:

Grid node 0:   log likelihood = -458.42515

Fitting full model:

Iteration 0:   log likelihood = -458.42515  (not concave)
Iteration 1:   log likelihood = -450.03937  
Iteration 2:   log likelihood = -449.71284  
Iteration 3:   log likelihood = -449.35817  
Iteration 4:   log likelihood = -449.35741  
Iteration 5:   log likelihood = -449.35741  

Mixed-effects logistic regression               Number of obs      =       667
Group variable:              id                 Number of groups   =       100

                                                Obs per group: min =         6
                                                               avg =       6.7
                                                               max =         7

Integration method: mvaghermite                 Integration points =         7

                                                Wald chi2(1)       =      1.74
Log likelihood = -449.35741                     Prob > chi2        =    0.1875
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
          1  |          0  (empty)
          2  |    .211456    .160449     1.32   0.188    -.1030183    .5259302
          3  |          0  (omitted)
             |
       _cons |   .2773084   .1179143     2.35   0.019     .0462006    .5084161
-------------+----------------------------------------------------------------
id           |
   var(_cons)|   .1253563   .1189478                      .0195191     .805068
------------------------------------------------------------------------------
LR test vs. logistic regression: chibar2(01) =     1.46 Prob>=chibar2 = 0.1135
***** END:

   Notice that 1/3 of the data are dropped from the estimation sample because
   x==1 is a perfect predictor for y==0.
   
3. Joerg simulated the data using a zero for the intercept.  In order to
   recover the intended model parameters, we can refit using 'bn.x' and the
   -noconstant- option.  'bn.x' specifies that 'x' is a factor variable, but
   prevents Stata from identifying a base level.

	. melogit y bn.x, noasis noconstant || id:

   This will yield an equivalent model fit to the one above, except the
   coeffienct on 3.x will be .2773084 and the intercept constrained to zero.

4. -xtgee- does not currently have any logic for dealing with perfect
   predictors for the -logit- model.  For that matter, neither does -xtlogit,
   re-.

5. All this perfect predictor stuff for the -logit- model also applies to the
   -probit- model.

Action items:

1. We will re-evaluate our choice of NOT documenting the -noasis- option in
   -melogit- and -meprobit-.  This may result in us changing their default
   behavior (under version control) to match -logit- and -probit-, with a
   newly documented -asis- option.

2. We will work on getting logic for determining perfect predictors into

   * -xtgee- for -logit- models, also known as -xtlogit, pa-
   * -xtgee- for -probit- models, also known as -xtprobit, pa-
   * -xtlogit, re-
   * -xtprobit, re-

All of these actions will be provided in a future update to Stata 13.

--Jeff
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index