Statalist The Stata Listserver


st: Re: xtgee results depend on sort order


From   wgould@stata.com (William Gould)
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: xtgee results depend on sort order
Date   Wed, 25 Oct 2006 11:10:05 -0500

Summary
-------

    Jonathan Sterne <Jonathan.Sterne@bristol.ac.uk> reported a problem in
    which he ran -xtgee, corr(exchangeable)- twice and received different
    answers.  We will discuss this issue in detail below.  In summary,
    this turned out to be a case where the model does not fit the data and
    convergence is not really possible.  In the next ado-file update, 
    -xtgee- will be modified to return r(430), convergence not achieved,
    in such cases.


Background:  Exchangeable correlation
-------------------------------------

    Jonathan wanted to fit a Gaussian-family, identity-link -xtgee- 
    model with exchangeable correlation, i.e.,

         y_it = X_it*b + u_it                         (1)

    where u_it has correlation matrix 

               +-                -+
               |  1  p  p  ...  p |
               |  p  1  p  ...  p |
               |  p  p  1  ...  p |                   (2)
               |  .               | 
               |  .               |
               |  p  p  p  ...  1 |
               +-                -+

     In the above, vector b and scalar p are to be estimated.

     This model generally arises in a random-effects context.  One assumes 

                u_it = v_i + e_it                     (3)

     where v_i is constant within panel i, and the e_it are independent
     of each other and of v_i.

     The random-effects equation (3) implies a constant correlation matrix 
     (2), and in fact, p is

                            s2_v
                  p =   ------------
                         s2_v + s2_e

     where s2_v is the variance of v_i and s2_e is the variance of e_it.
     Thus, p will be positive.
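
     To see why (3) implies (2), one can write out the within-panel
     correlations of u_it directly.  The following is a small illustrative
     sketch in Python (not Stata); the function name and the example
     variances are made up for illustration.

```python
def random_effects_corr(s2_v, s2_e, n):
    """Correlation matrix of u_it = v_i + e_it within one panel of size n.

    Cov(u_it, u_is) = s2_v for t != s, and Var(u_it) = s2_v + s2_e,
    so every off-diagonal correlation is s2_v / (s2_v + s2_e).
    """
    total = s2_v + s2_e
    return [[1.0 if t == s else s2_v / total for s in range(n)]
            for t in range(n)]


# Hypothetical variance components: s2_v = 1, s2_e = 3, so p = 1/(1+3) = 0.25.
R = random_effects_corr(1.0, 3.0, 3)
for row in R:
    print(row)
```

     Because both variances are nonnegative, no choice of s2_v and s2_e
     can produce a negative off-diagonal element.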

     When p is positive, there is no problem.

     The -xtgee- model, that is, the model (1) and (2), is more general 
     than the random-effects model in that p can be negative as well.

     The model Jonathan Sterne fitted had p<0.  

     Exchangeable models with p<0 can have substantive problems.

     Below, we consider such models more carefully.


Negative-correlation matrices
-----------------------------

     Correlation matrices must be positive definite or at least positive
     semidefinite.  That sounds arcane, but it has important implications.
     Consider the matrix

                +-               -+
                |    1  -.8  -.8  |
                |  -.8    1  -.8  |
                |  -.8  -.8    1  |
                +-               -+

     The above matrix is not positive definite; its determinant is 
     -1.944.

     Because of that, it cannot be a correlation matrix.  Try all you 
     want, and you cannot generate data for three variables with the 
     above correlations!
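
     The determinant is easy to check by hand or by machine.  Here is a
     minimal sketch in Python (not Stata; the helper det3 is written just
     for this illustration) that expands along the first row and
     cross-checks against the closed form (1-p)^2 * (1+2p) for a 3 x 3
     constant-correlation matrix:

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


p = -0.8
R = [[1.0, p, p],
     [p, 1.0, p],
     [p, p, 1.0]]

print(round(det3(R), 3))                      # -1.944
print(round((1 - p) ** 2 * (1 + 2 * p), 3))   # -1.944, closed form
```

     A negative determinant means at least one eigenvalue is negative,
     which rules out positive (semi)definiteness.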

     This should not surprise you.  Rather than -.8, let's use -1:

                +-               -+
                |    1   -1   -1  |
                |   -1    1   -1  |
                |   -1   -1    1  |
                +-               -+

     It is obvious you cannot create three variables correlated like 
     this.  If x1 and x2 are correlated -1, and x2 and x3 are correlated 
     -1, x1 and x3 just have to be correlated +1.  Said mathematically, 
     the above matrix is not positive definite.

     So it turns out p = -1 does not work and p = -.8 does not work.
     Do any negative numbers work?  Yes.  In the case of a 3 x 3 matrix, 
     p > -.5 produces a positive definite matrix, which is to say, 
     a valid correlation matrix.

     For 4 x 4 matrices, the cutoff is -.333...   
     If p > -.333..., it is a valid correlation matrix, and if 
     p < -.333..., it is not.  (At exactly the cutoff, the matrix is
     singular.)

     The formula for the cutoff for an n x n matrix is 

                max_neg_corr  =  -1 / (n-1)

     and that leads to the following table:

                             max
                    n     neg. corr.
           ---------------------------
                    2        -1
                    3         -.5
                    4         -.33333...
                    5         -.25
                   11         -.1
                  101         -.01
                1,001         -.001
            1,000,001         -.000001
           ---------------------------
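
     The cutoff follows from the eigenvalues of the n x n
     constant-correlation matrix, which are 1 - p (repeated n-1 times) and
     1 + (n-1)*p; the matrix is positive definite exactly when both are
     positive.  A minimal Python sketch (not Stata; the function names are
     illustrative only) that reproduces the cutoffs in the table:

```python
def smallest_eigenvalue(p, n):
    """Smallest eigenvalue of the n x n exchangeable correlation matrix.

    The eigenvalues are 1 - p (multiplicity n - 1) and 1 + (n - 1) * p.
    """
    return min(1 - p, 1 + (n - 1) * p)


def max_neg_corr(n):
    """Cutoff -1/(n-1); the matrix is positive definite only for p above it (n >= 2)."""
    return -1.0 / (n - 1)


def is_positive_definite(p, n):
    """True when every eigenvalue of the exchangeable matrix is strictly positive."""
    return smallest_eigenvalue(p, n) > 0


for n in (2, 3, 4, 5, 11, 101, 1001, 1000001):
    print(f"{n:>9,}  {max_neg_corr(n):.6g}")

print(is_positive_definite(-0.4, 3))  # True:  -0.4 is above the cutoff -0.5
print(is_positive_definite(-0.8, 3))  # False: -0.8 is below the cutoff -0.5
```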


Application to GEE model
------------------------

     Thus, if you have constant negative correlation,
     
               +-                -+
               |  1  p  p  ...  p |
               |  p  1  p  ...  p |
               |  p  p  1  ...  p | ,     p < 0
               |  .               | 
               |  .               |
               |  p  p  p  ...  1 |
               +-                -+

     that is, what the GEE model calls exchangeable, there is a limit as to
     how negative p can be, and that limit is a function of n, the number of
     observations within panels (which is equal to the number of rows and
     columns of the correlation matrix).

     Jonathan's model wanted to converge to a p that was too negative.
     That is what caused the problem, AND THAT HAD NOTHING TO DO 
     WITH SOFTWARE LIMITATIONS. It was a substantive issue about the 
     exchangeable model with negative correlation.  Jonathan's data wanted 
     to converge to an impossible value of p.

     It is worth appreciating this problem in general.

     If you have data generated by an exchangeable correlation process 
     with p<0, there are limits on p.  If the maximum panel size in your 
     data is, say, 11, then p must be greater than -.1.  If the maximum 
     panel size is 11, but you know that in the population, larger panels
     exist, and that the maximum theoretical size is 101, then p must be
     greater than -.01.  If you think the panel size is unlimited, then p
     must be greater than (or equal to) 0!

     All of which is to say, the exchangeable model makes little sense 
     when p<0 unless maximum possible panel sizes are fixed, say at 2, or 3,
     etc.


Software implications
---------------------

     It wasn't the software that caused Jonathan's problem; that problem
     was substantive.  Still, the problem could have been handled more 
     gracefully.

     In the next ado-file update, -xtgee- will be modified to warn the user
     in such cases.  If Jonathan were to try his model and data, he would 
     see

         . xtgee ...
         <iteration log>
         <estimation results>
         exchangeable working correlation matrix not positive definite
         convergence not achieved
         r(430);

    In addition, we have modified -xtgee- to reset p during iterations 
    to be just inside the minimum boundary implied by the observed maximum
    panel size, which should help models that can converge to converge.
    Jonathan's model, however, wants to converge to an invalid set of
    parameters and thus the exchangeable assumption is simply untenable.
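
    The idea of resetting p can be sketched as follows.  This is only an
    illustration in Python of one way to keep p inside the admissible
    region; it is not the code used in -xtgee-, and the function name and
    the margin eps are assumptions made for the example.

```python
def reset_rho(p, max_panel_size, eps=1e-6):
    """Pull p back just inside the lower bound -1/(n-1) if it has strayed past it.

    max_panel_size is the largest observed number of observations in a panel.
    eps is a hypothetical margin; the value actually used by -xtgee- is not
    stated in this post.
    """
    if max_panel_size < 2:
        return p  # a single observation per panel imposes no bound
    lower = -1.0 / (max_panel_size - 1)
    return max(p, lower + eps)


print(reset_rho(-0.30, 11))  # reset to just above the bound -0.1
print(reset_rho(-0.05, 11))  # already admissible, returned unchanged
```

    A reset of this kind only keeps the iterations inside the valid
    region.  It cannot rescue a model whose data pull p toward an invalid
    value.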

    The problem of non-positive-definite correlation matrices can arise 
    with some other correlation structures as well.  It cannot arise 
    in the autoregressive or independent cases.  It can arise in the
    unstructured, stationary, and nonstationary cases.  The changes we 
    have made apply in those cases, too.

-- Bill                   -- Brian              -- Vince
   wgould@stata.com          bpoi@stata.com        vwiggins@stata.com

<end>