
Re: st: Strange -robust- results with a singleton dummy


From   [email protected] (Vince Wiggins, StataCorp)
To   [email protected]
Subject   Re: st: Strange -robust- results with a singleton dummy
Date   Tue, 18 Oct 2005 14:31:50 -0500

David at <[email protected]> is estimating a model using -regress- with
the -cluster()- option to get cluster-robust standard errors.  When he
includes an indicator variable that takes on the value 1 for all observations
in a cluster and 0 elsewhere (a singleton indicator), the overall model
F-statistic goes missing.

David asks,

> Can anybody explain what's going on?

Kit Baum <[email protected]> notes that this may be a variation of the data
pathology discussed at the end of -help j_robustsingular-, the help file
displayed from the blue link when the F-statistic is missing.

Kit is absolutely correct; this is the same problem.  The help file will be
updated, and its explanation adapted, to cover this additional case.

As Kit notes, the essential problem is that a component of the robust
covariance matrix is driven to 0 when you have a singleton indicator.  This
causes the covariance matrix to be of lower rank than the number of covariates
and therefore makes it impossible to compute an overall model F-statistic.
Notably, the rest of the results and tests are still fine.

In private discussions off list, I have been asked to explain this in more
detail.  I'm going to adapt this from my answer to a question from Mark
Schaffer <[email protected]> who encountered the problem using the unclustered
robust estimator and whose question led to the discussion in
-help j_robustsingular-.

Mark asked several interesting questions, starting with,

> Does anybody have any ideas on how to interpret this?

Mechanically it is pretty easy to see what is happening for the standard (not
clustered) robust covariance estimator.  The robust covariance matrix is:

        V_robust = DGD

        where:
                D  is the negative inverse Hessian (the most often used
                   estimate of the covariance matrix).

                G  is the outer product of the score (or gradient) vectors
                   for each observation, often called the OPG. (Its inverse
                   is also a perfectly valid estimate of the covariance
                   matrix and is typically used when estimating by BHHH.)


        G = g'g

        where:
                       d(L_i)
                g_ik = ------
                       d(B_k)

        and:
                L_i  is the quasi-likelihood of the ith observation
                B_k  is the kth element of the coefficient vector B

        So, g is an N by k matrix, where k is the number of parameters.

We have started from a quasi-likelihood using L_i, but we could have started
from the estimating equations (or normal equations) for OLS; it makes no
difference.

When we have an indicator variable that is 1 for a single observation and 0
everywhere else, the column vector g_k has a very distinctive pattern -- it is
all zeros.

              d(L_i)
        g_i = --------------  = 0  whenever the indicator is 0 
              d(B_indicator)       because B_indicator*0 is 0. 


              d(L_i)
        g_i = --------------  = 0  for the single observation 
              d(B_indicator)       where indicator=1 because 
                                   the moment conditions
                                   for maximizing the quasi-likelihood 
                                   are that the gradient for each 
                                   coefficient is 0.  Since all of the 
                                   other observations have
                                   the indicator set to 0, only this
                                   observation contributes to the gradient 
                                   and it is set to 0 by the moment 
                                   condition in choosing B_indicator.

                                   Put another way, the scores for a
                                   coefficient (g_i) must sum to 0 and since
                                   the score is 0 when the variable is 0 and
                                   since the variable is non-zero for only one
                                   observation, the score for that
                                   observation must also be 0.

All of this means that the column of g corresponding to the indicator variable
is all 0 and thus G = g'g is not full rank, and thus V_robust=DGD is also not
full rank.

That means Stata cannot compute an overall model F-statistic because the rank
of the covariance matrix is not sufficient to test the hypothesis that all of
the coefficients are simultaneously 0.
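
The argument above can be checked numerically.  What follows is only a
minimal sketch in Python (not Stata), using exact fractions so that a zero is
a true zero rather than a rounding artifact.  The data are made up, the helper
functions (transpose, matmul, inverse, rank) are written here for
illustration, and scalar factors such as 1/sigma^2 and any finite-sample
multipliers are dropped because they do not affect rank.

```python
from fractions import Fraction

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    """Invert a nonsingular square matrix by exact Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def rank(a):
    """Rank by exact row reduction."""
    m = [row[:] for row in a]
    rk = 0
    for c in range(len(m[0])):
        p = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if p is None:
            continue
        m[rk], m[p] = m[p], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][c] / m[rk][c]
            m[r] = [vr - factor * vk for vr, vk in zip(m[r], m[rk])]
        rk += 1
    return rk

# Made-up data: a constant, one covariate x, and an indicator d that is 1
# for the last observation only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 5]
d = [0, 0, 0, 0, 0, 1]
X = [[Fraction(1), Fraction(xi), Fraction(di)] for xi, di in zip(x, d)]
Y = [[Fraction(yi)] for yi in y]

Xt = transpose(X)
D = inverse(matmul(Xt, X))      # (X'X)^-1, i.e. D up to a scalar
b = matmul(D, matmul(Xt, Y))    # OLS coefficients
e = [yi[0] - sum(xik * bk[0] for xik, bk in zip(row, b))
     for row, yi in zip(X, Y)]  # residuals

# OLS scores up to scale: g_ik = x_ik * e_i, an N by k matrix.
g = [[xik * ei for xik in row] for row, ei in zip(X, e)]
G = matmul(transpose(g), g)
V = matmul(matmul(D, G), D)

indicator_scores = [row[2] for row in g]
print(indicator_scores)         # the column of g for the indicator
print(rank(G), rank(V))         # compare with k = 3 coefficients
```

In this example the residual for the singleton observation is exactly 0, every
entry of the indicator's score column is 0, and both G and DGD have rank 2
with 3 coefficients in the model.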

For the cluster-robust estimator, we need only two more pieces of information.
First, G in the cluster-robust VCE is not g'g, but f'f where f is an M by k
matrix and each row of f represents one of the M clusters and is just the sum
of g_i over the cluster.  Second, as noted earlier, the sum of the scores for
a variable is 0 and for an indicator variable the scores are only non-zero
where that indicator is 1.  Thus, the sum of the scores for an indicator over
the observations where the indicator is 1 is 0.  That puts us back where we
were with the simple robust VCE: whenever an indicator is 1 for all
observations of a cluster and 0 everywhere else, its column in f will be all 0
and f'f will not be full rank.
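
The clustered case can be illustrated the same way.  This is again a minimal
Python sketch under stated assumptions: the data and the cluster assignment
are made up, the helpers are hypothetical, and scalar multipliers are omitted.
It shows that the indicator's scores need not be 0 observation by observation,
yet they sum to 0 within the one cluster where the indicator is 1.

```python
from fractions import Fraction

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    """Invert a nonsingular square matrix by exact Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def rank(a):
    """Rank by exact row reduction."""
    m = [row[:] for row in a]
    rk = 0
    for c in range(len(m[0])):
        p = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if p is None:
            continue
        m[rk], m[p] = m[p], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][c] / m[rk][c]
            m[r] = [vr - factor * vk for vr, vk in zip(m[r], m[rk])]
        rk += 1
    return rk

# Made-up data: 4 clusters of 2 observations; d is 1 for every observation
# in cluster 4 and 0 elsewhere.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 2, 3, 4, 5, 6, 10, 8]
cluster = [1, 1, 2, 2, 3, 3, 4, 4]
d = [int(c == 4) for c in cluster]
X = [[Fraction(1), Fraction(xi), Fraction(di)] for xi, di in zip(x, d)]
Y = [[Fraction(yi)] for yi in y]

Xt = transpose(X)
b = matmul(inverse(matmul(Xt, X)), matmul(Xt, Y))   # OLS coefficients
e = [yi[0] - sum(xik * bk[0] for xik, bk in zip(row, b))
     for row, yi in zip(X, Y)]
g = [[xik * ei for xik in row] for row, ei in zip(X, e)]   # N by k scores

# f: one row per cluster, each row the sum of that cluster's rows of g.
ids = sorted(set(cluster))
f = [[sum(g[i][k] for i in range(len(g)) if cluster[i] == c)
      for k in range(3)] for c in ids]
F = matmul(transpose(f), f)

print([row[2] for row in g])   # indicator scores, observation level
print([row[2] for row in f])   # indicator scores, summed by cluster
print(rank(F))                 # compare with k = 3 coefficients
```

Here the two observation-level scores in cluster 4 are non-zero and cancel, so
the indicator's column of f is all 0 and f'f has rank 2 rather than 3.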


Mark went on to ask,

> Are the robust SEs usable anyway?

Yes.  

We wrote everything in matrix notation because it is easier and because it
clearly shows why the covariance matrix is not full rank.  If, however, we
simply wrote out the formula for the SE of a single parameter (let's not) we
would see that it can be evaluated and is just a sum of specific element-wise
products from the elements of D and G.  That G is not full rank does not cause
us any problems in computing the SE for a single coefficient.

Intuitively, the lack of information from the gradients of the singleton
indicator variable does not cause a problem even when estimating the robust SE
for the indicator variable itself.  The gradients from the remaining
coefficients are leveraged to form that estimate.  It is not much different
from our ability to estimate a standard (non-robust) SE for the indicator even
though all of the information content of the single positive observation went
into the parameter estimate.
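
For readers who do want the formula written out, here is a sketch in LaTeX
notation that uses only D and G as defined earlier.  The labels are introduced
here for illustration and are not from the original discussion: j indexes the
coefficient of interest, a and b are summation indices running over the k
parameters, and s is the position of the singleton indicator.

```latex
% Robust variance of the j-th coefficient: the (j,j) element of DGD.
\[
  \widehat{\mathrm{Var}}(B_j) = [DGD]_{jj}
    = \sum_{a=1}^{k} \sum_{b=1}^{k} D_{ja}\, G_{ab}\, D_{bj}
\]
% Row s and column s of G are zero for the singleton indicator, so those
% terms drop out and the sum runs over the remaining parameters only.
\[
  [DGD]_{jj} = \sum_{a \neq s} \sum_{b \neq s} D_{ja}\, G_{ab}\, D_{bj}
\]
```

Setting j = s gives the variance for the indicator's own coefficient.  The sum
still contains the terms D_sa G_ab D_bs for a and b other than s, which is the
sense in which the gradients of the other coefficients supply the estimate.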


> Is the robust var-cov matrix still usable?

Mostly.

We can use the covariance matrix to test any subset of joint hypotheses that
does not exceed its rank.
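
The rank condition can be seen from the usual Wald form of a joint test,
sketched below in LaTeX notation.  The symbols R, r, q, and W are assumptions
introduced here for illustration: R is a q by k matrix of restrictions, r is
the hypothesized value of RB, and W is the test statistic.

```latex
\[
  W = (R\hat{B} - r)' \,\bigl(R\, V_{robust}\, R'\bigr)^{-1} (R\hat{B} - r)
\]
% The q x q matrix in the middle can be inverted only if it has rank q, and
\[
  \mathrm{rank}\bigl(R\, V_{robust}\, R'\bigr)
    \le \min\bigl(q,\ \mathrm{rank}(V_{robust})\bigr)
\]
```

So a test of q restrictions cannot be computed when q is larger than the rank
of V_robust.  Note that this bound is a necessary condition only.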

-- Vince
   [email protected]


Anyone who has read this far deserves a break, so I keep the original
postscript.

P.S. A little (unrelated) story
     --------------------------

You might wonder why I always use the word "indicator" to describe binary
regressors, even when the original poster called them "dummy" variables.  I
was once briefing a group of mainly military personnel about the implications
of some model.  About the third time I referred to the "colonel dummy"
everyone at the table broke out in laughter; everyone, that is, except the older
gentleman at the head of the table with lots of bars on his lapel and little
bird emblems on his shoulders.

I long ago forgot the subject of the talk, but I never forgot the lesson.
