
From: vwiggins@stata.com (Vince Wiggins, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Strange -robust- results with a singleton dummy
Date: Tue, 18 Oct 2005 14:31:50 -0500

David at <eurasian29@hotmail.com> is estimating a model using -regress- with the -cluster()- option to get cluster-robust standard errors. When he includes an indicator variable that takes on the value 1 for all observations in a cluster and 0 elsewhere (a singleton indicator), the overall model F-statistic goes missing. David asks,

> Can anybody explain what's going on?

Kit Baum <kitbaum@mac.com> notes that this may be a variation of the data pathology discussed at the end of -help j_robustsingular-, the help file displayed from the blue link when the F-statistic is missing.

Kit is absolutely correct; this is the same problem. The help file will be updated and its explanation adapted to cover this additional case.

As Kit notes, the essential problem is that a component of the robust covariance matrix is driven to 0 when you have a singleton indicator. This causes the covariance matrix to be of lower rank than the number of covariates and therefore makes it impossible to compute an overall model F-statistic. Notably, the rest of the results and tests are still fine.

In private discussions off list, I have been asked to explain this in more detail. I'm going to adapt this from my answer to a question from Mark Schaffer <M.E.Schaffer@hw.ac.uk>, who encountered the problem using the unclustered robust estimator and whose question led to the discussion in -help j_robustsingular-.

Mark asked several interesting questions, starting with,

> Does anybody have any ideas on how to interpret this?

Mechanically it is pretty easy to see what is happening for the standard (not clustered) robust covariance estimator. The robust covariance matrix is:

    V_robust = DGD

where:

    D is the negative inverse Hessian (the most often used estimate of the covariance matrix).

    G is the outer product of the score (or gradient) vectors for each observation, often called the OPG. (Also, a perfectly valid estimate of the covariance matrix and typically used when estimating by BHHH.)
    G = g'g

where:

    g_ik = d(L_i) / d(B_k)

and:

    L_i is the quasi-likelihood of the ith observation
    B_k is the kth element of the coefficient vector B

So, g is an N by k matrix, where N is the number of observations and k is the number of parameters (one column per coefficient). We have started from a quasi-likelihood using L_i, but we could have started from the estimating equations (or normal equations) for OLS; it makes no difference.

When we have an indicator variable that is 1 for a single observation and 0 everywhere else, the column of g for that indicator has a very distinctive pattern -- it is all zeros.

    g_i = d(L_i) / d(B_indicator) = 0    whenever the indicator is 0

because B_indicator enters L_i only as B_indicator*0, which is 0.

    g_i = d(L_i) / d(B_indicator) = 0    for the single observation where indicator=1

because the moment conditions for maximizing the quasi-likelihood are that the gradient for each coefficient, summed over the observations, is 0. Since all of the other observations have the indicator set to 0, only this observation contributes to the gradient, and it is set to 0 by the moment condition in choosing B_indicator.

Put another way, the scores for a coefficient (g_i) must sum to 0, and since the score is 0 when the variable is 0 and since the variable is non-zero for only one observation, the score for that observation must also be 0.

All of this means that the column of g corresponding to the indicator variable is all 0 and thus G = g'g is not full rank, and thus V_robust=DGD is also not full rank. That means Stata cannot compute an overall model F-statistic because the rank of the covariance matrix is not sufficient to test the hypothesis that all of the coefficients are simultaneously 0.

For the cluster-robust estimator, we need only two more pieces of information.

First, G in the cluster-robust VCE is not g'g, but f'f, where f is an M by k matrix and each row of f represents one of the M clusters and is just the sum of g_i over the cluster.
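The score matrix g and the cluster sums f described above can be checked numerically. What follows is only a minimal sketch in Python rather than Stata, under these assumptions: the data are made up, the helper names (`solve`, `ols_scores`, `cluster_sums`) are hypothetical, and the OLS score is taken to be g_ik = x_ik * e_i with e_i the residual. Exact rational arithmetic is used so that a zero is exactly zero and not a rounding artifact.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def ols_scores(X, y):
    """Return the N by k score matrix g, with g[i][j] = x_ij * e_i (assumed OLS score)."""
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, r)) for r, yi in zip(X, y)]
    return [[xj * e for xj in r] for r, e in zip(X, resid)]


def cluster_sums(g, cluster):
    """Return f: one row per cluster, each row the sum of that cluster's rows of g."""
    k = len(g[0])
    return [[sum(row[j] for row, c in zip(g, cluster) if c == cid) for j in range(k)]
            for cid in sorted(set(cluster))]


# Made-up data: six observations in three clusters of two.
x = [1, 2, 3, 4, 5, 7]
y = [2, 1, 4, 3, 7, 5]
cluster = [1, 1, 2, 2, 3, 3]

# Case 1 (unclustered): indicator is 1 for the last observation only.
# Columns are constant, x, indicator.
X_single = [[1, xi, 1 if i == 5 else 0] for i, xi in enumerate(x)]
g_single = ols_scores(X_single, y)
G_single = [[sum(r[a] * r[b] for r in g_single) for b in range(3)] for a in range(3)]

# Case 2 (clustered): indicator is 1 for every observation in cluster 3.
X_clust = [[1, xi, 1 if c == 3 else 0] for xi, c in zip(x, cluster)]
g_clust = ols_scores(X_clust, y)
f = cluster_sums(g_clust, cluster)

print("indicator column of g, singleton case:", [str(r[2]) for r in g_single])
print("indicator row of G = g'g:             ", [str(v) for v in G_single[2]])
print("indicator column of g, cluster case:  ", [str(r[2]) for r in g_clust])
print("indicator column of f, cluster case:  ", [str(r[2]) for r in f])
```

In this sketch the indicator column of g is zero in every row in the singleton case, so the matching row and column of G = g'g are zero as well. In the cluster case the individual scores for the indicator need not be zero, but they cancel within the cluster, so the indicator column of f is zero in every row.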
Second, as noted earlier, the sum of the scores for a variable is 0, and for an indicator variable the scores are only non-zero where that indicator is 1. Thus, the sum of the scores for an indicator over the observations where the indicator is 1 is 0.

That puts us back where we were with the simple robust VCE: whenever an indicator is 1 for all observations of a cluster and 0 everywhere else, its column in f will be all 0 and f'f will not be full rank.

Mark went on to ask,

> Are the robust SEs usable anyway?

Yes. We wrote everything in matrix notation because it is easier and because it clearly shows why the covariance matrix is not full rank. If, however, we simply wrote out the formula for the SE of a single parameter (let's not) we would see that it can be evaluated and is just a sum of specific element-wise products from the elements of D and G. That G is not full rank does not cause us any problems in computing the SE for a single coefficient.

Intuitively, the lack of information from the gradients of the singleton indicator variable does not cause a problem even when estimating the robust SE for the indicator variable itself. The gradients from the remaining coefficients are leveraged to form that estimate. It is not much different from our ability to estimate a standard (non-robust) SE for the indicator even though all of the information content of the single positive observation went into the parameter estimate.

> Is the robust var-cov matrix still usable?

Mostly. We can use the covariance matrix to test any subset of joint hypotheses that does not exceed its rank.

-- Vince
   vwiggins@stata.com

Anyone who has read this far deserves a break, so I keep the original postscript.

P.S.  A little (unrelated) story
      --------------------------

You might wonder why I always use the word "indicator" to describe binary regressors, even when the original poster called them "dummy" variables.
I was once briefing a group of mainly military personnel about the implications of some model. About the third time I referred to the "colonel dummy" everyone at the table broke out in laughter, everyone that is except the older gentleman at the head of the table with lots of bars on his lapel and little bird emblems on his shoulders. I long ago forgot the subject of the talk, but I never forgot the lesson.

<end>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups: Re: st: Strange -robust- results with a singleton dummy (continued), from "Alexander Nervedi" <alexnerdy@hotmail.com>
