
From: vwiggins@stata.com (Vince Wiggins, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Strange -robust- results with a singleton dummy
Date: Mon, 30 Jun 2003 16:25:54 -0500

Mark Schaffer <M.E.Schaffer@hw.ac.uk> is estimating a model with an indicator (dummy) variable that is 1 in only a single observation and 0 everywhere else, and he wants an explanation for some things he notices about the variance-covariance matrix:

> I've encountered (via David Stromberg) a peculiar feature of
> regression with heteroskedastic-robust SEs when using dummy
> variables.
>
> If a dummy variable takes the value of 1 for a single observation,
> and zeros for the rest, some strange things happen:
>
> 1. The robust SEs still look quite plausible.
>
> 2. The F-stat is reported as missing. There is a hyperlink for the
> missing F-stat in the regression output (Stata v7) but it doesn't
> mention the singleton dummy as a possible explanation.
>
> 3. The robust var-cov matrix is not of full rank. Invert it and one
> of the row/columns becomes all zeros (but not necessarily the one
> corresponding to the singleton dummy).

Mark then goes on to ask 3 questions. The questions help tell the story, so let's take them in order.

> Does anybody have any ideas on how to interpret this?

Mechanically it is pretty easy to see what is happening. The robust covariance matrix is

$$V_{robust} = D G D$$

where:

D is the negative inverse Hessian (the most often used estimate of the covariance matrix).

G is the outer product of the score (or gradient) vectors for each observation, often called the OPG. (Its inverse is also a perfectly valid estimate of the covariance matrix, typically used when estimating by BHHH.)

$$G = g'g$$

where:

$$g_{ik} = \frac{\partial L_i}{\partial B_k}$$

and:

L_i is the quasi-likelihood of the ith observation
B_k is the kth element of the coefficient vector B

So, g is an N by k matrix, where N is the number of observations and k is the number of parameters. We have started from a quasi-likelihood using L_i, but we could have started from the estimating equations (or normal equations) for OLS; it makes no difference.
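For readers who want to see the pieces D, g, G and V_robust = DGD computed explicitly, here is a minimal sketch in Python. It is not Stata's code. It assumes OLS written as a quasi-likelihood L_i = -e_i^2/2 with e_i = y_i - x_i B, so that the scores are g_ik = x_ik * e_i and the negative inverse Hessian is D = (X'X)^{-1}. It uses exact rational arithmetic so that zeros come out exactly. The helper names (`transpose`, `matmul`, `inverse`, `ols_sandwich`) and the toy data are hypothetical. It also omits the small-sample factor N/(N-k) that Stata applies for -regress, robust-.

```python
from fractions import Fraction

# Hypothetical helpers on list-of-lists matrices, using exact rational arithmetic.
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    """Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(a)
    m = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # first usable pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def ols_sandwich(X, y):
    """OLS viewed as maximizing sum L_i with L_i = -e_i^2/2, e_i = y_i - x_i B.

    Returns (b, g, D, G, V) with g_ik = dL_i/dB_k = x_ik * e_i,
    D = (X'X)^-1 (the negative inverse Hessian), G = g'g and V = D G D.
    """
    Xt = transpose(X)
    D = inverse(matmul(Xt, X))
    b = [row[0] for row in matmul(D, matmul(Xt, [[v] for v in y]))]
    e = [yi - sum(bk * xk for bk, xk in zip(b, xi)) for xi, yi in zip(X, y)]
    g = [[xik * ei for xik in xi] for xi, ei in zip(X, e)]
    G = matmul(transpose(g), g)
    V = matmul(matmul(D, G), D)  # no small-sample scaling applied
    return b, g, D, G, V

# Hypothetical data: a constant and one regressor.
X = [[Fraction(1), Fraction(v)] for v in (1, 2, 3, 4, 5)]
y = [Fraction(v) for v in (1, 3, 2, 5, 9)]
b, g, D, G, V = ols_sandwich(X, y)
print("coefficients:", [str(v) for v in b])
print("column sums of g:", [str(sum(row[k] for row in g)) for k in range(len(b))])
print("robust SEs:", [float(V[k][k]) ** 0.5 for k in range(len(b))])
```

Every column of g sums to exactly 0 at the estimate. That is the moment condition (the normal equations X'e = 0), and the singleton-indicator argument relies on it.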
When we have an indicator variable that is 1 for a single observation and 0 everywhere else, the column of g for the indicator, g_k, has a very distinctive pattern: it is all zeros.

For every observation where the indicator is 0,

$$g_i = \frac{\partial L_i}{\partial B_{indicator}} = 0$$

because B_indicator*0 is 0, so B_indicator does not enter L_i at all.

For the single observation where the indicator is 1,

$$g_i = \frac{\partial L_i}{\partial B_{indicator}} = 0$$

because the moment conditions for maximizing the quasi-likelihood are that the gradient for each coefficient is 0. Since all of the other observations have the indicator set to 0, only this observation contributes to the gradient, and it is set to 0 by the moment condition in choosing B_indicator. Put another way, the scores for a coefficient (g_i) must sum to 0. The score is 0 whenever the variable is 0, and the variable is nonzero for only one observation, so the score for that observation must also be 0.

All of this means that the column of g corresponding to the indicator variable is all 0. So G = g'g is not full rank, and thus V_robust = DGD is also not full rank. That means Stata cannot compute an overall model F-statistic, because the rank of the covariance matrix is not sufficient to test the hypothesis that all of the coefficients are simultaneously 0. This is what Mark noticed in his items (2) and (3).

Mark's second question was,

> Are the robust SEs usable anyway?

Yes. We wrote everything in matrix notation because it is easier and because it clearly shows why the covariance matrix is not full rank. If, however, we simply wrote out the formula for the SE of a single parameter (let's not), we would see that it can be evaluated and is just a sum of specific element-wise products from the elements of D and G. That G is not full rank does not cause us any problems in computing the SE for a single coefficient. Intuitively, the lack of information from the gradients of the singleton indicator variable does not cause a problem, even when estimating the robust SE for the indicator variable itself.
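The argument about the singleton indicator can be checked numerically. Below is a minimal, self-contained sketch, not Stata's implementation. It makes four assumptions:

* OLS is written as the quasi-likelihood L_i = -e_i^2/2, so g_ik = x_ik * e_i and D = (X'X)^{-1}.
* Arithmetic is exact and rational.
* No small-sample factor is applied.
* The five-observation dataset is hypothetical, with the indicator equal to 1 only in the last observation.

The helper names are illustrative. The sketch shows three things. First, the singleton observation's residual, and hence the whole indicator column of g, is exactly 0. Second, G and V have rank 2 rather than 3. Third, the indicator's own diagonal element of V is still positive, because it is built from the G entries of the other coefficients.

```python
from fractions import Fraction

# Hypothetical helpers on list-of-lists matrices, using exact rational arithmetic.
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    """Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(a)
    m = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)  # first usable pivot row
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def rank(a):
    """Rank by exact Gaussian elimination (no floating-point tolerance needed)."""
    m = [list(row) for row in a]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [vi - f * vr for vi, vr in zip(m[i], m[r])]
        r += 1
    return r

# Hypothetical data: constant, x, and an indicator that is 1 only in observation 5.
x = [1, 2, 3, 4, 5]
d = [0, 0, 0, 0, 1]
y = [Fraction(v) for v in (1, 3, 2, 5, 9)]
X = [[Fraction(1), Fraction(xi), Fraction(di)] for xi, di in zip(x, d)]

Xt = transpose(X)
D = inverse(matmul(Xt, X))                       # (X'X)^-1 = negative inverse Hessian
b = [row[0] for row in matmul(D, matmul(Xt, [[v] for v in y]))]
e = [yi - sum(bk * xk for bk, xk in zip(b, xi)) for xi, yi in zip(X, y)]
g = [[xik * ei for xik in xi] for xi, ei in zip(X, e)]  # scores g_ik = x_ik * e_i
G = matmul(transpose(g), g)                      # OPG
V = matmul(matmul(D, G), D)                      # robust covariance (no N/(N-k) factor)

indicator_scores = [row[2] for row in g]
print("residuals:", [str(v) for v in e])
print("indicator scores:", [str(v) for v in indicator_scores])
print("rank G:", rank(G), "rank V:", rank(V), "of", len(V))
print("robust variance of indicator coefficient:", V[2][2])
```

With this data the indicator's scores are all 0, G and V both have rank 2 of 3, and the indicator's robust variance is 157/200. That variance is a positive number assembled entirely from the constant and x entries of G.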
The gradients of the remaining coefficients are leveraged to form the indicator's robust SE. It is not much different from our ability to estimate a standard (non-robust) SE for the indicator, even though all of the information content of the single positive observation went into the parameter estimate.

> Is the robust var-cov matrix still usable?

Mostly. We can use the covariance matrix to test any subset of joint hypotheses that do not exceed its rank.

Mark mentioned the link from the unreported F-statistic to an explanation of why the statistic is not reported. That link was created when we were considering the issue of fewer clusters than parameters using a clustered version of the robust variance estimator. The link does not discuss the issue that Mark and David uncovered. We had not even considered the question of singleton indicators, or, as it turns out, ANY data and model that lead to all-0 scores or, by extension, scores that are collinear. The two cases, too few clusters and singleton indicators, produce the same problem: a G matrix that is not full rank. We will update that link to be more complete. Unfortunately, anyone who has read the nice clear discussion in the link, and has also read the above, will realize the discussion in the link is about to become more complicated.

--
Vince
vwiggins@stata.com

P.S. A little (unrelated) story
--------------------------

You might wonder why I always use the word "indicator" to describe binary regressors, even when the original poster called them "dummy" variables. I was once briefing a group of mainly military personnel about the implications of some model. About the third time I referred to the "colonel dummy", everyone at the table broke out in laughter. Everyone, that is, except the older gentleman at the head of the table with lots of bars on his lapel and little bird emblems on his shoulders. I long ago forgot the subject of the talk, but I never forgot the lesson.
<end>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
