
Are the estimates produced by probit and logit with the vce(cluster clustvar) option true maximum likelihood estimates?

Is there a difference between the estimates produced by svy: probit, with the PSU variable specified in the svyset command, and probit, vce(cluster clustvar) (and, similarly, between svy: logit, with the PSU variable specified in svyset, and logit, vce(cluster clustvar))?

Title   Maximum likelihood estimation with vce(cluster clustvar)
Author William Sribney, StataCorp

Answer to first question

No, they are not true maximum likelihood estimates.

Traditional maximum likelihood theory requires that the likelihood function be the distribution function for the sample.

When you have clustering, the observations are no longer independent; thus the joint distribution function for the sample is no longer the product of the distribution functions for each observation. That is, the joint distribution f(Y) is not

$$\prod_{i=1}^{n} f_i(y_i)$$

Thus

$$\sum_{i=1}^{n} \log f_i(y_i)$$

is not the true log-likelihood for the sample.
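To see concretely why the product fails under clustering, the following sketch uses a hypothetical two-point cluster effect (our own toy model, not anything Stata fits): members of a cluster share an effect u equal to -1 or +1, each with probability 1/2, and are independent only given u. The value of b, the effect distribution, and the helper `norm_cdf` are illustrative assumptions.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical cluster model: a shared effect u is -1 or +1 with probability 1/2;
# given u, each member independently has P(y = 1 | u) = Phi(b + u).
b = 0.3
effects = [(-1.0, 0.5), (1.0, 0.5)]  # (value of u, probability)

# Marginal probability that a single member has y = 1.
p_marginal = sum(prob * norm_cdf(b + u) for u, prob in effects)

# True joint probability that two members of the same cluster both have y = 1.
p_joint_true = sum(prob * norm_cdf(b + u) ** 2 for u, prob in effects)

# What multiplying the per-observation distributions would claim.
p_joint_product = p_marginal ** 2

print(f"marginal P(y=1)          = {p_marginal:.4f}")
print(f"true joint P(y1=1, y2=1) = {p_joint_true:.4f}")
print(f"product  P(y1=1)P(y2=1)  = {p_joint_product:.4f}")
```

Because the shared effect makes outcomes within a cluster move together, the true joint probability exceeds the product of the marginals, so the product form misstates the distribution of the sample.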

Unless one fully parameterizes the correlation within clusters (as in, say, a random-effects probit), one cannot write down the true likelihood for the sample.

The robust estimator used by probit, vce(cluster clustvar) and by svy: probit does not assume any particular model for the within-cluster correlation. Instead, these commands merely assume that the values of b that maximize

$$\sum_{i=1}^{n} \log f_i(b; y_i)$$

(call them bhat) are a reasonable estimate of the true b.
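A minimal sketch of how bhat is obtained, assuming a logit model with an intercept and one regressor (logit rather than probit only because its score and Hessian are simpler). The function `fit_logit` and the data are hypothetical. The cluster identifiers are deliberately never used: clustering does not change the point estimate, only how its variance is computed.

```python
import math

def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Return (b0, b1) maximizing sum_i log f_i(b; y_i) for the logit model
    P(y_i = 1) = 1 / (1 + exp(-(b0 + b1 * x_i))), via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - p              # score contribution of observation i
            v = p * (1.0 - p)       # contribution to minus the Hessian
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] d = g for d.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: 4 clusters of 3 observations each.
cluster = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
x = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
y = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1]

b_hat = fit_logit(x, y)  # cluster never enters the calculation
print("bhat =", b_hat)
```

For these data the maximizer works out to b0 = -ln 7 and b1 = ln 7; relabeling or dropping the clusters leaves it unchanged.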

At this point, the key question to ask is: what is the true b that is being estimated? It is the value of b that maximizes

$$\sum_{i=1}^{N} \log f_i(b; y_i)$$

where now the sum is over all individuals (i = 1,...,N) in the population from which the sample was drawn. That is, the true b is the solution of the maximum likelihood equation that we would have if we had data on all individuals in the population.

We are justified in using bhat as an estimate for the true b if

$$\sum_{i=1}^{n} \log f_i(b; y_i)$$

is, up to the scale factor N/n (which does not move the maximum), a good estimate of

$$\sum_{i=1}^{N} \log f_i(b; y_i)$$

This is a reasonable assumption, even if we have clustering.
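As a worked illustration (our own, for the simplest case of a logit model with only an intercept, where $f_i(b; y_i) = p(b)^{y_i}\{1 - p(b)\}^{1 - y_i}$ and $p(b) = 1/(1 + e^{-b})$), setting the derivative of each sum to zero makes the link between bhat ($\hat{b}$) and the true b explicit:

$$
\begin{aligned}
\frac{\partial}{\partial b}\sum_{i=1}^{n} \log f_i(b; y_i)
  &= \sum_{i=1}^{n} \bigl(y_i - p(b)\bigr) = 0
  &&\Longrightarrow\quad \hat{b} = \log\frac{\bar{y}}{1-\bar{y}},
  \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
\frac{\partial}{\partial b}\sum_{i=1}^{N} \log f_i(b; y_i)
  &= \sum_{i=1}^{N} \bigl(y_i - p(b)\bigr) = 0
  &&\Longrightarrow\quad b = \log\frac{\bar{Y}}{1-\bar{Y}},
  \qquad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i
\end{aligned}
$$

So in this case bhat is a good estimate of b exactly when $\bar{y}$ is a good estimate of $\bar{Y}$. Under equal-probability sampling of clusters, $\bar{y}$ remains a sensible estimate of $\bar{Y}$; clustering mainly makes it noisier, which is a question about the variance rather than about the target.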

Sampling weights

If we have sampling weights, $w_i$, then we obtain bhat as the value of b that maximizes

$$\sum_{i=1}^{n} w_i \log f_i(b; y_i)$$

since it is reasonable to assume that

$$\sum_{i=1}^{n} w_i \log f_i(b; y_i)$$

is a good estimate of

$$\sum_{i=1}^{N} \log f_i(b; y_i)$$

Since the likelihood used to derive bhat in the case of clustering or sampling weights is not a true likelihood, it is called a pseudolikelihood.
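The sketch below maximizes a weighted pseudolikelihood for a hypothetical disproportionate sample drawn from a hypothetical finite population. It again assumes an intercept-only logit so that the answer can be checked by hand. The population, the weights (the inverse sampling fractions 900/50 and 100/50), and the function names are illustrative assumptions.

```python
import math

def fit_intercept_logit(y, w):
    """Maximize sum_i w_i * log f_i(b; y_i) for an intercept-only logit
    by Newton-Raphson; returns bhat."""
    b = 0.0
    total_w = sum(w)
    for _ in range(100):
        p = 1.0 / (1.0 + math.exp(-b))
        score = sum(wi * (yi - p) for yi, wi in zip(y, w))
        info = total_w * p * (1.0 - p)   # minus the second derivative
        step = score / info
        b += step
        if abs(step) < 1e-12:
            break
    return b

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical finite population: stratum A has 900 people (180 with y = 1)
# and stratum B has 100 people (60 with y = 1), so the population
# maximizer is b = logit(240 / 1000).
b_population = logit(240 / 1000)

# Hypothetical sample: 50 people from A (weight 900/50 = 18), 10 of them with
# y = 1, and 50 people from B (weight 100/50 = 2), 30 of them with y = 1.
y = [1] * 10 + [0] * 40 + [1] * 30 + [0] * 20
w = [18.0] * 50 + [2.0] * 50

b_weighted = fit_intercept_logit(y, w)
b_unweighted = fit_intercept_logit(y, [1.0] * len(y))
print(f"population b = {b_population:.4f}")
print(f"weighted bhat = {b_weighted:.4f}, unweighted bhat = {b_unweighted:.4f}")
```

The weighted maximizer recovers the population value logit(0.24) because the weighted totals reproduce the population counts (240 ones out of 1,000); the unweighted fit instead targets logit(0.4), the proportion in the oversampled sample.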

Variance estimates

The variance estimates are now computed using sampling theory. That is, we ask: if the sample were drawn again and again using the same scheme (for example, clustered or weighted), and bhat were mechanically computed each time as the maximizer of the pseudolikelihood, what would the variance of bhat be?

Since traditional likelihood theory cannot be invoked for clustering or weighted sampling, one should not use traditional likelihood-ratio tests in these cases.
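To make the sampling-theory variance concrete, here is a minimal sketch of the cluster-robust (sandwich) variance for an intercept-only logit, where every piece is a scalar: the "bread" is the inverse of minus the Hessian, n p(1 - p), and the "meat" sums the squared cluster totals of the scores. The finite-sample factor M/(M - 1), with M the number of clusters, reflects our understanding of what [P] _robust applies for ML estimators such as probit and logit; treat it as an assumption. The data and function name are hypothetical.

```python
import math
from collections import defaultdict

def intercept_logit_variances(y, cluster):
    """Return (bhat, model-based variance, cluster-robust variance) for an
    intercept-only logit fitted to 0/1 outcomes y."""
    n = len(y)
    p = sum(y) / n                      # at bhat, p(bhat) equals the sample mean
    b_hat = math.log(p / (1.0 - p))
    info = n * p * (1.0 - p)            # minus the Hessian of the log pseudolikelihood
    totals = defaultdict(float)         # cluster totals of the scores y_i - p
    for yi, c in zip(y, cluster):
        totals[c] += yi - p
    m = len(totals)                     # number of clusters
    meat = sum(u * u for u in totals.values())
    v_model = 1.0 / info                # appropriate only for independent data
    v_robust = (m / (m - 1)) * meat / info ** 2
    return b_hat, v_model, v_robust

# Hypothetical data: 6 clusters of 5 people with strongly similar outcomes.
y = [1, 1, 1, 1, 0,  0, 0, 0, 0, 0,  1, 1, 1, 1, 1,
     0, 0, 0, 1, 0,  1, 1, 0, 1, 1,  0, 0, 0, 0, 0]
cluster = [c for c in range(1, 7) for _ in range(5)]

b_hat, v_model, v_robust = intercept_logit_variances(y, cluster)
print(f"bhat = {b_hat:.4f}, model SE = {math.sqrt(v_model):.4f}, "
      f"robust SE = {math.sqrt(v_robust):.4f}")
```

With outcomes this alike within clusters, the robust variance is several times the model-based one, which is why treating the pseudolikelihood as a true likelihood would overstate precision.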

Answer to second question

Is there a difference between the estimates produced by the svy: probit command and probit, vce(cluster clustvar) (and, similarly, between svy: logit, with the PSU variable specified in svyset, and logit, vce(cluster clustvar))?

The point estimates and variance estimates are always the same.
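One way to see why the variance estimates coincide, sketched under the assumptions of a single stratum, no weights, and no finite-population correction: the one-stratum svy linearization estimator centers the cluster score totals at their mean before squaring, whereas the robust estimator squares them uncentered. At bhat the score equations force the totals to sum to zero, so the centering changes nothing. The functions and data are hypothetical and again use an intercept-only logit.

```python
from collections import defaultdict

def score_totals(y, cluster, p):
    """Cluster totals of the intercept-only logit scores y_i - p."""
    totals = defaultdict(float)
    for yi, c in zip(y, cluster):
        totals[c] += yi - p
    return list(totals.values())

def meat_robust(totals):
    """Uncentered: M/(M-1) times the sum of squared cluster totals."""
    m = len(totals)
    return m / (m - 1) * sum(u * u for u in totals)

def meat_svy_one_stratum(totals):
    """Centered: M/(M-1) times the sum of squared deviations from the mean total."""
    m = len(totals)
    mean = sum(totals) / m
    return m / (m - 1) * sum((u - mean) ** 2 for u in totals)

# Hypothetical data: 6 clusters of 5 observations.
y = [1, 1, 1, 1, 0,  0, 0, 0, 0, 0,  1, 1, 1, 1, 1,
     0, 0, 0, 1, 0,  1, 1, 0, 1, 1,  0, 0, 0, 0, 0]
cluster = [c for c in range(1, 7) for _ in range(5)]

p_hat = sum(y) / len(y)                      # p(bhat) for the intercept-only logit
at_hat = score_totals(y, cluster, p_hat)     # totals sum to zero here
at_other = score_totals(y, cluster, 0.3)     # an arbitrary b that is not bhat

print("at bhat:    ", meat_robust(at_hat), meat_svy_one_stratum(at_hat))
print("elsewhere:  ", meat_robust(at_other), meat_svy_one_stratum(at_other))
```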

The commands differ only in a few small details. svy: probit and svy: logit use t statistics, whereas probit, vce(cluster clustvar) and logit, vce(cluster clustvar) use z statistics. The degrees of freedom for the t statistic in svy: probit and svy: logit are the number of clusters (PSUs) minus the number of strata (one if unstratified). Strictly speaking, svy: probit and svy: logit are doing things right, but the difference matters only if you have a small number of clusters (say, fewer than 40).
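A rough sketch of how much the reference distribution matters: it compares two-sided p-values for the same statistic under a standard normal and under a t with (number of PSUs minus number of strata) degrees of freedom. Python's standard library has no t distribution, so the hypothetical helper `t_two_sided_p` integrates the t density numerically with Simpson's rule; this is an illustrative approximation, not how Stata computes it. The statistic 2.0 is an assumed value.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(stat, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the t density over [0, |stat|],
    with the integral approximated by Simpson's rule (steps must be even)."""
    upper = abs(stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)

def z_two_sided_p(stat):
    """Two-sided p-value from the standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(stat)))

stat = 2.0        # hypothetical coefficient / standard error
n_strata = 1      # unstratified design
p_z = z_two_sided_p(stat)
for n_psu in (5, 10, 40, 200):
    df = n_psu - n_strata
    print(f"PSUs={n_psu:4d}  df={df:4d}  "
          f"t p={t_two_sided_p(stat, df):.4f}  z p={p_z:.4f}")
```

For a handful of clusters the t-based p-value is noticeably larger than the z-based one; as the number of PSUs grows, the two converge.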

svy: probit and svy: logit also use an adjusted Wald test for the model test. probit, vce(cluster clustvar) and logit, vce(cluster clustvar) use an ordinary Wald test. Again, this difference matters only if you have a small number of clusters.
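A minimal sketch contrasting the two model tests for a hypothetical two-coefficient model: the ordinary Wald statistic W = b' V^{-1} b, referred to a chi-squared distribution with k degrees of freedom, and an adjusted F statistic that shrinks W when the design degrees of freedom d are small. The adjustment F = (d - k + 1) W / (k d), with F(k, d - k + 1) degrees of freedom, is our understanding of the adjusted Wald test described in the [SVY] manual and should be checked there; the coefficient vector, variance matrix, and helper names are assumptions.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for cc in range(col, n + 1):
                m[r][cc] -= f * m[col][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][cc] * x[cc] for cc in range(r + 1, n))) / m[r][r]
    return x

def wald_statistic(b, v):
    """Ordinary Wald statistic W = b' V^{-1} b for H0: all coefficients are zero."""
    return sum(bi * xi for bi, xi in zip(b, solve(v, b)))

def adjusted_wald_f(w, k, d):
    """Adjusted Wald F and its degrees of freedom, assuming the form
    F = (d - k + 1) * W / (k * d) ~ F(k, d - k + 1)."""
    return (d - k + 1) * w / (k * d), (k, d - k + 1)

# Hypothetical estimates and cluster-robust variance matrix.
b = [0.5, -0.3]
v = [[0.04, 0.01],
     [0.01, 0.09]]
k = len(b)
d = 10 - 1        # 10 PSUs, 1 stratum

w = wald_statistic(b, v)
f_adj, dfs = adjusted_wald_f(w, k, d)
print(f"ordinary Wald: chi2({k}) = {w:.4f}")
print(f"adjusted Wald: F{dfs} = {f_adj:.4f}")
```

With d = 9 the adjustment multiplies W/k by 8/9; as d grows, the factor (d - k + 1)/d approaches 1 and the adjusted test approaches the ordinary one.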

References

For a description of the variance estimator, see [SVY] variance estimation and [P] _robust in the Stata reference manuals.

Standard references for this variance estimator as applied to pseudolikelihoods are

Binder, D. A. 1983. On the variances of asymptotically normal estimators from complex surveys. International Statistical Review 51: 279–292.
Skinner, C. J. 1989. Introduction to Part A. In Analysis of Complex Surveys, ed. C. J. Skinner, D. Holt, and T. M. F. Smith, 23–58. New York: Wiley.
Wolter, K. M. 2007. Introduction to Variance Estimation. 2nd ed. New York: Springer.