Is there a difference between the estimates produced by svy: probit, with the psu variable specified in the svyset command, and probit, vce(cluster *clustvar*)

(and, similarly, between svy: logit, with the psu variable specified in svyset, and logit, vce(cluster *clustvar*))?

Title | Maximum likelihood estimation | |

Author | Bill Sribney, StataCorp | |

Date | December 1997; updated June 2013 |

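No, the estimates produced by probit and logit with vce(cluster *clustvar*), or by svy: probit and svy: logit, are not true maximum likelihood estimates.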

Traditional maximum likelihood theory requires that the likelihood function be the distribution function for the sample.

When you have clustering, the observations are no longer independent; thus
the joint distribution function for the sample is no longer the product of
the distribution functions for each observation. That is, the joint
distribution f(Y) is *not*

$$\prod_{i=1}^{n} f_i(y_i)$$

Thus

$$\sum_{i=1}^{n} \log f_i(y_i)$$

is *not* the true log-likelihood for the sample.
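A deliberately extreme toy case can make this concrete. The Python sketch below is illustrative only: it assumes a hypothetical cluster of two binary outcomes that always agree, with made-up probabilities, and compares the true joint probability with the product of the marginal probabilities.

```python
# Hypothetical cluster of two binary outcomes that always agree
# (perfect within-cluster correlation); the probabilities are illustrative.
joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}


def marginal(position, value):
    """Marginal probability that the observation at `position` equals `value`."""
    return sum(p for outcome, p in joint.items() if outcome[position] == value)


# True joint probability that both observations equal 1.
true_joint = joint[(1, 1)]

# What the product of the marginals would claim instead.
product_of_marginals = marginal(0, 1) * marginal(1, 1)

print(true_joint, product_of_marginals)
```

Here the two quantities disagree, so a likelihood built as a product of per-observation terms would misstate the probability of the sample.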

Unless one fully parameterizes the correlation within clusters (as in, say, a random-effects probit), one cannot write down the true likelihood for the sample.

The robust estimator used by
probit, vce(cluster *clustvar*) and
svy: probit
does *not* assume any particular model for the within-cluster
correlation. Instead, these commands merely assume that the values of
**b** that maximize

$$\sum_{i=1}^{n} \log f_i(\mathbf{b}; y_i)$$

(call them **bhat**) are a reasonable estimate of the true **b**.
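As a rough illustration of this sum, the Python sketch below evaluates it for an intercept-only probit, where f_i is Φ(b) when y_i = 1 and 1 − Φ(b) otherwise. The data, the cluster layout, and the function name `pseudo_loglik` are hypothetical, and this is not how Stata computes the estimate. The point is only that the clusters play no role in the point estimate: every observation contributes one term to the sum.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution for the probit link


def pseudo_loglik(b, ys):
    """Sum of log f_i(b; y_i) for an intercept-only probit, ignoring clusters."""
    p = STD_NORMAL.cdf(b)
    return sum(log(p) if y == 1 else log(1.0 - p) for y in ys)


# Hypothetical data: three clusters, flattened into one list because the
# sum treats every observation as a separate term.
clusters = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 0, 0]]
ys = [y for cluster in clusters for y in cluster]

# For an intercept-only probit the maximizer has a closed form:
# Phi(bhat) equals the sample proportion of ones.
bhat = STD_NORMAL.inv_cdf(sum(ys) / len(ys))

print(bhat, pseudo_loglik(bhat, ys))
```

Moving `b` away from `bhat` in either direction lowers the value returned by `pseudo_loglik`, which is what "the values of **b** that maximize" the sum means in this one-parameter case.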

At this point in this discussion, the key question to ask is, What is the
true **b** that is being estimated? It is the
values of **b** that maximize

$$\sum_{i=1}^{N} \log f_i(\mathbf{b}; y_i)$$

where now the sum is over *all* individuals (i = 1,...,*N*) in the
population from which the sample was drawn. That is, the true
**b** is the solution of the maximum likelihood
equation that we would have if we had data on all individuals in the
population.

We are justified in using **bhat** as an
estimate for the true **b** if

$$\sum_{i=1}^{n} \log f_i(\mathbf{b}; y_i)$$

is a good estimate for

$$\sum_{i=1}^{N} \log f_i(\mathbf{b}; y_i)$$

which is a reasonable assumption, even if we have clustering.

If we have sampling weights, *w_{i}*, then we obtain **bhat** by maximizing

$$\sum_{i=1}^{n} w_i \log f_i(\mathbf{b}; y_i)$$

since it is reasonable to assume

$$\sum_{i=1}^{n} w_i \log f_i(\mathbf{b}; y_i)$$

is a good estimate for

$$\sum_{i=1}^{N} \log f_i(\mathbf{b}; y_i)$$

Since the likelihood used to derive **bhat** in
the case of clustering or sampling weights is not a true likelihood, it is
called a pseudolikelihood.
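One way to picture the weighted sum is to read each weight as the number of population members that a sampled individual represents. The Python sketch below assumes an intercept-only probit with made-up data, integer weights, and a hypothetical function `weighted_pseudo_loglik`. It checks that the weighted sum over the sample equals the unweighted sum over an expanded pseudo-population, and that the weights move the point estimate. It says nothing about variances, for which sampling weights cannot be treated as frequency counts.

```python
from math import isclose, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution for the probit link


def weighted_pseudo_loglik(b, ys, ws):
    """Sum of w_i * log f_i(b; y_i) for an intercept-only probit."""
    p = STD_NORMAL.cdf(b)
    return sum(w * (log(p) if y == 1 else log(1.0 - p)) for y, w in zip(ys, ws))


# Hypothetical sample: each weight is the number of population members
# that the sampled individual represents.
ys = [1, 0, 1, 0, 0]
ws = [10, 40, 10, 20, 20]

# Pseudo-population: repeat each sampled individual w_i times.
expanded = [y for y, w in zip(ys, ws) for _ in range(w)]

b = 0.3  # arbitrary trial value of the parameter
weighted = weighted_pseudo_loglik(b, ys, ws)
unweighted_on_expanded = weighted_pseudo_loglik(b, expanded, [1] * len(expanded))

# Closed-form maximizers: Phi(bhat) equals the (weighted) proportion of ones.
bhat_weighted = STD_NORMAL.inv_cdf(sum(w * y for y, w in zip(ys, ws)) / sum(ws))
bhat_unweighted = STD_NORMAL.inv_cdf(sum(ys) / len(ys))

print(isclose(weighted, unweighted_on_expanded), bhat_weighted, bhat_unweighted)
```

In this toy sample the individuals with y = 0 carry larger weights, so the weighted estimate is pulled below the unweighted one.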

The variance estimates are now computed using sampling theory. That is, we
ask: if the sample were drawn again and again using the same scheme
(i.e., clustered or weighted), and **bhat** were
mechanically computed each time as the maximum of the pseudolikelihood, what would the
variance of **bhat** be?
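The repeated-sampling thought experiment can be imitated in a small simulation. The Python sketch below is a toy under stated assumptions, not Stata's variance estimator: the model is an intercept-only probit, the clustering is extreme (every member of a cluster shares one outcome), and the design sizes, the seed, and the function names are arbitrary choices. It redraws the sample many times under a clustered scheme and under independent sampling with the same number of observations, then compares the variance of **bhat** across draws.

```python
import random
from statistics import NormalDist, pvariance

STD_NORMAL = NormalDist()  # standard normal distribution for the probit link


def bhat_from(ys):
    """Intercept-only probit pseudo-ML estimate: Phi(bhat) = sample proportion."""
    n = len(ys)
    # Guard against proportions of exactly 0 or 1, where bhat is not finite.
    p = min(max(sum(ys) / n, 0.5 / n), 1.0 - 0.5 / n)
    return STD_NORMAL.inv_cdf(p)


def clustered_sample(rng, n_clusters=20, cluster_size=10):
    """Every member of a cluster shares one Bernoulli(0.5) draw."""
    ys = []
    for _ in range(n_clusters):
        shared = 1 if rng.random() < 0.5 else 0
        ys.extend([shared] * cluster_size)
    return ys


def independent_sample(rng, n=200):
    """n independent Bernoulli(0.5) draws."""
    return [1 if rng.random() < 0.5 else 0 for _ in range(n)]


rng = random.Random(12345)  # fixed seed so the sketch is reproducible
reps = 2000

# Variance of bhat across repeated draws under each sampling scheme.
var_clustered = pvariance([bhat_from(clustered_sample(rng)) for _ in range(reps)])
var_independent = pvariance([bhat_from(independent_sample(rng)) for _ in range(reps)])

print(var_clustered / var_independent)
```

Both schemes use 200 observations per draw, yet **bhat** varies far more under the clustered scheme, because the 200 observations carry only about 20 clusters' worth of independent information.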

Since traditional likelihood theory cannot be invoked for clustering or weighted sampling, one should not use traditional likelihood-ratio tests in these cases.

Is there a difference between the estimates produced by the svy: probit
command and probit, vce(cluster *clustvar*)
(and, similarly, between svy: logit, with psu variable specified in
svyset and logit, vce(cluster *clustvar*))?

The point estimates and variance estimates are always the same.

The commands differ only in some small details.
**svy: probit** and
**svy: logit**
use *t* statistics, whereas
**probit, vce(cluster** *clustvar***)** and
**logit, vce(cluster** *clustvar***)** use *z* statistics. The degrees of
freedom for the *t* in
**svy: probit** and
**svy: logit** are the number of clusters (PSUs) minus the number of
strata (one if unstratified). Strictly speaking, **svy: probit**
and **svy: logit** are doing things right, but the difference
matters only if you have a small number of clusters (say, fewer than 40).
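As a worked illustration of the degrees-of-freedom rule, take a hypothetical design with 30 PSUs in 10 strata (the numbers are invented for this example):

$$\text{df} = n_{\text{PSU}} - n_{\text{strata}} = 30 - 10 = 20$$

The two-sided 95% critical values from standard tables are then

$$t_{0.975,\,20} \approx 2.086, \qquad t_{0.975,\,40} \approx 2.021, \qquad z_{0.975} \approx 1.960$$

so with 20 degrees of freedom a confidence interval based on *t* is roughly 6% wider than one based on *z*, and with 40 degrees of freedom the gap shrinks to about 3%.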

**svy: probit** and
**svy: logit** also use an adjusted Wald test for the model test.
**probit, vce(cluster** *clustvar***)** and
**logit, vce(cluster** *clustvar***)** use an ordinary Wald test. Again,
this difference matters only if you have a small number of clusters.

For a description of the variance estimator, see
[SVY] **variance estimation** and
[P] **_robust**
in the Stata reference manuals.
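For orientation only, the usual linearization ("sandwich") form of this estimator can be sketched as follows for a single stratum with no finite-population correction. The symbols $n_c$ (number of clusters), $\mathbf{u}_j$ (cluster-level score totals), and $\mathbf{D}$ are notation introduced here, not taken from the text above. The manual entries give the exact expressions, including strata and other adjustments.

$$\hat{V}(\hat{\mathbf{b}}) = \mathbf{D} \left[ \frac{n_c}{n_c - 1} \sum_{j=1}^{n_c} \mathbf{u}_j \mathbf{u}_j' \right] \mathbf{D}$$

where

$$\mathbf{u}_j = \sum_{i \in j} w_i \left. \frac{\partial \log f_i(\mathbf{b}; y_i)}{\partial \mathbf{b}} \right|_{\mathbf{b} = \hat{\mathbf{b}}}, \qquad \mathbf{D} = \left( - \sum_{i=1}^{n} w_i \left. \frac{\partial^2 \log f_i(\mathbf{b}; y_i)}{\partial \mathbf{b}\, \partial \mathbf{b}'} \right|_{\mathbf{b} = \hat{\mathbf{b}}} \right)^{-1}$$

The scores are summed within each cluster before the outer products are formed, which is how within-cluster correlation enters the variance without being modeled.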

Standard references for this variance estimator as applied to pseudolikelihoods are

- Binder, D. A. 1983. On the variances of asymptotically normal estimators from complex surveys. *International Statistical Review* 51: 279–292.

- Skinner, C. J. 1989. Introduction to Part A. In *Analysis of Complex Surveys*, ed. C. J. Skinner, D. Holt, and T. M. F. Smith, 23–58. New York: Wiley.

- Wolter, K. M. 2007. *Introduction to Variance Estimation*. 2nd ed. New York: Springer.