Are the estimates produced by probit and logit with the vce(cluster
clustvar) option true maximum likelihood estimates?
Is there a difference between the estimates produced by svy: probit, with
the PSU variable specified in svyset, and probit, vce(cluster clustvar)
(and, similarly, between svy: logit, with the PSU variable specified in
svyset, and logit, vce(cluster clustvar))?
Title: Maximum likelihood estimation
Author: Bill Sribney, StataCorp
Date: December 1997; updated April 2005; minor revisions July 2011
Answer to first question
No, they are not true maximum likelihood estimates.
Traditional maximum likelihood theory requires that the likelihood function
be the distribution function for the sample.
When you have clustering, the observations are no longer independent; thus
the joint distribution function for the sample is no longer the product of
the distribution functions for each observation. That is, the joint
distribution f(Y) is not
$$\prod_{i=1}^{n} f_i(y_i)$$
Thus
$$\sum_{i=1}^{n} \log f_i(y_i)$$
is not the true likelihood for the sample.
Unless one fully parameterizes the correlation within clusters (as in, say,
a random-effects probit), one cannot write down the true likelihood for the
sample.
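The contrast between the two approaches can be sketched in Stata with simulated data. Everything in this example is an assumption made for illustration: the variable names (cl, u, x, y), the seed, the 50 clusters of 20 observations, and the data-generating model are not from the original discussion.

```stata
* Simulated clustered binary data; all names and values are hypothetical
clear
set seed 12345
* 50 clusters, each with its own shared effect u
set obs 50
generate cl = _n
generate u = rnormal()
* 20 observations per cluster
expand 20
generate x = rnormal()
generate y = (0.5*x + u + rnormal()) > 0

* Pseudolikelihood: pooled probit, no model for the within-cluster correlation
probit y x, vce(cluster cl)

* Full likelihood: a random intercept parameterizes the within-cluster correlation
xtset cl
xtprobit y x, re
```

The two sets of coefficients are not expected to agree: the random-effects coefficients are conditional on the cluster effect, whereas the pooled probit describes the population-averaged relationship.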
The robust estimator used by probit, vce(cluster clustvar) and svy: probit
does not assume any particular model for the within-cluster correlation.
Instead, these commands merely assume the values of b that maximize
$$\sum_{i=1}^{n} \log f_i(b; y_i)$$
(call them bhat) are a reasonable estimate of the true b.
At this point in the discussion, the key question is, what is the true b
that is being estimated? It is the values of b that maximize
$$\sum_{i=1}^{N} \log f_i(b; y_i)$$
where now the sum is over all individuals (i = 1,...,N) in the
population from which the sample was drawn. That is, the true
b is the solution of the maximum likelihood
equation that we would have if we had data on all individuals in the
population.
We are justified in using bhat as an
estimate for the true b if
$$\sum_{i=1}^{n} \log f_i(b; y_i)$$
is a good estimate for
$$\sum_{i=1}^{N} \log f_i(b; y_i)$$
which is a reasonable assumption, even if we have clustering.
Sampling weights
If we have sampling weights, wi, then we get
bhat as the solution to
$$\sum_{i=1}^{n} w_i \log f_i(b; y_i)$$
since it is reasonable to assume
$$\sum_{i=1}^{n} w_i \log f_i(b; y_i)$$
is a good estimate for
$$\sum_{i=1}^{N} \log f_i(b; y_i)$$
Since the likelihood used to derive bhat in
the case of clustering or sampling weights is not a true likelihood, it is
called a pseudolikelihood.
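As a minimal sketch of a weighted, clustered fit, the following Stata commands simulate data and maximize the weighted pseudolikelihood. The variable names (cl, u, x, y, w), the seed, and the made-up weights are assumptions for illustration only; real sampling weights would come from the survey design.

```stata
* Simulated clustered binary data; all names and values are hypothetical
clear
set seed 12345
set obs 50
generate cl = _n
generate u = rnormal()
expand 20
generate x = rnormal()
generate y = (0.5*x + u + rnormal()) > 0
* Made-up sampling weights between 1 and 5
generate w = 1 + 4*runiform()

* bhat maximizes the sum of w_i * log f_i(b; y_i);
* the cluster option changes only the variance estimate
probit y x [pweight = w], vce(cluster cl)
```

With pweights or a clustered variance estimate, Stata should label the criterion in the output as a log pseudolikelihood rather than a log likelihood.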
Variance estimates
The variance estimates are now computed using sampling theory. That is, we
ask: if the sample were drawn again and again using the same scheme
(i.e., clustered or weighted), and bhat were mechanically computed each
time as the maximum of the pseudolikelihood, what would the variance of
bhat be?
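For orientation, the usual linearization ("sandwich") form of this variance estimator for an unstratified clustered sample can be written as follows. The symbols M (number of clusters), u_j (cluster score total), and D are notation introduced here, not in the original text, and the M/(M-1) factor is the customary finite-sample multiplier. Treat this as a sketch; the manual entries cited in the references give the exact definitions.

```latex
\[
\widehat{V}(\hat{b})
  = D \left( \frac{M}{M-1} \sum_{j=1}^{M} u_j u_j^{\top} \right) D,
\qquad
D = \left( - \sum_{i=1}^{n} w_i \,
      \frac{\partial^2 \log f_i(b; y_i)}{\partial b \, \partial b^{\top}}
    \right)^{-1} \Bigg|_{b=\hat{b}},
\qquad
u_j = \sum_{i \in \mathrm{cluster}\ j} w_i \,
      \frac{\partial \log f_i(b; y_i)}{\partial b} \Bigg|_{b=\hat{b}}
\]
```

Without sampling weights, set every w_i to 1. The outer factors D come from the curvature of the pseudolikelihood, while the middle term measures the between-cluster variability of the scores, which is what makes the estimate valid without a model for the within-cluster correlation.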
Since traditional likelihood theory cannot be invoked for clustering or
weighted sampling, one should not use traditional likelihood-ratio tests in
these cases.
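In practice, joint hypotheses after a clustered fit are tested with a Wald test based on the robust variance estimate. A minimal sketch on simulated data follows; the variable names (cl, u, x1, x2, y), the seed, and the data-generating model are hypothetical.

```stata
* Simulated clustered binary data; all names and values are hypothetical
clear
set seed 12345
set obs 50
generate cl = _n
generate u = rnormal()
expand 20
generate x1 = rnormal()
generate x2 = rnormal()
generate y = (0.5*x1 + u + rnormal()) > 0

probit y x1 x2, vce(cluster cl)
* Wald test of H0: the coefficients on x1 and x2 are both zero
test x1 x2
```

Because test works from the estimated coefficients and their variance matrix, it does not rely on the pseudolikelihood being a true likelihood.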
Answer to second question
Is there a difference between the estimates produced by svy: probit, with
the PSU variable specified in svyset, and probit, vce(cluster clustvar)
(and, similarly, between svy: logit, with the PSU variable specified in
svyset, and logit, vce(cluster clustvar))?
The point estimates and variance estimates are always the same.
The commands differ only in some small details.
svy: probit and svy: logit use t statistics, whereas
probit, vce(cluster clustvar) and logit, vce(cluster clustvar) use
z statistics. The degrees of freedom for the t in svy: probit and
svy: logit are the number of clusters (PSUs) minus the number of
strata (one if unstratified). Strictly speaking, svy: probit
and svy: logit are doing things right, but the difference
matters only if you have a small number of clusters (say, fewer than 40).
svy: probit and svy: logit also use an adjusted Wald test for the model
test. probit, vce(cluster clustvar) and logit, vce(cluster clustvar) use
an ordinary Wald test. Again, this difference matters only if you have a
small number of clusters.
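The comparison can be tried directly on simulated data. In this sketch, svyset names only the PSU (no strata and no weights), which is the setting the question describes; the variable names (cl, u, x, y), the seed, and the 50 clusters are assumptions for illustration.

```stata
* Simulated clustered binary data; all names and values are hypothetical
clear
set seed 12345
set obs 50
generate cl = _n
generate u = rnormal()
expand 20
generate x = rnormal()
generate y = (0.5*x + u + rnormal()) > 0

* Survey version: declare the PSU only, then fit; inference uses t statistics
svyset cl
svy: probit y x
* Design degrees of freedom: clusters minus strata, here 50 - 1
display e(df_r)

* Cluster-robust version: inference uses z statistics
probit y x, vce(cluster cl)
```

Comparing the two coefficient tables side by side shows where the commands differ: the reference distribution for the test statistics and confidence intervals, and the form of the model test.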
References
For a description of the variance estimator, see [SVY] variance estimation
and [P] _robust in the Stata reference manuals.
Standard references for this variance estimator as applied to
pseudolikelihoods are
- Binder, D. A. 1983. On the variances of asymptotically normal estimators
  from complex surveys. International Statistical Review 51: 279–292.
- Skinner, C. J. 1989. Introduction to Part A. In Analysis of Complex
  Surveys, ed. C. J. Skinner, D. Holt, and T. M. F. Smith, 23–58.
  New York: Wiley.
- Wolter, K. M. 2007. Introduction to Variance Estimation. 2nd ed.
  New York: Springer.