FAQ: Maximum likelihood estimation


Are the estimates produced by probit and logit with the vce(cluster clustvar) option true maximum likelihood estimates?

Is there a difference between the estimates produced by svy: probit, with the PSU variable specified in the svyset command, and probit, vce(cluster clustvar) (and, similarly, between svy: logit, with the PSU variable specified in svyset, and logit, vce(cluster clustvar))?

Title		Maximum likelihood estimation with vce(cluster clustvar)
Author		William Sribney, StataCorp

Answer to first question

No, they are not true maximum likelihood estimates.

Traditional maximum likelihood theory requires that the likelihood function be the distribution function for the sample.

When you have clustering, the observations are no longer independent; thus the joint distribution function for the sample is no longer the product of the distribution functions for each observation. That is, the joint distribution $ f(Y) $ is not

$$\prod_{i=1}^n f_i(y_i)$$

Thus

$$\sum_{i=1}^n \log f_i(y_i)$$

is not the true log-likelihood for the sample.
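To see concretely why the product fails, consider a hypothetical cluster of two binary observations that share a cluster effect. The effect is "high" or "low" with probability 1/2 each, and given the effect, each observation equals 1 with probability p_high or p_low. The Python sketch below compares the true joint probability that both observations equal 1 with the product of their marginal probabilities. The two-point cluster effect and the probabilities are our own assumptions, chosen only for illustration:

```python
def both_ones(p_high, p_low):
    """Return (joint P(y1=1, y2=1), product of marginals P(y1=1) * P(y2=1)).

    Hypothetical model: a shared cluster effect is "high" or "low" with
    probability 1/2 each; given the effect, y1 and y2 are independent.
    """
    joint = 0.5 * p_high**2 + 0.5 * p_low**2
    marginal = 0.5 * p_high + 0.5 * p_low  # same for y1 and y2
    return joint, marginal * marginal


joint, product = both_ones(0.8, 0.2)
print(joint, product)  # the joint probability exceeds the product
```

The two agree only when p_high equals p_low, that is, when there is no within-cluster correlation.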

Unless one fully parameterizes the correlation within clusters (as in, say, a random-effects probit), one cannot write down the true likelihood for the sample.

The robust estimator used by probit, vce(cluster clustvar) and by svy: probit does not assume any particular model for the within-cluster correlation. Instead, these commands merely assume that the values of $ b $ that maximize

$$\sum_{i=1}^n \log f_i(b; y_i)$$

(call them bhat) are a reasonable estimate of the true $ b $.
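The following Python sketch shows what computing bhat as the maximizer of this sum involves. It uses a logit rather than a probit because the logit score and Hessian are simpler; the same Newton-Raphson idea applies to probit. The function name fit_logit and the data are hypothetical. Note that cluster membership plays no role in computing bhat:

```python
import math


def fit_logit(xs, ys, tol=1e-10, max_iter=50):
    """Maximize sum_i log f_i(b; y_i) for a logit with intercept b0 and slope b1.

    Observations are treated as if independent; any cluster structure is
    ignored when computing the point estimate.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score (gradient of the log pseudolikelihood)
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
            v = p * (1.0 - p)
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: (information)^{-1} * score
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


# Hypothetical data; cluster labels would not change bhat, so none are passed.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
bhat = fit_logit(xs, ys)
print(bhat)
```

For this saturated two-group model the maximizer has a closed form, $ b_0 = \log(1/3) $ and $ b_1 = \log 9 $, which the iteration reproduces.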

The key question at this point is: what is the true $ b $ being estimated? It is the value of $ b $ that maximizes

$$\sum_{i=1}^N \log f_i(b; y_i)$$

where now the sum is over all individuals ($ i = 1,...,N $) in the population from which the sample was drawn. That is, the true $ b $ is the solution of the maximum likelihood equation that we would have if we had data on all individuals in the population.

We are justified in using bhat as an estimate for the true $ b $ if

$$\sum_{i=1}^n \log f_i(b; y_i)$$

is a good estimate for

$$\sum_{i=1}^N \log f_i(b; y_i)$$

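which is a reasonable assumption, even if we have clustering. (Strictly speaking, under equal-probability sampling it is the sample sum multiplied by $ N/n $ that estimates the population sum, but multiplying by a positive constant does not change the maximizer, so bhat is the same either way.)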
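A small simulation can make this concrete. The Python sketch below builds a hypothetical finite population of clusters whose members share a cluster-specific success probability, then draws a one-stage cluster sample. For an intercept-only logit, the value of $ b $ maximizing the sum of $ \log f_i $ is the log odds of the mean outcome, so no iterative fitting is needed. The population size, cluster size, sample size, and seed are arbitrary choices for illustration:

```python
import math
import random


def log_lik_intercept(b, ys, scale=1.0):
    """scale * sum_i log f_i(b; y_i) for an intercept-only logit."""
    p = 1.0 / (1.0 + math.exp(-b))
    return scale * sum(math.log(p) if y else math.log(1.0 - p) for y in ys)


def logit_of_mean(ys):
    """Closed-form maximizer of the intercept-only log (pseudo)likelihood."""
    m = sum(ys) / len(ys)
    return math.log(m / (1.0 - m))


random.seed(2024)
# Hypothetical population: 500 clusters of 4 people; members of a cluster share
# a success probability, so outcomes are correlated within clusters.
population = []
for _ in range(500):
    p_c = random.uniform(0.1, 0.9)
    population.append([1 if random.random() < p_c else 0 for _ in range(4)])
pop_ys = [y for cluster in population for y in cluster]
b_true = logit_of_mean(pop_ys)  # maximizes the population sum (i = 1, ..., N)

# One-stage cluster sample of 100 clusters, drawn with equal probability.
sample_ys = [y for cluster in random.sample(population, 100) for y in cluster]
bhat = logit_of_mean(sample_ys)  # maximizes the sample sum (i = 1, ..., n)

N, n = len(pop_ys), len(sample_ys)
print(f"true b = {b_true:.3f}, bhat = {bhat:.3f}")
print(log_lik_intercept(b_true, pop_ys), log_lik_intercept(b_true, sample_ys, scale=N / n))
```

Rerunning with different seeds shows bhat landing near the population value even though outcomes within each cluster are correlated; the clustering affects the variance of bhat, not what it estimates.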

Sampling weights

If we have sampling weights $ w_i $, we obtain bhat as the value of $ b $ that maximizes

$$\sum_{i=1}^n w_i \log f_i(b; y_i)$$

since it is reasonable to assume

$$\sum_{i=1}^n w_i \log f_i(b; y_i)$$

is a good estimate for

$$\sum_{i=1}^N \log f_i(b; y_i)$$
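The role of the weights can be checked in a tiny hypothetical example. Suppose a population of 100 people has two groups of 50: in group A, 40 have y = 1; in group B, 10 do. We sample 25 people from A (inclusion probability 1/2, weight 2) and 5 from B (probability 1/10, weight 10), observing 20 and 1 ones, respectively. The Python sketch below fits an intercept-only logit by Newton-Raphson with and without the weights; the numbers and function names are invented for illustration:

```python
import math


def weighted_loglik(b, ys, ws):
    """sum_i w_i * log f_i(b; y_i) for an intercept-only logit."""
    p = 1.0 / (1.0 + math.exp(-b))
    return sum(w * (math.log(p) if y else math.log(1.0 - p)) for y, w in zip(ys, ws))


def fit_intercept(ys, ws, tol=1e-12, max_iter=100):
    """Newton-Raphson maximizer of the weighted log pseudolikelihood."""
    b = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-b))
        score = sum(w * (y - p) for y, w in zip(ys, ws))
        info = sum(w * p * (1.0 - p) for w in ws)
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b


# Sample: 25 people from group A (weight 2), then 5 from group B (weight 10).
ys = [1] * 20 + [0] * 5 + [1] * 1 + [0] * 4
ws = [2.0] * 25 + [10.0] * 5
# Full population, used only to compare the weighted sum with the population sum.
pop_ys = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40

b_weighted = fit_intercept(ys, ws)            # population log odds: log(50/50) = 0
b_unweighted = fit_intercept(ys, [1.0] * 30)  # log(21/9), pulled toward group A
print(b_weighted, b_unweighted)
print(weighted_loglik(0.5, ys, ws), weighted_loglik(0.5, pop_ys, [1.0] * 100))
```

Here the weighted counts of ones and zeros (50 and 50) match the population counts exactly, so the weighted sum equals the population sum at every $ b $; in real samples the match is only approximate.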

Since the likelihood used to derive bhat in the case of clustering or sampling weights is not a true likelihood, it is called a pseudolikelihood.

Variance estimates

The variance estimates are therefore computed using sampling theory. That is, we ask: if the sample were drawn again and again using the same scheme (for example, clustered or weighted), and bhat were mechanically computed each time as the maximizer of the pseudolikelihood, what would the variance of bhat be?
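In practice this repeated-sampling variance is estimated with the linearization, or "sandwich", estimator. As a sketch under our own assumptions, the Python code below fits a logit by Newton-Raphson, sums each observation's score within its cluster, and forms $ V = \frac{M}{M-1} D \left(\sum_k u_k u_k'\right) D $, where $ D $ is the inverse of the information matrix, $ u_k $ is the score total for cluster $ k $, and $ M $ is the number of clusters. The $ M/(M-1) $ factor is our assumption about the small-sample adjustment; see [P] _robust for the exact formula Stata uses. The function names and simulated data are hypothetical:

```python
import math
import random


def fit_logit(xs, ys, tol=1e-10, max_iter=50):
    """Pseudo-ML logit (intercept b0, slope b1) by Newton-Raphson; clusters ignored."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            v = p * (1.0 - p)
            g0, g1 = g0 + (y - p), g1 + (y - p) * x
            h00, h01, h11 = h00 + v, h01 + v * x, h11 + v * x * x
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


def cluster_robust_vce(xs, ys, ids, b0, b1):
    """Return (D, V): model-based variance D and cluster-robust sandwich V."""
    h00 = h01 = h11 = 0.0
    totals = {}  # cluster id -> [score total for b0, score total for b1]
    for x, y, c in zip(xs, ys, ids):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        v = p * (1.0 - p)
        h00, h01, h11 = h00 + v, h01 + v * x, h11 + v * x * x
        u = totals.setdefault(c, [0.0, 0.0])
        u[0] += y - p
        u[1] += (y - p) * x
    det = h00 * h11 - h01 * h01
    D = [[h11 / det, -h01 / det], [-h01 / det, h00 / det]]  # inverse information
    meat = [[sum(u[r] * u[s] for u in totals.values()) for s in range(2)] for r in range(2)]
    M = len(totals)
    scale = M / (M - 1)  # assumed small-sample factor
    V = [[scale * sum(D[r][i] * meat[i][j] * D[j][s] for i in range(2) for j in range(2))
          for s in range(2)] for r in range(2)]
    return D, V


# Hypothetical clustered data: 60 clusters of 5 sharing a random cluster effect.
random.seed(1)
xs, ys, ids = [], [], []
for k in range(60):
    effect = random.gauss(0.0, 1.0)
    for _ in range(5):
        x = random.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-0.5 + x + effect)))
        xs.append(x)
        ys.append(1 if random.random() < p else 0)
        ids.append(k)

b0, b1 = fit_logit(xs, ys)
D, V = cluster_robust_vce(xs, ys, ids, b0, b1)
print(f"slope = {b1:.3f}")
print(f"model-based SE = {math.sqrt(D[1][1]):.3f}, cluster-robust SE = {math.sqrt(V[1][1]):.3f}")
```

The sandwich keeps $ D $ from the pseudolikelihood but replaces the model-based middle term with the empirical spread of the cluster score totals, which is why no model for the within-cluster correlation is needed.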

Since traditional likelihood theory cannot be invoked for clustering or weighted sampling, one should not use traditional likelihood-ratio tests in these cases.
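Hypotheses are instead tested with Wald tests, which need only bhat and its variance estimate. A minimal Python sketch for a single coefficient, taking a hypothetical coefficient and cluster-robust standard error as input, uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable:

```python
import math


def wald_test_1df(b, se, b_null=0.0):
    """Wald chi-squared statistic (1 df) and p-value for H0: b = b_null."""
    w = ((b - b_null) / se) ** 2
    p_value = math.erfc(math.sqrt(w / 2.0))  # P(chi2_1 > w) = P(|Z| > sqrt(w))
    return w, p_value


# Hypothetical coefficient and cluster-robust standard error.
w, p_value = wald_test_1df(0.42, 0.19)
print(f"chi2(1) = {w:.3f}, p = {p_value:.4f}")
```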

Answer to second question

Is there a difference between the estimates produced by svy: probit and probit, vce(cluster clustvar) (and, similarly, between svy: logit, with the PSU variable specified in svyset, and logit, vce(cluster clustvar))?

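When the svyset specification contains only the PSU variable (no strata and no finite population correction) and both commands use the same sampling weights, if any, the point estimates and variance estimates are always the same.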
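One way to see why the variance estimates coincide with a single stratum: the svy linearization formula centers the PSU score totals at their mean, while the vce(cluster) formula does not, and at the pseudo-MLE the score totals sum to zero, so the centering changes nothing. The Python sketch below checks this for a single parameter, using invented score totals and assuming both formulas use the same $ M/(M-1) $ factor; the function names are hypothetical:

```python
def cluster_meat(u):
    """vce(cluster)-style middle term: M/(M-1) * sum_k u_k^2."""
    M = len(u)
    return M / (M - 1) * sum(uk * uk for uk in u)


def svy_meat_one_stratum(u):
    """Linearization middle term with one stratum: M/(M-1) * sum_k (u_k - ubar)^2."""
    M = len(u)
    ubar = sum(u) / M
    return M / (M - 1) * sum((uk - ubar) ** 2 for uk in u)


raw = [0.9, -1.4, 0.2, 0.6, -0.1, 0.4]
# At the pseudo-MLE the score totals sum to zero; mimic that by centering.
mean_raw = sum(raw) / len(raw)
u = [r - mean_raw for r in raw]
print(cluster_meat(u), svy_meat_one_stratum(u))
```

With several strata the score totals are centered within each stratum, and those stratum means need not be zero, so the two variance estimates can differ.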

The commands differ only in a few small details. svy: probit and svy: logit use t statistics, whereas probit, vce(cluster clustvar) and logit, vce(cluster clustvar) use z statistics. The degrees of freedom for the t statistic in svy: probit and svy: logit are the number of clusters (PSUs) minus the number of strata (one if unstratified). Strictly speaking, svy: probit and svy: logit do things right, but the difference matters only if you have a small number of clusters (say, fewer than 40).
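To gauge how much the t-versus-z choice matters, the Python sketch below compares two-sided p-values for the same test statistic under a standard normal distribution and under a t distribution with (number of PSUs minus number of strata) degrees of freedom. The t tail area is computed by numerically integrating the t density with Simpson's rule, a choice made only to keep the example free of third-party libraries; the function names and the PSU counts are hypothetical:

```python
import math


def normal_two_sided_p(stat):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(stat) / math.sqrt(2.0))


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1.0 + t * t / df) ** (-(df + 1) / 2)


def t_two_sided_p(stat, df, steps=2000):
    """1 - 2 * integral from 0 to |stat| of the t density (Simpson's rule; steps is even)."""
    b = abs(stat)
    h = b / steps
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def svy_df(n_psu, n_strata=1):
    """Design degrees of freedom: number of PSUs minus number of strata."""
    return n_psu - n_strata


for n_psu in (10, 20, 40, 200):
    df = svy_df(n_psu)
    print(n_psu, df, round(t_two_sided_p(2.0, df), 4), round(normal_two_sided_p(2.0), 4))
```

With few clusters the t p-value is noticeably larger than the z p-value, and the gap shrinks as the number of clusters grows.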

svy: probit and svy: logit also use an adjusted Wald test for the model test. probit, vce(cluster clustvar) and logit, vce(cluster clustvar) use an ordinary Wald test. Again, this difference matters only if you have a small number of clusters.
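The adjustment can be sketched as follows. We assume the Korn and Graubard form described in Stata's [SVY] documentation: with $ d $ equal to the number of PSUs minus the number of strata and $ k $ restrictions, the ordinary Wald statistic $ W $ is rescaled to $ F = (d-k+1)W/(kd) $ and referred to an F distribution with $ k $ and $ d-k+1 $ degrees of freedom. Verify the exact form against the manual before relying on it; the function name and input values are hypothetical:

```python
def adjusted_wald(w, k, d):
    """Rescale a Wald chi-squared statistic W with k restrictions and d design df.

    Assumed form: F = (d - k + 1) * W / (k * d), with F(k, d - k + 1) reference.
    Returns (F, numerator df, denominator df).
    """
    if d < k:
        raise ValueError("need at least as many design degrees of freedom as restrictions")
    f_stat = (d - k + 1) * w / (k * d)
    return f_stat, k, d - k + 1


# Hypothetical model test: 3 coefficients, 21 PSUs in 1 stratum (d = 20), W = 9.5.
print(adjusted_wald(9.5, 3, 20))
```

For a single restriction ($ k = 1 $) the statistic is unchanged and only the reference distribution differs; the adjustment matters most when many coefficients are tested with few clusters.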

References

For a description of the variance estimator, see [SVY] variance estimation and [P] _robust in the Stata reference manuals.

Standard references for this variance estimator as applied to pseudolikelihoods are

Binder, D. A. 1983. On the variances of asymptotically normal estimators from complex surveys. International Statistical Review 51: 279–292.

Skinner, C. J. 1989. Introduction to Part A. In Analysis of Complex Surveys, ed. C. J. Skinner, D. Holt, and T. M. F. Smith, 23–58. New York: Wiley.

Wolter, K. M. 2007. Introduction to Variance Estimation. 2nd ed. New York: Springer.
