Re: st: xtlogit and svy command

From: "Stas Kolenikov"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: xtlogit and svy command
Date: Sun, 25 Feb 2007 14:35:18 -0600

```Margaret,

my understanding is that by the econometric slang "unobserved
heterogeneity" you just mean a non-zero variance of the random effects.
(The list FAQ strongly suggests using terms that most people on the
list would understand, not only the specialists in your narrow field.)

-svy- interacts poorly with -xt-, and for a reason. The -svy-
commands work within what is called the design-based inference
paradigm: there is a finite population of interest from which a random
sample is drawn, possibly with stratification, clustering, and
differential probabilities of selection. When you collect the data
repeatedly, in a longitudinal manner, the question arises: what is the
population to which the results can be generalized? Is it the original
population of, say, 1990, the first year in which the data were
collected? Is it the population present during the period 1990-2000?
Is it the population at the end of the period? Is it each of the
populations between 1990 and 2000? Technically, if your sample was
taken in 1990, then the 1990 population is the only one to which the
results would generalize, and the (original 1990) weights would then
work to provide unbiased (or rather consistent, for nonlinear models)
estimates of the corresponding population quantities.

Now, the mechanics of the linearized vce are as follows: it needs the
likelihood scores for each unit (i.e., panel), and it needs the
weights to be constant within that panel. In fact, the whole
pseudo-likelihood procedure is based on computing integrals over the
random effects for the -re- models, or conditioning on the number of
successes in the -fe- models, so what enters the likelihood is a
certain function of all observations within the panel. That is what
the weight can be attached to, and from that it is clear that weights
varying within panels are tricky at best.
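A quick way to check that constant-within-panel requirement in your
data (assuming, as in your -svyset- call, that the weight variable is
-weight-, and that the panel identifier is -id-):

   * flag panels where the sampling weight is not constant
   bysort id (weight): generate byte wtvaries = weight[1] != weight[_N]
   count if wtvaries
   assert r(N) == 0   // fails if any panel has non-constant weights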

Further on the mechanics: the linearized vce estimator needs the
scores to be computable from the likelihood or pseudo-likelihood, so
that they can be fed to the -_robust- calculator of sandwich variances
(if you understand the mechanics of the Eicker-White
heteroskedasticity-consistent estimator, then you know more than half
of those mechanics). -xtlogit- does not -predict- the scores, which
must be for a reason, too -- the score would again be at the panel
level, and it would look like an integral over the random effects.
Frankly, I don't know whether those scores are computable at all for
this particular model.
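In symbols, the sandwich the above refers to is, for a generic
pseudo-ML estimator theta (a sketch only, ignoring stratum centering
and finite-population corrections):

   V = A^-1 * B * A^-1 , where
   A = - sum_i w_i   * d2 lnL_i / (d theta d theta')   (information part)
   B =   sum_i w_i^2 * s_i * s_i'                      (score outer products)

with i indexing panels, w_i the panel weight, and s_i = d lnL_i /
d theta the panel-level score -- which is exactly the piece -xtlogit-
cannot hand over.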

Well, now about the conceptual approach: I said that the -svy-
commands assume fixed population characteristics, yet -xt*, re- models
deal with something randomly distributed in the population, rather
than fixed. That is a conceptual contradiction to which I don't have a
good answer. Whoever manages to push this through might open up a
whole new area in survey statistics. (Yes, I am aware of model-based
and model-assisted estimation, but that is a different story.)

So the standard route with -svy: xtlogit- seems to be closed. Is that
the end of the world? Well... there are a couple of alternatives.

First, you can reformulate your model so it can be estimated by
-gllamm-, which allows for multilevel weights. Mastering -gllamm- is a
formidable task, but it pays off very well in the end. I won't even
start to give indications as to how to proceed with it -- you would
have to take a couple of weeks to figure it out, as it is a whole
estimation paradigm in itself, and you really need to understand it
deeply to make it work.
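Just for orientation (DV, IV1, IV2 are placeholders for your
variables, and do check -gllamm-'s documentation for the pweight()
convention, which as far as I recall reads pweight(wt) as wt1 =
level-1 and wt2 = level-2 weights), a random-intercept logit might
look roughly like:

   * gllamm is user-written: ssc install gllamm
   generate double wt1 = 1        // level-1 (observation) weights
   generate double wt2 = weight   // level-2 (panel) weights
   gllamm DV IV1 IV2, i(id) family(binomial) link(logit) pweight(wt) adapt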

Second, you can specify -svy jackknife: xtlogit- instead, and see
what comes out. If you have 5,000 panels, the -svy jackknife- would
need to re-estimate the model 5,000 times, so be prepared to have your
computing unit chew on it for a couple of days. Even then, I would
tend to think that the variance of the random effect would be treated
as an ancillary parameter, so you might have to take some special
action to make the jackknife report that variance, to put a confidence
interval on it.
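In syntax, reusing your -svyset- (placeholder variable names again,
and whether -xtlogit- runs under the jackknife at all is something you
would have to try):

   svyset [pweight=weight], strata(region) psu(psu)
   svy jackknife, subpop(sampled): xtlogit DV IV1 IV2, i(id)
   * to get a jackknife confidence interval for the random-effect
   * scale, jackknife an expression for it explicitly (assuming
   * -xtlogit, re- saves it as e(sigma_u)):
   svy jackknife sigma_u = e(sigma_u), subpop(sampled): xtlogit DV IV1 IV2, i(id)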

Here's yet another subtlety, if not a caveat: I tend to think that
both -xtlogit- and -gllamm- work by freely estimating the variance of
the random effects while effectively fixing the variance of the
individual error terms to _pi^2/3, the variance of the standard
logistic distribution. It might be argued that instead the total
variance of u_i + e_{ij} should be fixed, to make the results
comparable with those of -logit-. This would be especially relevant
for resampling estimators like -svy jackknife-, as it is not quite
clear whether the results of those 5,000 re-estimates are actually
comparable to one another. In the jackknife, you drop panels one by
one; of course this changes the estimates of the variance, but if you
fixed that variance, it would instead affect the point estimates, as
only the ratio of the logit coefficients to the total error standard
deviation is identified. So I would not be terribly convinced by those
results, either.
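To spell the identification point out (a standard latent-variable
argument, not specific to Stata): with

   y*_{ij} = x_{ij}'b + u_i + e_{ij},   y_{ij} = 1 if y*_{ij} > 0,
   Var(e_{ij}) = _pi^2/3 (standard logistic),  Var(u_i) = sigma_u^2,

only the ratio b / sqrt(sigma_u^2 + _pi^2/3) is identified, so fixing
a different variance component reshuffles the point estimates rather
than the fit.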

Ohio State has the Center for Survey Research
(http://www.csr.ohio-state.edu/), so I imagine you can talk to
somebody there. They seem more like a consulting center to me, though,
so I don't know whether they are able to provide strong advice on a
complex model such as -xtlogit-.

Hope this helps.

On 2/24/07, Margaret Gassanov <gassanov.1@osu.edu> wrote:
```
```Hello,

I am trying to run a discrete-time model (random-effects), correcting for
unobserved heterogeneity, and using weights for complex survey data.

Unfortunately, I am having a problem getting this model to run.

This is my syntax:
svyset [pweight=weight], strata(region) psu(psu)
svy, subpop(sampled): xtlogit DV [list of IVs], i(id)

I get this response: xtlogit is not supported by svy with vce(linearized);
see help survey for a list of Stata estimation commands that are supported
by svy

If I run the model without the survey command, I see that I do have
unobserved heterogeneity.  I can also run the model with the weights but
not with the checks for unobserved heterogeneity.  But I cannot seem to do
both.

Is there another method I could use to get around this? Or is there
another way to do event-history analysis that uses both survey weights
and corrects for unobserved heterogeneity? I would be very happy to
hear any suggestions.

```
```
--
Stas Kolenikov
http://stas.kolenikov.name
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```