Re: st: lincom & svy commands

From   "Stas Kolenikov" <[email protected]>
To   [email protected]
Subject   Re: st: lincom & svy commands
Date   Thu, 23 Oct 2008 11:19:05 -0500

You endow -lincom- with intellectual capacities it does not really
have :)). It won't be thinking for you about the design you have. In
fact, it won't be thinking about anything at all. Its job is to take
whatever the previous estimation command has left in memory (the e(b)
vector and the e(V) matrix) and do some linear manipulations with them.
Whatever you have specified as the design in -svyset-, and maybe the
groups in -mean, over()-, is what goes into -lincom-, and that's what
matters.
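
To make the "linear manipulations" concrete, here is a rough sketch of the same arithmetic done by hand. The names response and year are placeholders of mine, and I am assuming the data are already -svyset- and that year takes exactly two values, so that e(b) has two elements.

```stata
* Sketch only: response and year are placeholder names, and the data
* are assumed to be -svyset- already, with exactly two waves in year.
svy: mean response, over(year)

* Row vector of the two wave means and their design-based covariance
matrix b = e(b)
matrix V = e(V)

* Contrast vector: later wave minus earlier wave
matrix c = (-1, 1)

* Point estimate c*b' and variance c*V*c' of the difference
matrix d  = c * b'
matrix vd = c * V * c'

display "difference = " d[1,1] "   std. err. = " sqrt(vd[1,1])
```

The variance works out to V[1,1] + V[2,2] - 2*V[1,2], so any dependence between the waves can only enter through the off-diagonal element that the estimation command put into e(V).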

Suppose you have put your data in a single file where the cluster IDs
are the same between the waves for the PSUs that were retained, and
some individual IDs are the same, too, but PSUs/individuals sampled
anew have distinct values of identifiers. Then you would probably want
to specify

svyset cluster [pw if present], strata(if present) || individual

then -svy: mean response, over(year)- (not the -lincom- itself!) will
estimate the means over years, and for each year, clustering will be
taken into account. When you compute the differences via -lincom-, you
will get the standard errors that account for that clustering, but
probably won't account for the fact that some individuals were the
same in both waves.

To get that latter thing corrected for, you might want to try something like

xi: svy: reg response i.year

so that observations for the same PSU and the same individual enter
into estimation simultaneously. I'd be surprised if the standard
errors were drastically different, but I would probably expect to see
a small difference in the second meaningful decimal place -- entirely
due to the differences in how individual overlaps are accounted for,
which would probably be but a small portion of the total variance.
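
As a sketch of how the difference would be read off that regression: I am assuming here that year is coded 1 and 2, in which case -xi- should name the dummy for the second wave _Iyear_2. With other codings the name will differ, so check the variables -xi- creates before running the second command.

```stata
* Sketch only: assumes year is coded 1 and 2, so that -xi- names the
* dummy for the second wave _Iyear_2; check the names -xi- creates.
xi: svy: regress response i.year

* The coefficient on the wave dummy is the difference in means
* between the two waves
lincom _Iyear_2
```

The standard error reported here can then be compared with the one from -lincom- after -svy: mean-.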

On the other hand, if your data are organized in such a way that the
PSU IDs are different in different years, then there is no way for
Stata (meaning the -svy: mean- command, let alone -lincom-) to figure out
which clusters are the same, and it will estimate everything
correcting for clustering within any given wave, but any longitudinal
aspect, either at the PSU or SSU level, will be totally ignored. And
that may not be much of a disaster, either, unless whatever you are
measuring is highly persistent over time between your waves.
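
A quick way to see which of the two situations you are in is to count the waves in which each PSU ID appears. This is only a sketch: cluster and year are placeholder names, and it assumes the cluster IDs are unique across strata.

```stata
* Sketch only: cluster and year are placeholder names for the PSU and
* wave identifiers; cluster IDs are assumed unique across strata.

* Flag one observation per PSU-by-wave combination
egen byte psuwave = tag(cluster year)

* Count the number of distinct waves in which each PSU ID appears
egen nwaves = total(psuwave), by(cluster)

* Flag one observation per PSU and tabulate the wave counts
egen byte onepsu = tag(cluster)
tabulate nwaves if onepsu
```

If every PSU shows up in one wave only, the IDs do not link the waves, and the estimates will treat the waves as unrelated samples.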

I worked on somewhat related issues a few years back; there is a
working paper on the Carolina Population Center website
where I give some examples.

On 10/23/08, Bell, Jacqueline S. <[email protected]> wrote:
> Hi
>  This is a follow-on to a previous message I sent this month asking about how lincom calculates standard errors when clustering is present.
>  Can anyone advise me on what lincom actually does when estimating differences in parameters from svy:mean or svy:prop?
>  I have before/after data which is not paired at an individual level, but has a cluster structure.  In these circumstances it is not obvious how lincom goes about estimating the before/after difference.
>  The two alternatives suggested to me are:
>  i) it estimates before and after separately for the whole population, then estimates the difference
>  ii)it estimates the difference in each cluster, and then the overall difference.
>  In the data there are (in most cases) before and after data for each cluster, but often quite severe imbalances in samples before/after within cluster.

Stas Kolenikov, also found at
Small print: I use this email account for mailing lists only.