



Re: st: svy + aweights


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: svy + aweights
Date   Thu, 10 Nov 2011 16:59:22 -0500

The nature of the -cluster()- variance estimators is such that they
allow for any correlation pattern that might be observed within a
PSU. This is a non-parametric estimator, whereas you are probably
thinking along the lines of something like GEE.

Suppose you have a model

y = grand mean + {m==cluster mean} + {u==individual mean} +
{e==observation measurement error}

with n subjects per cluster and k observations per subject. Assume that
u and e are homoskedastic. If you have k=1 observation per subject, then
you cannot distinguish u and e, and have essentially one error term. The
within-cluster covariance matrix is then (Var[m]+Var[u]+Var[e]) times an
exchangeable correlation matrix with corr =
Var[m]/(Var[m]+Var[u]+Var[e]). If you have multiple observations, k>1,
per subject, the within-cluster covariance matrix is
J(kn,kn,Var[m]) + I(n) # J(k,k,Var[u]) + I(kn)*Var[e], which is a more
complicated pattern. In GEE, you have to put these structures into the
objective function as working correlation structures to get your
estimates. With -cluster()-, you don't have to, but you should expect
your estimates to be less efficient than if the above model were true
and you ran a (feasible) GLS estimation. As long as the number of
clusters -> infinity, you can build a consistent estimator of the
(within-cluster) Var[y], which the -svy- commands will account for.
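
To make the two covariance patterns concrete, here is a minimal sketch
in plain Python (not Stata) that builds the within-cluster covariance
matrix element by element, mirroring the J(), I() and # (Kronecker)
expression above. The function name cluster_cov and the variance values
are made up for illustration, and observations are assumed to be sorted
subject by subject within the cluster.

```python
# Within-cluster covariance of y = grand mean + m + u + e for one cluster
# (PSU) with n subjects and k observations per subject. Mirrors
#   J(kn,kn,Var[m]) + I(n) # J(k,k,Var[u]) + I(kn)*Var[e]
# The variance values used below are hypothetical.

def cluster_cov(n, k, var_m, var_u, var_e):
    """Return the (n*k) x (n*k) covariance matrix as a list of lists."""
    size = n * k
    cov = []
    for i in range(size):
        row = []
        for j in range(size):
            value = var_m              # cluster effect is shared by every pair
            if i // k == j // k:
                value += var_u         # same subject: shared subject effect
            if i == j:
                value += var_e         # same observation: measurement error
            row.append(value)
        cov.append(row)
    return cov

var_m, var_u, var_e = 1.0, 2.0, 0.5

# k = 1: u and e cannot be told apart, and the pattern is exchangeable.
one_obs = cluster_cov(3, 1, var_m, var_u, var_e)
icc = one_obs[0][1] / one_obs[0][0]    # equals Var[m]/(Var[m]+Var[u]+Var[e])

# k = 2: three distinct values (same observation, same subject, same cluster).
two_obs = cluster_cov(2, 2, var_m, var_u, var_e)
for row in two_obs:
    print(row)
```

With k=1 every off-diagonal entry equals Var[m], so a single
correlation describes the whole cluster. With k=2 the entries for two
observations on the same subject are larger than those for two
different subjects, which is the extra layer a working correlation
structure would have to spell out and that -cluster()- leaves
unspecified.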

Hope this helps.

On Thu, Nov 10, 2011 at 4:48 PM, Jeph Herrin <stata@spandrel.net> wrote:
> I'm not sure I get this. How can correlations at one level be "engulfed"
> by correlations at another? The PSUs account for subject level correlation,
> but for each subject I have multiple observations.
>
> Moreover, if I -reshape-, does it still make sense to -svyset psu-? I
> thought not.
>
> On 11/10/2011 4:40 PM, Stas Kolenikov wrote:
>>
>> On Thu, Nov 10, 2011 at 4:01 PM, JH<junk@spandrel.net>  wrote:
>>>
>>> But doesn't your suggestion ignore the correlation of observations within
>>> subjects?
>>
>> No. Unless your current -svyset-ting is -svyset _n-... and frankly I
>> don't know how that would behave with -reshape-. If you have PSUs in
>> your -svyset- (and NHANES does have them), then the correlations of
>> observations within the subjects will be engulfed by the correlations
>> of observations within the PSUs that -svy:- controls for.
>>
>>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.


