
Re: st: Definition of strata and PSUs when svysetting

From   "Stas Kolenikov" <>
Subject   Re: st: Definition of strata and PSUs when svysetting
Date   Fri, 28 Mar 2008 18:54:13 -0500

On 3/28/08, Angel Rodriguez Laso <> wrote:
> Thank you for your answer, Stas.
>  I've tried both specifications and the first surprise was that Stata 9
>  ignores further stages when stage 1 is sampled with replacement.

That's right, if you sample with replacement, then your PSUs are
independent (provided that you sample independently from those PSUs
that are selected more than once).
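
To see why the later stages drop out, here is a minimal Python sketch (not
Stata code, and not what Stata runs internally) of the usual
with-replacement, between-PSU variance estimator for a weighted total. The
function name and the record layout (stratum, psu, weight, y) are my own
assumptions for illustration. A PSU drawn more than once should appear once
per draw, each with its own identifier.

```python
from collections import defaultdict


def wr_variance_of_total(records):
    """Between-PSU (with-replacement) variance estimate of a weighted total.

    records: iterable of (stratum, psu, weight, y), one tuple per person.
    Hypothetical layout, chosen only for this illustration.
    """
    # Step 1: collapse everything below the PSU into one weighted total.
    # Whatever stratification or clustering exists inside a PSU is lost here.
    psu_totals = defaultdict(float)
    for stratum, psu, weight, y in records:
        psu_totals[(stratum, psu)] += weight * y

    # Step 2: group the PSU totals by first-stage stratum.
    by_stratum = defaultdict(list)
    for (stratum, _psu), total in psu_totals.items():
        by_stratum[stratum].append(total)

    # Step 3: sum the between-PSU variability over strata.
    variance = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("single-PSU stratum: variance not estimable")
        mean_h = sum(totals) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_h) ** 2 for z in totals)
    return variance


# Toy data: one stratum, two PSUs (weighted totals 6 and 10).
people = [("h1", "a", 1.0, 2.0), ("h1", "a", 1.0, 4.0), ("h1", "b", 1.0, 10.0)]
print(wr_variance_of_total(people))  # 16.0
```

Only the PSU totals enter step 3, so however the people inside a PSU were
selected, the estimate is the same.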

>  The problem with using age groups as second-stage strata is that, since only 3
>  people over 65 are selected per census tract, whenever there are
>  missing values in the variables some strata become single-PSU (person)
>  strata, which prevents Stata from calculating standard errors.

See below -- I have questions about it.

> This is something I want to check with
>  you: From reading Korn and Graubard's "Analysis of Health Surveys" I've
>  understood that in complex surveys degrees of freedom are calculated as
>  #PSUs - #strata (624 for the first specification and 1244 for the second,
>  because Stata doubles the number of census tracts, as each of them
>  belongs to two different strata).

Well I understood from your initial posting that you had 7 strata, and
from each you've taken 7 "young" people and 3 elderly. But upon
re-reading it, I see that you never mentioned the number of census
tracts you are sampling per stratum -- which would be your PSUs, and
individuals will be your SSUs. If you indeed have 600+ PSUs/tracts,
then you don't need to worry that much about degrees of freedom -- but
there might still be asymptotic issues, as the conventional
asymptotics are the number of strata going to infinity, with #PSUs per
stratum being bounded from above. That's a rather esoteric issue
though; I think Krewski and Rao (1981) was a well-known paper that made
the distinction.
Then also if you have 600+ PSUs, then I don't see how you could get
singleton strata -- you really would need to have all of your tracts
to miss people 65+.
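
One way to check where singletons come from is to count the sampling units
left in each stratum once records with missing values are dropped. A minimal
Python sketch, assuming a made-up layout in which each row carries a stratum
label, a unit identifier and an analysis variable (None meaning missing);
the function and key names are hypothetical:

```python
from collections import defaultdict


def singleton_strata(rows, value_key="y"):
    """Return the strata left with exactly one sampling unit among
    the rows whose analysis variable is not missing."""
    units = defaultdict(set)
    for row in rows:
        if row[value_key] is not None:  # keep complete cases only
            units[row["stratum"]].add(row["unit"])
    return sorted(s for s, u in units.items() if len(u) == 1)


# Toy data: 3 people aged 65+ sampled in tract 1, two with missing y.
rows = [
    {"stratum": ("tract1", "65+"), "unit": 1, "y": 0.4},
    {"stratum": ("tract1", "65+"), "unit": 2, "y": None},
    {"stratum": ("tract1", "65+"), "unit": 3, "y": None},
    {"stratum": ("tract1", "<65"), "unit": 4, "y": 0.7},
    {"stratum": ("tract1", "<65"), "unit": 5, "y": 0.2},
]
print(singleton_strata(rows))
```

With age group by tract as the stratum and people as the units, two missing
values out of three are enough to produce a singleton. With tracts as units
in a first-stage stratum, it would take far more missing data.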

>  It's usual practice
>  to work with such low numbers of individuals per PSU (10 in my case) and
>  I've never heard that there was a problem of a small sample size then.

Yes. What matters most is the number of PSUs. I think what Korn and
Graubard don't like about d.f. = #PSU - #strata is that this is a very
low number for some important surveys or domains in those surveys,
like Hispanics in NHANES where that number is something like 6, even
though there might be a few hundred cases. I think they had a
discussion in the book on how to increase that number, although all their
strategies are ad hoc, and few are indeed justifiable from a rigorous
JNK Rao-style design perspective. They had another paper in JRSSa
where they also raise similar issues.
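
To make the d.f. = #PSU - #strata rule concrete, here is a minimal Python
sketch. The function name and the toy design are made up for illustration
and are not taken from any real survey.

```python
def design_df(stratum_psu_pairs):
    """Design degrees of freedom: distinct PSUs minus distinct strata.

    stratum_psu_pairs: iterable of (stratum, psu), one pair per observation.
    """
    pairs = set(stratum_psu_pairs)       # each PSU counted once
    strata = {stratum for stratum, _psu in pairs}
    return len(pairs) - len(strata)


# Toy design: 3 strata, 2 PSUs per stratum, 50 people per PSU.
cases = [(h, p) for h in range(3) for p in range(2) for _ in range(50)]
print(len(cases), design_df(cases))  # 300 3
```

The number of people per PSU never enters the calculation, so a few hundred
cases can still leave only a handful of degrees of freedom.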

Steven Samuels asked some relevant questions, too.

Stas Kolenikov, also found at
Small print: Please do not reply to my Gmail address as I don't check
it regularly.

