
RE: st: Definition of strata and PSUs when svysetting

From   "Angel Rodriguez Laso" <>
To   <>
Subject   RE: st: Definition of strata and PSUs when svysetting
Date   Mon, 31 Mar 2008 12:34:46 +0200

Dear Stas,

I have singleton strata in the second stage: within each census tract the
strata are defined as under 65 and over 65 and, because of the sampling
design, in some census tracts just one individual over 65 was interviewed.
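
To make the check concrete, here is a minimal sketch in Python (not Stata) of
how one might list the second-stage strata that are left with a single
respondent once missing values are dropped. The field names (tract,
age_group, bmi) and the toy records are invented for illustration; they are
not from my survey.

```python
from collections import Counter

def singleton_strata(records, outcome):
    """Return the (tract, age_group) strata left with exactly one respondent.

    records: list of dicts with hypothetical keys 'tract', 'age_group' and
    the outcome variable; None marks a missing value.
    """
    counts = Counter(
        (r["tract"], r["age_group"])
        for r in records
        if r[outcome] is not None  # drop respondents with a missing outcome
    )
    return sorted(key for key, n in counts.items() if n == 1)

# Toy data (invented): tract 1 loses two of its three over-65 respondents.
sample = [
    {"tract": 1, "age_group": "65+", "bmi": 27.1},
    {"tract": 1, "age_group": "65+", "bmi": None},
    {"tract": 1, "age_group": "65+", "bmi": None},
    {"tract": 1, "age_group": "<65", "bmi": 24.0},
    {"tract": 1, "age_group": "<65", "bmi": 22.5},
    {"tract": 2, "age_group": "65+", "bmi": 29.3},
    {"tract": 2, "age_group": "65+", "bmi": 25.8},
]

print(singleton_strata(sample, "bmi"))
```

A stratum that loses all of its respondents does not appear in the list at
all, so this only flags strata with exactly one usable observation.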

Best regards,

Ángel Rodríguez Laso

-----Original message-----
[] On behalf of Stas Kolenikov
Sent: Saturday, 29 March 2008 0:54
Subject: Re: st: Definition of strata and PSUs when svysetting

On 3/28/08, Angel Rodriguez Laso <> wrote:
> Thank you for your answer, Stas.
>  I've tried both specifications and the first surprise was that Stata 9
>  ignores further stages when stage 1 is sampled with replacement.

That's right, if you sample with replacement, then your PSUs are
independent (provided that you sample independently from those PSUs
that are selected more than once).
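
As a rough illustration of why the later stages drop out, here is a minimal
Python sketch of the textbook with-replacement ("ultimate cluster") variance
estimator for a weighted total. It is a sketch under stated assumptions, not
Stata's internal code: the function name and the toy records are invented.
The point is that only the weighted PSU totals enter the formula, so however
the units were sampled within a PSU makes no difference to the calculation.

```python
from collections import defaultdict

def wr_variance_of_total(records):
    """With-replacement variance estimate of a weighted total.

    records: iterable of (stratum, psu, weight, y) tuples (hypothetical layout).
    Computes sum over strata of n_h/(n_h-1) * sum_i (t_hi - mean_h)^2,
    where t_hi is the weighted total of PSU i in stratum h.
    """
    psu_totals = defaultdict(float)
    for stratum, psu, weight, y in records:
        psu_totals[(stratum, psu)] += weight * y

    by_stratum = defaultdict(list)
    for (stratum, _psu), total in psu_totals.items():
        by_stratum[stratum].append(total)

    variance = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("stratum with a single PSU: variance not estimable")
        mean_h = sum(totals) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals)
    return variance

# Toy data (invented): one stratum, two PSUs, two respondents each.
sample = [("A", 1, 2.0, 3.0), ("A", 1, 2.0, 5.0),
          ("A", 2, 2.0, 4.0), ("A", 2, 2.0, 6.0)]
# Same PSU totals, different within-PSU values.
reshuffled = [("A", 1, 2.0, 4.0), ("A", 1, 2.0, 4.0),
              ("A", 2, 2.0, 5.0), ("A", 2, 2.0, 5.0)]

print(wr_variance_of_total(sample))
print(wr_variance_of_total(reshuffled))
```

Both calls give the same value because the two data sets have identical PSU
totals; the within-PSU spread never enters the estimate.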

>  The problem with using age groups as second stage strata is that, since
>  only 3 people over 65 are selected per census tract, whenever there are
>  missing values in the variables some strata become single-PSU (person)
>  strata, which prevents Stata from calculating standard errors.

See below -- I have questions about it.

> This is something I want to check with
>  you: From reading Korn and Graubard's "Analysis of Health Surveys" I
>  understood that in complex surveys degrees of freedom are calculated as
>  #PSUs - #strata (624 for the first specification and 1244 for the second,
>  because Stata counts each census tract twice, since each of them
>  belongs to two different strata).

Well, I understood from your initial posting that you had 7 strata, and
from each you've taken 7 "young" people and 3 elderly. But upon
re-reading it, I see that you never mentioned the number of census
tracts you are sampling per stratum -- which would be your PSUs, and
individuals will be your SSUs. If you indeed have 600+ PSUs/tracts,
then you don't need to worry that much about degrees of freedom -- but
there might still be asymptotic issues, as the conventional
asymptotics are the number of strata going to infinity, with #PSUs per
stratum being bounded from above. That's a rather esoteric issue
though; I think Krewski and Rao (1981) was a well known one that made
the distinction (
Then also if you have 600+ PSUs, then I don't see how you could get
singleton strata -- you really would need to have all of your tracts
to miss people 65+.

>  It's usual practice
>  to work with such low numbers of individuals per PSU (10 in my case) and
>  I've never heard that there was a problem of a small sample size then.

Yes. What matters most is the number of PSUs. I think what Korn and
Graubard don't like about d.f. = #PSU - #strata is that this is a very
low number for some important surveys or domains in those surveys,
like Hispanics in NHANES where that number is something like 6, even
though there might be a few hundred cases. I think they had a
discussion in the book how to increase that number, although all their
strategies are ad hoc, and few are indeed justifiable from a rigorous
JNK Rao-style design perspective. They had another paper in JRSSa
( where they also
raise similar issues.
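
For what it's worth, the d.f. = #PSU - #strata rule is easy to check by hand.
Below is a minimal Python sketch with invented identifiers and toy data; it
only counts distinct (stratum, PSU) pairs and distinct strata. It also shows
the effect mentioned earlier in the thread: if the same tracts are crossed
with two age-group strata, each tract is counted once per stratum.

```python
def design_df(pairs):
    """Design degrees of freedom as #PSUs - #strata.

    pairs: iterable of (stratum, psu) tuples, one per observation
    (hypothetical layout). A PSU is identified within its stratum, so a
    tract that appears in two strata counts as two PSUs.
    """
    pairs = list(pairs)
    psus = set(pairs)
    strata = {stratum for stratum, _psu in pairs}
    return len(psus) - len(strata)

# Toy data (invented): three tracts in a single stratum.
one_stratum = [("all", 1), ("all", 1), ("all", 2), ("all", 3)]
# The same three tracts, each split across two age-group strata.
two_strata = [("<65", 1), ("65+", 1), ("<65", 2),
              ("65+", 2), ("<65", 3), ("65+", 3)]

print(design_df(one_stratum))  # 3 PSUs - 1 stratum
print(design_df(two_strata))   # 6 PSUs - 2 strata
```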

Steven Samuels asked some relevant questions, too.

Stas Kolenikov, also found at
Small print: Please do not reply to my Gmail address as I don't check
it regularly.
