The Stata listserver

st: RE: Re: Using STATA to analyze complex national datasets

From   "Sayer, Bryan" <>
To   "''" <>
Subject   st: RE: Re: Using STATA to analyze complex national datasets
Date   Wed, 5 Feb 2003 12:37:57 -0500

It is important to note that when a sampling statistician talks about
variance estimates, the discussion is usually about the estimate from ONE
specific sample with a single specific design.  However, to approximate
estimates for a COMPLETE population, data sets may contain data from several
different samples with different designs.  For example, one part might be a
telephone sample, a second part a sample designed to reach respondents with
unlisted numbers, and a third part a field sample of people without phone
service or with only intermittent service.  Each of these portions may have
a different sample design (and vastly different variances), but the
information about each sample needs to be incorporated in a general fashion
to allow full-population estimates (otherwise estimates have to be generated
separately and then combined, a nasty process).  And if one is trying to
make small area estimates, all of this information has to be put together
somehow.  And of course the PSUs may be sampled with different
probabilities, a factor that is not incorporated in

Alongside this issue is the problem that users rarely have sufficient
detail about the actual VALUES of the sampling variables to know whether
it is reasonable to use a single-stage design.  In fact, they usually don't
even have enough information to collapse strata sensibly when faced with
certainty PSUs.  There are many reasons why this is the case, with
confidentiality clearly at the top (sampling variables often contain a
great deal of geographic information), but I suspect laziness also plays
a part.

And Graham is pretty clear on this:
"...no matter what form of subsampling is used WITHIN the PSUs"

So for each PSU there isn't a problem.  But how do you combine the
information from each PSU?

Bryan Sayer
Statistician, SSS Inc.

-----Original Message-----
From: Tim Hofer []
Sent: Wednesday, February 05, 2003 10:08 AM
Subject: st: Re: Using STATA to analyze complex national datasets

In his Sage book on sampling, Graham Kalton states:
"Under the with-replacement assumption a single standard error formula for a
particular estimator applies, no matter what form of subsampling is used
within the PSUs.  Thus for instance, the same formula applies whether the
elements are sampled (1) by SRS within the selected PSUs, (2) by systematic
or stratified sampling, or (3) with further sampling stages and
stratification.  This generality is appealing... because the user of the
program is not required to supply the program with details about the
subsample design.  The use of these programs requires only that each survey
data record contains a code to indicate to which PSU it belongs, together
with information about the first-stage stratification." (p. 78)
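A minimal sketch of the with-replacement ("ultimate cluster") variance estimator the passage describes, for a weighted total in a stratified design: compute each PSU's weighted total z_hi, then sum over strata h the quantity n_h/(n_h - 1) times the squared deviations of the z_hi from their stratum mean. The function name, the (stratum, psu, weight, y) record layout and the toy data are hypothetical and chosen only for illustration; this is not Stata's internal code. Only the stratum and PSU codes enter the calculation, which is Kalton's point, and a stratum with a single PSU yields no variance estimate, which is why such strata have to be collapsed.

```python
from collections import defaultdict


def ultimate_cluster_variance(records):
    """Weighted total and its with-replacement variance estimate.

    records: iterable of (stratum, psu, weight, y) tuples, one per respondent
    (a hypothetical layout). Only the stratum and PSU codes are used; how
    elements were subsampled within each PSU never enters the formula.
    """
    # Weighted PSU totals: z[h, i] = sum of weight * y over PSU i in stratum h.
    psu_totals = defaultdict(float)
    for stratum, psu, weight, y in records:
        psu_totals[(stratum, psu)] += weight * y

    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (stratum, _psu), z in psu_totals.items():
        by_stratum[stratum].append(z)

    total = 0.0
    variance = 0.0
    for stratum, z in by_stratum.items():
        n_h = len(z)
        if n_h < 2:
            # One PSU gives no within-stratum variance estimate; such a
            # stratum has to be collapsed with another one first.
            raise ValueError(f"stratum {stratum!r} has only one PSU")
        mean_z = sum(z) / n_h
        total += sum(z)
        variance += n_h / (n_h - 1) * sum((zi - mean_z) ** 2 for zi in z)
    return total, variance


# Hypothetical toy data: two strata with two PSUs each.
sample = [
    ("A", 1, 10.0, 1.0), ("A", 1, 10.0, 3.0),  # PSU A1: z = 40
    ("A", 2, 10.0, 2.0),                       # PSU A2: z = 20
    ("B", 1, 5.0, 4.0),                        # PSU B1: z = 20
    ("B", 2, 5.0, 2.0), ("B", 2, 5.0, 2.0),    # PSU B2: z = 20
]
print(ultimate_cluster_variance(sample))  # (100.0, 400.0)

# A different subsample inside PSU A1 with the same weighted PSU total
# leaves the estimate unchanged.
resampled = [("A", 1, 20.0, 2.0)] + sample[2:]
print(ultimate_cluster_variance(resampled))  # (100.0, 400.0)
```

The second call replaces the two respondents in PSU A1 with a single respondent carrying twice the weight; because the formula sees only PSU totals, the result does not change.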

Do we disagree?
Doesn't this cover many of the common designs for large national surveys?
To paraphrase Caleb Southworth's question: what are some examples of
when a multi-stage design cannot be reduced to one stage using this

----- Original Message -----
From: "Caleb Southworth" <>
To: <>
Sent: Tuesday, February 04, 2003 10:35 PM
Subject: st: RE: Using STATA to analyze complex national datasets

> On Mon, 3 Feb 2003, Sayer, Bryan wrote:
> :documentation for them.  Basically, the answer is no.  Stata does only one
> :level of sample design.  So unless you can reduce a more complex sample
> :design down to one level, it is not possible in Stata.  One issue in
> :simplifying the sample design is that you can get increased variability in
> :the variance.  So it isn't as simple as just using the highest level.
> :Perhaps if enough people lean on NCHS, they might come up with something.
> I think Bryan does an excellent job of raising the question: when can a
> two-stage design be reduced to one level? A cursory search of the web
> shows lots of users collapsing two-stage designs into clusters and strata,
> i.e.
> My point here is not to single out a particular course webpage, but rather
> to highlight what appears to be a general problem.
> I don't know the NCHS data to which Joe refers, but I see this sort of
> problem all the time in the Russian Longitudinal Monitoring Survey (RLMS):
> analysts either ignore one level of clustering or treat a cluster as a
> stratum. RLMS has a two-stage design in which it first selects geographic
> regions and then selects households. All adult members of the household
> are interviewed. So the question is: what is the implication of analyzing
> data from a two-stage cluster sample by treating the first-stage clusters
> as strata? Or, in Stata:
> svyset strata region
> svyset psu household
> svyset pweight indwgt
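To make the question concrete, here is a toy sketch (hypothetical data, not RLMS) comparing two ways of specifying the same two-stage sample under the with-replacement assumption: regions as the PSUs in a single stratum (first stage only, as in the Kalton passage), versus region as a stratum and household as the PSU, as in the svyset lines above. The function, field names and numbers are all hypothetical; the sketch assumes regions are drawn with replacement, all weights equal 1, and y is a household-level total.

```python
from collections import defaultdict


def with_replacement_variance(records, stratum_of, psu_of):
    """Variance of a weighted total, treating psu_of(record) as PSUs drawn
    with replacement within the strata given by stratum_of(record)."""
    # Weighted total for each (stratum, PSU) pair.
    psu_totals = defaultdict(float)
    for rec in records:
        key = (stratum_of(rec), psu_of(rec))
        psu_totals[key] += rec["weight"] * rec["y"]

    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (stratum, _psu), z in psu_totals.items():
        by_stratum[stratum].append(z)

    variance = 0.0
    for stratum, z in by_stratum.items():
        n_h = len(z)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has only one PSU")
        mean_z = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((zi - mean_z) ** 2 for zi in z)
    return variance


# Hypothetical two-stage sample: two regions, two households per region,
# all weights equal to 1; y is a household-level total.
households = [
    {"region": "R1", "household": 1, "weight": 1.0, "y": 10.0},
    {"region": "R1", "household": 2, "weight": 1.0, "y": 12.0},
    {"region": "R2", "household": 1, "weight": 1.0, "y": 30.0},
    {"region": "R2", "household": 2, "weight": 1.0, "y": 28.0},
]

# First stage only: regions are the PSUs, all in one stratum.
v_region_psu = with_replacement_variance(
    households, lambda r: "all", lambda r: r["region"])

# Region as stratum, household as PSU.
v_region_stratum = with_replacement_variance(
    households, lambda r: r["region"], lambda r: r["household"])

print(v_region_psu, v_region_stratum)  # 1296.0 8.0
```

In this toy example, the region-as-stratum specification uses only the household-to-household variation within each region and drops the variation between the sampled regions, so the estimated variance shrinks from 1296 to 8. How large the gap is in a real survey depends on how much the outcome varies between regions.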
> Another way to ask this question might be: Do strata have to be nested
> within clusters? Regions and households are both clusters, i.e. they are
> both "sampling unit[s] with which one or more listing units can be
> associated" (Levy and Lemeshow 1999, p. 266). Likewise, region would also
> seem to be a stratum, as in one of L mutually exclusive and exhaustive
> groups from which a simple random sample is drawn (Ibid., p. 121).
> Is this a reasonable way to collapse a two-stage design into one level for
> analysis with Stata's survey estimators?
> If the nested nature of the design is crucial, perhaps that could be
> addressed with HLM where we have two levels and clustering by households?
> gllamm [individual level variable] , i(region) cluster(household)
> This has the advantage of being able to specify weights at both levels and
> have a list of variables that define clusters. Comments?
> Dr. Caleb Southworth, Ph.D
> American Council of Learned Societies Research Fellow 2002-03
> Assistant Professor
> Department of Sociology
> 1291 University of Oregon
> Eugene OR 97403
> Work: (541) 346-5034
> Fax:  (541) 346-5026
> *
> *   For searches and help try:
> *
> *
> *


© Copyright 1996–2019 StataCorp LLC