# st: RE: Using STATA to analyze complex national datasets

From: Caleb Southworth
To: "'statalist@hsphsun2.harvard.edu'"
Subject: st: RE: Using STATA to analyze complex national datasets
Date: Tue, 4 Feb 2003 19:35:54 -0800 (PST)

```
On Mon, 3 Feb 2003, Sayer, Bryan wrote:

:documentation for them.  Basically, the answer is no.  Stata does only one
:level of sample design.  So unless you can reduce a more complex sample
:design down to one level, it is not possible in Stata.  One issue in
:simplifying the sample design is that you can get increased variability in
:the variance.  So it isn't as simple as just using the highest level.
:Perhaps if enough people lean on NCHS, they might come up with something.

I think Bryan does an excellent job of raising the question: When can a
two-stage design be reduced to one level? A cursory search of the web
shows lots of users collapsing two-stage designs into clusters and strata,
e.g.
http://www.williams.edu/Mathematics/courses/Math443/stataview/stata.part1/node2.html
My point here is not to single out a particular course webpage, but rather
to highlight what appears to be a general problem.

I don't know the NCHS data to which Joe refers, but I see this sort of
problem all the time in the Russian Longitudinal Monitoring Survey (RLMS):
analysts either ignore one level of clustering or treat a cluster as a
stratum. RLMS has a two-stage design in which it first selects geographic
regions and then selects households. All adult members of the household
are interviewed. So the question is: what is the implication of analyzing
data from a two-stage cluster sample as clusters within strata? Or, in STATA:

svyset strata region
svyset psu household
svyset pweight indwgt

Another way to ask this question might be: Do strata have to be nested
within clusters? Regions and households are both clusters, i.e. they are
both "sampling unit[s] with which one or more listing units can be
associated" (Levy and Lemeshow 1999, p. 266). Likewise, region would also
seem to be a stratum, as in one of L mutually exclusive and exhaustive
groups from which a simple random sample is drawn (Ibid., p. 121).

Is this a reasonable way to collapse a two-stage design into one level for
analysis with STATA's survey estimators?

If the nested nature of the design is crucial, perhaps that could be
addressed with HLM where we have two levels and clustering by households?

gllamm [individual level variable] , i(region) cluster(household)

This has the advantage of being able to specify weights at both levels and
have a list of variables that define clusters. Comments?

Dr. Caleb Southworth, Ph.D
American Council of Learned Societies Research Fellow 2002-03
Assistant Professor
Department of Sociology
1291 University of Oregon
Eugene OR 97403
Work: (541) 346-5034
Fax:  (541) 346-5026

```
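
One way to get a feel for the question raised in the post (what happens when first-stage regions are declared as strata and households as the PSU) is a small numerical sketch. The Python below is not Stata and not RLMS data: the twelve observations, the equal weights, and the helper name `linearized_var_mean` are all made up for illustration. It assumes the usual with-replacement, Taylor-linearized variance of a weighted mean with no finite population correction, and compares two declarations of the same toy data: regions as PSUs in a single stratum, versus regions as strata with households as PSUs.

```python
from collections import defaultdict


def linearized_var_mean(rows, strata_key, psu_key):
    """Weighted mean and its Taylor-linearized variance.

    Assumes PSUs are sampled with replacement within strata and applies no
    finite population correction. `strata_key` and `psu_key` are functions
    that map a row to its stratum label and PSU label.
    """
    total_w = sum(r["w"] for r in rows)
    ybar = sum(r["w"] * r["y"] for r in rows) / total_w

    # Sum each person's linearized score into its PSU, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r["w"] * (r["y"] - ybar) / total_w
        psu_totals[strata_key(r)][psu_key(r)] += z

    var = 0.0
    for totals in psu_totals.values():
        n = len(totals)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_z = sum(totals.values()) / n
        var += n / (n - 1) * sum((z - mean_z) ** 2 for z in totals.values())
    return ybar, var


# Hypothetical toy data: 3 regions x 2 households x 2 adults, equal weights.
# Outcomes differ a lot between regions and only a little within them.
rows = []
for region, centre in (("A", 10), ("B", 20), ("C", 30)):
    for hh, shift in ((1, -1), (2, 0)):
        for person in (0, 1):
            rows.append({
                "region": region,
                "hh": (region, hh),
                "y": centre + shift + person,
                "w": 1.0,
            })

# Declaration 1: one stratum, regions are the PSUs (first-stage clusters).
mean_a, var_regions_as_psu = linearized_var_mean(
    rows, lambda r: "all", lambda r: r["region"])

# Declaration 2: regions are strata, households are the PSUs.
mean_b, var_regions_as_strata = linearized_var_mean(
    rows, lambda r: r["region"], lambda r: r["hh"])

print("mean:", mean_a)
print("variance, regions as PSUs:  ", var_regions_as_psu)
print("variance, regions as strata:", var_regions_as_strata)
```

In this toy case the point estimate is identical under both declarations, but the variance is far smaller when regions are treated as strata, because between-region variation drops out of the calculation. Whether that is appropriate depends on whether the regions were themselves sampled or were an exhaustive set of strata, which is exactly the design question the post asks.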