
From: "Sayer, Bryan" <BSayer@s-3.com>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Re: Using STATA to analyze complex national datasets
Date: Wed, 5 Feb 2003 12:37:57 -0500

It is important to note that when a sampling statistician talks about variance estimates, the discussion is usually about the estimate from ONE specific sample with a single specific design. However, in order to approximate estimates for a COMPLETE population, data sets may contain data from several different samples with different designs. For example, part might be a telephone sample, a second part might be a sample designed to get unlisted phone respondents, while a third part might be a field sample of people without phone service or with only intermittent service. Each of these portions may have a different sample design (and vastly different variances), but the information about each of the samples needs to be incorporated in a general fashion (or else estimates have to be generated separately and combined, a nasty process) in order to allow full population estimates. And if one is trying to make small area estimates, then all this information has to go together somehow. And of course the PSUs may be sampled with different probabilities, a factor that is not incorporated in Stata.

Concurrent with this issue is the problem that users rarely have sufficient detail about the actual VALUES of the sampling variables to know whether it is reasonable to use a single-stage design or not. In fact, they usually don't even have enough information to reasonably collapse strata together when faced with certainty PSUs. There are a multitude of reasons why this is the case, with confidentiality clearly at the top (sampling variables often contain a great deal of geographic information), but I suspect laziness is also in there.

And Graham is pretty clear on this: "...no matter what form of subsampling is used WITHIN the PSUs." So for each PSU there isn't a problem. But how do you combine the information from each PSU?

Bryan Sayer
Statistician, SSS Inc.
bsayer@s-3.com

-----Original Message-----
From: Tim Hofer [mailto:thofer@umich.edu]
Sent: Wednesday, February 05, 2003 10:08 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: Using STATA to analyze complex national datasets

In the Sage book on sampling by Graham Kalton he states:

"Under the with-replacement assumption a single standard error formula for a particular estimator applies, no matter what form of subsampling is used within the PSUs. Thus for instance, the same formula applies whether the elements are sampled (1) by SRS within the selected PSUs, (2) by systematic or stratified sampling, or (3) with further sampling stages and stratification. This generality is appealing...because the user of the program is not required to supply the program with details about the subsample design. The use of these programs requires only that each survey data record contains a code to indicate to which PSU it belongs, together with information about the first-stage stratification." (p. 78)

Do we disagree? Doesn't this cover many of the common designs for large national surveys?

To paraphrase Caleb Southworth's question: what are some examples of when a multi-stage design cannot be reduced to one stage using this assumption?

----- Original Message -----
From: "Caleb Southworth" <caleb@twinky.uoregon.edu>
To: <statalist@hsphsun2.harvard.edu>
Sent: Tuesday, February 04, 2003 10:35 PM
Subject: st: RE: Using STATA to analyze complex national datasets

> On Mon, 3 Feb 2003, Sayer, Bryan wrote:
>
> :documentation for them. Basically, the answer is no. Stata does only one
> :level of sample design. So unless you can reduce a more complex sample
> :design down to one level, it is not possible in Stata. One issue in
> :simplifying the sample design is that you can get increased variability in
> :the variance. So it isn't as simple as just using the highest level.
> :Perhaps if enough people lean on NCHS, they might come up with something.
>
> I think Bryan does an excellent job of raising the question: When can a
> two-stage design be reduced to one level? A cursory search of the web
> shows lots of users collapsing two-stage designs into clusters and strata,
> e.g.,
> http://www.williams.edu/Mathematics/courses/Math443/stataview/stata.part1/node2.html
> My point here is not to single out a particular course webpage, but rather
> to highlight what appears to be a general problem.
>
> I don't know the NCHS data to which Joe refers, but I see this sort of
> problem all the time in the Russian Longitudinal Monitoring Survey (RLMS):
> analysts either ignore one level of clustering or treat a cluster as a
> stratum. RLMS has a two-stage design in which it first selects geographic
> regions and then selects households. All adult members of the household
> are interviewed. So the question is: what is the implication of analyzing
> data from a two-stage cluster sample as cluster and strata? Or in STATA
>
> svyset strata region
> svyset psu household
> svyset pweight indwgt
>
> Another way to ask this question might be: Do strata have to be nested
> within clusters? Regions and households are both clusters, i.e. they are
> both "sampling unit[s] with which one or more listing units can be
> associated" (Levy and Lemeshow 1999, p. 266). Likewise, region would also
> seem to be a stratum, as in one of L mutually exclusive and exhaustive
> groups from which a simple random sample is drawn (Ibid., p. 121).
>
> Is this a reasonable way to collapse a two-stage design into one level for
> analysis with STATA's survey estimators?
>
> If the nested nature of the design is crucial, perhaps that could be
> addressed with HLM where we have two levels and clustering by households?
>
> gllamm [individual level variable] , i(region) cluster(household)
>
> This has the advantage of being able to specify weights at both levels and
> have a list of variables that define clusters. Comments?
>
> Dr. Caleb Southworth, Ph.D
> American Council of Learned Societies Research Fellow 2002-03
> Assistant Professor
> Department of Sociology
> 1291 University of Oregon
> Eugene OR 97403
> Work: (541) 346-5034
> Fax: (541) 346-5026
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
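
Kalton's with-replacement statement quoted in this thread can be made concrete with a small sketch. The Stata code below is an illustration only, not something posted by any of the participants: the toy data and the variable names stratum_id, psu_id, pw and y are hypothetical. It computes the variance of a weighted total using nothing but the first-stage stratum and PSU identifiers, which is why the within-PSU subsampling never has to be described to the program.

```stata
* Toy data; all variable names (stratum_id, psu_id, pw, y) are hypothetical.
clear
input stratum_id psu_id pw y
1 1 10 2
1 1 10 4
1 2 10 3
1 2 10 5
2 3 20 1
2 3 20 2
2 4 20 6
2 4 20 3
end

* Weighted PSU totals: everything below the PSU level is collapsed away.
generate double wy = pw * y
collapse (sum) z = wy, by(stratum_id psu_id)

* Between-PSU variation within each first-stage stratum.
bysort stratum_id: egen double zbar = mean(z)
bysort stratum_id: generate n_h = _N
generate double contrib = (n_h / (n_h - 1)) * (z - zbar)^2

* Summing the contributions gives the with-replacement variance of the total.
quietly summarize contrib
display "variance of estimated total = " r(sum)
display "standard error              = " sqrt(r(sum))
```

A stratum with a single PSU produces a missing contribution (division by n_h - 1 = 0), which summarize silently skips; that is the certainty-PSU situation in which strata have to be collapsed. The sketch also assumes no finite population correction and says nothing about PSUs drawn with unequal probabilities beyond what the weights already carry.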

