
From: "Stas Kolenikov" <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Definition of strata and PSUs when svysetting
Date: Thu, 27 Mar 2008 14:06:05 -0500

I would say your first specification makes better sense, even though the design it produces is quite weird, and the degrees of freedom in that design are strange (and 7 initial strata won't get you very far, anyway). In Stata 10, that's doable with

    svyset tract, strata(area) || person, strata(age_group)

if I am getting your design right.

In the second specification, with region by age strata, you have some sort of coupled sampling, where selecting a PSU in one stratum implies selecting a certain PSU in another stratum linked to it by geography. You could still analyze that, but you would need accurate pairwise probabilities of selection to compute the Horvitz-Thompson estimator and the Yates-Grundy-Sen estimator of its variance (which I don't think is implemented anywhere commercially, as those higher order probabilities of selection are rarely known; Jeff P, that might produce a cutting edge addition to Stata's set of -svy- tools, although I've no idea how to input and parse those :)). Any reasonably high level book would have it (Kish, Cochran, Mary Thompson's books spring to mind). For special cases, I think that can be programmed in Mata. Let's call that option 3.

Note that the naive implementation as

    svyset tract, strata(area X age) || person

produces wrong probabilities of selection, and the variances are likely to be understated, as there is more variability in this specification than in your actual design.

If I were in your shoes, I would try both specifications you described and see whether they produce comparable substantive results. Keep in mind that either way you are getting asymptotic Taylor series expansion standard errors, and they might be badly off with small samples like those you have.
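For reference, the pairwise-probability route called option 3 above rests on two standard textbook formulas, sketched here for a population total. The notation is an assumption introduced for illustration and does not come from the thread: s is the set of sampled units, y_i is the value (or PSU total) for unit i, \pi_i is its first-order probability of selection, and \pi_{ij} is the joint probability that units i and j are both selected. The variance form shown assumes a fixed-size design with every \pi_{ij} > 0.

```latex
\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i},
\qquad
\hat{V}_{YGS}\bigl(\hat{Y}_{HT}\bigr)
  = \sum_{i \in s} \sum_{\substack{j \in s \\ j > i}}
    \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}}
    \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^{2}
```

When two units are selected independently, \pi_{ij} = \pi_i \pi_j and their cross term drops out. Under coupled selection the joint probabilities for linked units differ from that product, which is why they have to be known accurately before this estimator can be computed.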
And I think you need to worry about your degrees of freedom, not your number of PSUs; I would do a small simulation to determine the approximate d.f.s for your main variables -- from census data if you have it, or from simulated data resembling the actual population. If I had infinite time to work on that project (meaning, a week or two of devoted programming), I would implement option 3 as the most proper.

On 3/25/08, Angel Rodriguez Laso <angel.rodriguez@salud.madrid.org> wrote:
> Greetings to all members of the list,
>
> I have the following questions on svysetting for an analysis of a complex
> survey:
>
> We have carried out a regional health population survey. We defined strata
> initially as geographic areas in the region (n=7) and allocated to each of
> them a sample proportional to their population. But because we wanted to
> over-represent the elderly, we set that the number of people over 65 years
> sampled in all areas had to reach a minimum number. We didn't change the
> sample size of people below 65 obtained through the proportional
> allocation. Therefore the sampling fractions (and consequently the weights)
> are different for each area by age group (below/over 65) category.
>
> Then we selected census tracts in each geographic area with probabilities
> proportional to their total population, and randomly sampled 10 individuals
> in those selected, always keeping the proportion 7 below 65 years/3 over 65
> years, which was the regional overall age distribution after the
> oversampling explained above. My first question is whether strata should be
> defined as geographic areas alone or as geographic area by age groups
> (below/over 65 years) (n=14) when svysetting. The first possibility looks
> more reasonable, because census tracts were selected within geographic
> areas, not within geographic area by age group categories. If this is
> correct, then probably the way to svyset would be declaring geographic areas
> as first stage strata, census tracts as first stage PSUs and age groups as
> second stage strata.
>
> Alternatively, if the answer is that strata should be defined as region by
> two age-group categories, then the same census tract can belong to two
> different strata (for example area A below 65/area A over 65) depending on
> the age of the individual considered. If I svyset: strata (region by age
> group categories) and PSU = census tracts, Stata interprets that there are
> twice as many PSUs as there are real census tracts. Is that correct?
>
> Many thanks.
>
> Ángel Rodríguez Laso
> Institute of Public Health of the Region of Madrid
>

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: Please do not reply to my Gmail address as I don't check it regularly.
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**: **RE: st: Definition of strata and PSUs when svysetting** *From:* "Angel Rodriguez Laso" <angel.rodriguez@salud.madrid.org>

**References**: **st: Definition of strata and PSUs when svysetting** *From:* "Angel Rodriguez Laso" <angel.rodriguez@salud.madrid.org>
