
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Definition of strata and PSUs when svysetting
Date: Fri, 28 Mar 2008 17:25:02 -0400

Angel-

I'm sorry that I missed your initial post; I was on vacation and canceled my Statalist subscription. I agree with Stas's suggestion for the first specification.

I have some questions

1. Your description implies that you created a list of ALL people in each selected tract, stratified by age, and then selected by simple random sampling 7 from the under-65 list and 3 from the over-65 list. Is that a correct description? Or was there intermediate sampling of dwellings?

2. Your PSUs are census tracts, not people. ("Primary" refers only to the first stage.) You are saying that in some of the census tracts you had only one person in either the under-65 or the over-65 group. Is that correct?

For those tracts, I suggest that you go with option 1 but ignore the age stratification while keeping the sampling probabilities. That is, create a single stratum for those tracts by recoding.

You may still analyze your outcomes by age. The analysis age groups need not match the stratum age-groups.
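
The recoding suggested above might look roughly like the sketch below. It is only an illustration under assumptions: the variable names (tract, area, age_group, pondef, n_tracts, outcome) are hypothetical, the cut-off of two usable respondents per cell is one reasonable choice, and a first-stage fpc() is supplied so that Stata does not ignore the second stage.

```stata
* Sketch only. Hypothetical variables: tract, area, age_group (under/over 65),
* pondef (sampling weight), n_tracts (number of tracts in the area), outcome.

* Usable (non-missing) respondents in each tract-by-age cell for this outcome.
bysort tract age_group: egen n_cell = count(outcome)

* Smallest cell size within each tract.
bysort tract: egen min_cell = min(n_cell)

* Keep the age strata where every cell has at least two usable respondents;
* otherwise collapse the tract into a single second-stage stratum.
generate byte age_stratum = age_group
replace age_stratum = 0 if min_cell < 2

* Weights are left untouched, so the sampling probabilities are preserved.
svyset tract [pweight = pondef], strata(area) fpc(n_tracts) ///
    || _n, strata(age_stratum)

* The analysis age groups need not match the design strata.
svy: mean outcome, over(age_group)
```

Because the collapsing depends on which outcome has missing values, the recoded stratum variable would have to be rebuilt for each analysis variable, or built once from the variable with the most missing data.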

-Steven

On Mar 28, 2008, at 10:40 AM, Angel Rodriguez Laso wrote:

Thank you for your answer, Stas.

I've tried both specifications, and the first surprise was that Stata 9 ignores further stages when stage 1 is sampled with replacement. It was good to come across this warning, because in our survey sampling was without replacement and the sampling fraction of the census tracts was quite high (more than one third in some strata), which precludes assuming that selection was with replacement.

The problem with using age groups as second-stage strata is that, because only 3 people over 65 are selected per census tract, whenever there are missing values in the variables some strata become single-PSU (person) strata, which prevents Stata from calculating standard errors. So the two specifications I've tried are:

svyset censustract [pweight=pondef], strata(area) fpc (#censustractsinarea)

svyset censustract [pweight=pondef], strata(area-by-age) fpc (#censustractsinarea)

Not surprisingly, standard errors with the two specifications differ only by a few hundredths. I believe this is mainly because in both cases the degrees of freedom are very large. This is something I want to check with you: from my reading of Korn and Graubard, "Analysis of Health Surveys", I understand that in complex surveys degrees of freedom are calculated as #PSUs - #strata (624 for the first specification and 1244 for the second, because Stata doubles the number of census tracts, since each of them belongs to two different strata). I do not follow you very well when you recommend doing a small simulation with census or simulated data to ascertain degrees of freedom, or when you state that Taylor series expansion standard errors might be badly off with small samples. It is usual practice to work with such low numbers of individuals per PSU (10 in my case), and I have never heard of a small-sample problem in that situation.
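
The #PSUs - #strata rule can be checked against what Stata actually uses after any svy estimation command. The lines below are a minimal sketch, assuming a hypothetical analysis variable named outcome; the fpc() option is left out because it does not affect the count of PSUs or strata.

```stata
* Sketch only; outcome is a hypothetical analysis variable.
svyset censustract [pweight = pondef], strata(area)

* Strata and PSU counts as Stata sees them.
svydescribe

svy: mean outcome

* Design degrees of freedom, computed by hand and as stored by Stata.
display "PSUs - strata = " e(N_psu) - e(N_strata)
display "design d.f.   = " e(df_r)
```

Repeating the same steps with the area-by-age stratum variable shows how the PSU count, and with it the degrees of freedom, changes between the two specifications.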

Unfortunately, I don't have enough knowledge to go for option 3.

To conclude, although both specifications yield similar results, I agree with you that the second one implies linked selection of PSUs while the first one is conceptually sounder.

Ángel Rodríguez Laso

Institute of Public Health of the Region of Madrid

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Stas Kolenikov
Sent: Thursday, March 27, 2008 20:06
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Definition of strata and PSUs when svysetting

I would say your first specification makes better sense, even though the design it produces is quite weird, and the degrees of freedom in that design are strange (and 7 initial strata won't get you very far, anyway). In Stata 10, that's doable with

svyset tract, strata(area) || person, strata(age_group)

if I am getting your design right.

In the second specification with region by age strata, you have some sort of coupled sampling, where selecting a PSU in one stratum implies selecting a certain PSU in another stratum linked by geography. You could still analyze that, but you would need to get accurate pairwise probabilities of selection to compute the Horvitz-Thompson estimator, and the Grundy-Yates-Sen estimator of its variance (which I don't think is implemented anywhere commercially, as those higher-order probabilities of selection are rarely known; Jeff P, that might produce a cutting-edge addition to Stata's set of -svy- tools, although I've no idea how to input and parse those :)). Any reasonably high-level book would have it (Kish, Cochran, Mary Thompson's books spring to mind). For special cases, I think that can be programmed in Mata. Let's call that option 3. Note that the naive implementation as

svyset tract, strata(area X age) || person

produces wrong probabilities of selection, and the variances are likely to be understated, as there is more variability in this specification than in your actual design.
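
For reference, the two estimators named under option 3 can be written out in their standard textbook form for a fixed-size design. This is a general statement, not something derived for this survey: $s$ is the sample of PSUs, $y_i$ the total for PSU $i$ (taken here as known without error), $\pi_i$ its inclusion probability, and $\pi_{ij}$ the joint (pairwise) inclusion probability of PSUs $i$ and $j$. The variance estimator is the one usually called Sen-Yates-Grundy.

```latex
\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i},
\qquad
\hat{V}_{SYG}\bigl(\hat{Y}_{HT}\bigr)
  = \frac{1}{2} \sum_{i \in s} \sum_{\substack{j \in s \\ j \neq i}}
    \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}}
    \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^{2}
```

In a multistage design the PSU totals are themselves estimated, so a within-PSU variance component would have to be added to this expression.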

If I were in your shoes, I would try both specifications you described and see whether they are producing comparable substantive results. Keep in mind that either way you are getting asymptotic Taylor series expansion standard errors, and they might be badly off with small samples like those you have. And I think you need to worry about your degrees of freedom, not your number of PSUs; I would do a small simulation to determine the approximate d.f.s for your main variables -- from census data if you have it, or from simulated data resembling the actual population. If I had infinite time to work on that project (meaning, a week or two of devoted programming), I would implement option 3 as the most proper.
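
One common way to turn such a simulation into an approximate d.f. figure is a Satterthwaite-type moment match, which assumes that the variance estimate behaves roughly like a scaled chi-square variable. The message does not say which method was intended, so this is only one possibility; $v_r$ denotes the variance estimate obtained from simulated sample $r$, with $r = 1, \dots, R$.

```latex
\bar{v} = \frac{1}{R} \sum_{r=1}^{R} v_r,
\qquad
s_v^{2} = \frac{1}{R-1} \sum_{r=1}^{R} \left( v_r - \bar{v} \right)^{2},
\qquad
\widehat{\mathrm{d.f.}} \approx \frac{2\,\bar{v}^{2}}{s_v^{2}}
```

The match follows from the fact that a variance estimate with $d$ degrees of freedom has squared coefficient of variation $2/d$.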

On 3/25/08, Angel Rodriguez Laso <angel.rodriguez@salud.madrid.org> wrote:

Greetings to all members of the list,

I have the following questions on svysetting for an analysis of a complex survey:

We have carried out a regional population health survey. We initially defined strata as geographic areas in the region (n=7) and allocated to each of them a sample proportional to its population. But because we wanted to over-represent the elderly, we required the number of people over 65 years sampled in all areas to reach a minimum number. We didn't change the sample size of people below 65 obtained through the proportional allocation. Therefore the sampling fractions (and consequently the weights) are different for each area by age group (below/over 65) category.

Then we selected census tracts in each geographic area with probabilities proportional to their total population, and randomly sampled 10 individuals in those selected, always keeping the proportion 7 below 65 years/3 over 65 years, which was the overall regional age distribution after the oversampling explained above. My first question is whether strata should be defined as geographic areas alone or as geographic areas by age groups (below/over 65 years) (n=14) when svysetting. The first possibility looks more reasonable, because census tracts were selected within geographic areas, not within geographic area by age group cells. If this is correct, then probably the way to svyset would be declaring geographic areas as first-stage strata, census tracts as first-stage PSUs and age groups as second-stage strata.

Alternatively, if the answer is that strata should be defined as region by two age-group categories, then the same census tract can belong to two different strata (for example area A below 65/area A over 65) depending on the age of the individual considered. If I svyset with strata (region by age group categories) and PSU = census tracts, Stata interprets that there are twice as many PSUs as there are real census tracts. Is that correct?
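
The PSU counts under the two stratum definitions can be compared directly with svydescribe. The lines below are a minimal sketch with hypothetical variable names (area, age_group, censustract, pondef). Stata counts PSUs within strata, so a tract with respondents in both age groups would be expected to appear once in each of its two area-by-age strata.

```stata
* Sketch only. Hypothetical variables: area (7 areas), age_group (2 groups),
* censustract, pondef (sampling weight).
egen area_age = group(area age_group)

* Strata = area: each census tract is counted once.
svyset censustract [pweight = pondef], strata(area)
svydescribe

* Strata = area by age: compare the PSU totals with the listing above.
svyset censustract [pweight = pondef], strata(area_age)
svydescribe
```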

Many thanks.

Ángel Rodríguez Laso

Institute of Public Health of the Region of Madrid

-- Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: Please do not reply to my Gmail address as I don't check it regularly.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


