# Re: st: Definition of strata and PSUs when svysetting

 From Marcello Pagano To statalist@hsphsun2.harvard.edu Subject Re: st: Definition of strata and PSUs when svysetting Date Tue, 08 Apr 2008 08:20:12 -0400

Steven,

I think the weight calculation is clear to me now.

Unfortunately, the market research company that carried out the sampling did not sample with replacement (each PSU could be selected only once). I do not know whether this can distort the results when the sampling fraction of PSUs per stratum is high (more than one third in some strata in our case).

To summarize, the proposed svyset specification in this case would be:

svyset censustract [pweight=pondef], strata(area) fpc(#censustractsinarea)

With pondef calculated as:

pondef= (M0 x Mi-real x # hh members)/(K x Mi-intended x # dwellings interviewed).

Where # dwellings interviewed is a number between 10 and 12.
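As a quick arithmetic check of the pondef formula above, here is a minimal Python sketch. The function name, the argument names, and the example numbers are illustrative assumptions, not values from the survey.

```python
def pondef(m0, mi_real, hh_members, k, mi_intended, dwellings_interviewed):
    """Design weight for one interviewed person, following the formula in the post.

    m0: advance total of dwellings over all PSUs in the stratum
    mi_real, mi_intended: actual and advance number of dwellings in the PSU
    k: number of PSU draws in the stratum
    dwellings_interviewed: dwellings in the PSU where interviews were obtained
    """
    return (m0 * mi_real * hh_members) / (k * mi_intended * dwellings_interviewed)


# Hypothetical numbers, for illustration only.
w = pondef(m0=50_000, mi_real=210, hh_members=3, k=20,
           mi_intended=200, dwellings_interviewed=12)
print(w)
```

Note that when the actual and advance sizes agree (Mi-real = Mi-intended), the PSU size drops out of the weight, which is the cancellation discussed earlier in the thread.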

Would that be correct?

Ángel

-----Original message-----

From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steven Samuels

Sent: Thursday, April 3, 2008 22:44

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Definition of strata and PSUs when svysetting

Angel,

Let me back up a bit. You were correct in your original conception, but a bit imprecise in phrasing it. I jumped to the assumption that you were speaking about a no-stratum setup.

Let me correct your misconceptions and answer your original question.

You are dealing with sampling with replacement.

Let Mi be the advance measure of size for a PSU. You stated that "Census tracts were randomly selected with probabilities proportional to the number of dwellings in them: (#PSUs x #dwellings in PSUi)/ dwellings in all PSUs." I write these as K x Mi/M0.

This was imprecise or incorrect on three counts.

1. The quantities K and M0 should be stratum specific. So the description 'all PSUs' should be 'all PSUs in the stratum.'

2. K should not be the number of distinct PSUs in a stratum but the number of random number draws. As you sampled with replacement, a PSU may appear more than one time in the sample. Whenever this happens, the standard procedure is to draw a different sample of 12 dwellings.

3. K x Mi/M0 is not a probability, but it may be close to one (namely, the inclusion probability described next).

Zi = Mi/M0 is the draw-by-draw probability of selecting PSU i into the sample. K x Zi is the expected number of appearances of PSU i in the sample. You may have been thinking of K x Zi as the 'inclusion probability', which is the probability that the PSU appears in the sample at least once. The inclusion probability is 1 - (1 - Zi)^K, and it will be close to K x Zi when K x Zi is small.
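To make the distinction concrete, the following Python sketch compares the expected number of appearances K x Zi with the inclusion probability 1 - (1 - Zi)^K. The function names and the (K, Zi) pairs are assumptions chosen only for illustration.

```python
def expected_appearances(k, z):
    """Expected number of times a PSU with draw probability z appears in k draws."""
    return k * z


def inclusion_probability(k, z):
    """Probability that the PSU appears at least once in k draws with replacement."""
    return 1.0 - (1.0 - z) ** k


# Hypothetical (K, Zi) pairs: the two quantities agree closely only when K x Zi is small.
for k, z in [(10, 0.01), (10, 0.05), (10, 0.10)]:
    print(k, z, expected_appearances(k, z), inclusion_probability(k, z))
```

In the last case K x Zi equals 1, which plainly cannot be the probability of inclusion, since the PSU can still be missed in every draw.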

However, your basic idea was correct. In sampling with replacement, Wi = 1/(K x Zi) does function as a proper sampling weight. The hallmark of a sampling weight is that the sample sum of the weights should estimate the number of population PSUs, and Wi does that.

You will need different K's for different strata, so K will not cancel out in numerator and denominator for stratified designs. And, of course, you need the Zi, not only the Mi.
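The claim that the sample sum of Wi = 1/(K x Zi) estimates the number of population PSUs can be checked with a small simulation. This is only a sketch under assumed inputs: the PSU sizes, K, the number of replications, and the seed are all hypothetical.

```python
import random


def average_weight_sum(sizes, k, reps, seed=12345):
    """Average, over many samples, of the sum of Wi = 1/(K x Zi) across K draws.

    PSUs are drawn with replacement with probability Zi = Mi/M0 on each draw.
    """
    rng = random.Random(seed)
    m0 = sum(sizes)
    z = [m / m0 for m in sizes]
    total = 0.0
    for _ in range(reps):
        draws = rng.choices(range(len(sizes)), weights=z, k=k)
        total += sum(1.0 / (k * z[i]) for i in draws)
    return total / reps


# Hypothetical stratum with 5 PSUs of unequal size, K = 3 draws.
sizes = [100, 150, 200, 250, 300]
avg = average_weight_sum(sizes, k=3, reps=20_000)
print(avg)  # should be close to the number of PSUs in the stratum (5)
```

The exact expectation follows from one line of algebra: each draw contributes the sum over i of Zi x 1/(K x Zi) = N/K, and there are K draws.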

-Steven


On Apr 3, 2008, at 7:22 AM, Angel Rodriguez Laso wrote:

Steven,

Sorry for insisting on this weights matter but, if you need only relative weights, why is Z (the number of dwellings in all census tracts, which is constant for all individuals in the stratum) included in the design weights formula?

Ángel

-----Original message-----

From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steven Samuels

Sent: Wednesday, April 2, 2008 23:29

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Definition of strata and PSUs when svysetting

Angel, for sampling with replacement, the probability of selection is Zi = Mi/M0, where Mi is the measure of size for PSU i and M0 is the total of the M's over PSUs. The hallmark of probabilities is that they add to 1 over the population, and this is true of the Zi's. You need to multiply Zi by K = the number of PSUs (in a stratum) only in the formula for estimating sample totals. See WG Cochran, Sampling Techniques, 3rd ed., Wiley Books, 1977, p. 252. For estimating means, proportions, correlations, and regression coefficients, only relative weights are needed and K is not needed.

-Steven
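The point about relative weights can be illustrated with a few lines of Python: rescaling every weight by the same constant leaves a weighted mean unchanged but rescales a weighted total. The data, the weights, and the scaling constant below are made up for the illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: invariant to multiplying all weights by a constant."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_total(values, weights):
    """Weighted total: scales with the weights, so absolute weights matter."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical outcome (1 = has the condition) and design weights.
y = [1.0, 0.0, 1.0, 1.0]
w = [120.0, 80.0, 200.0, 100.0]
scaled = [wi / 10.0 for wi in w]  # same relative weights, different scale

print(weighted_mean(y, w), weighted_mean(y, scaled))
print(weighted_total(y, w), weighted_total(y, scaled))
```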

On Apr 2, 2008, at 4:25 AM, Angel Rodriguez Laso wrote:

> Steven,
>
> In the formula you give for the current sample weight for an interviewed person, shouldn't the number of PSUs chosen in the sample design be included in the denominator?
>
> I say so because the selection probability of PSU i is:
>
> #PSUs in the sample design x (#dwellings in PSUi)/(dwellings in all PSUs)
>
> And, the weight being the inverse of the selection probability, #PSUs would go in the denominator.
>
> PS The list of dwellings per census tract was very up-to-date and only very minor changes in the actual measure of size were expected.
>
> Ángel

>
> -----Original message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steven Samuels
> Sent: Tuesday, April 1, 2008 23:47
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Definition of strata and PSUs when svysetting
>
> Angel,
>
> As only one person was taken per household, you were quite right to exclude the dwelling stage in your -svyset- command.
>
> I am not sure that your weights are correct. You state that your weighting computation simplifies, because the number of dwelling units in a census tract cancels out in numerator and denominator. Yet rarely does the advance measure of size for a PSU match the actual measure of size (L Kish, Survey Sampling, Wiley Books, 1965, p. 239).
>
> Let Z be your advance count of the number of dwellings in all census tracts. If you anticipated 200 dwellings in a sampled census tract, you selected the tract with probability equal to 200/Z. Suppose when you got to the census tract, you discovered the actual number of dwellings was 210. Your target number of dwellings is 12. If you maintain the intended probability of 12/200 (so that the 200 cancels in the weight computation), the attained sample size will be random (n = 12 or 13) (Kish, p. 239). If you select exactly 12 dwellings, with probability 12/210, your current sampling weight for an interviewed person, Z x (# hh members)/12, should be multiplied by 210/200.
>
> This assumes that you obtained interviews in all 12 selected dwellings. If you reached the quota of 7 younger and 3 older people after interviewing in n = 10 or 11 dwellings, I suggest that you change '12' in the weight computation to the value of n.
>
> -Steven
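The 210/200 adjustment falls out of multiplying the stage-by-stage selection probabilities. The sketch below does that in Python; the value of Z and the household size are hypothetical, and the function and argument names are assumptions made for the example.

```python
def person_weight(z_total, advance_size, actual_size, n_dwellings, hh_members):
    """Inverse of the overall selection probability for one interviewed person."""
    p_tract = advance_size / z_total        # tract drawn using its advance size
    p_dwelling = n_dwellings / actual_size  # fixed number of dwellings from the actual list
    p_person = 1.0 / hh_members             # one person per dwelling
    return 1.0 / (p_tract * p_dwelling * p_person)


Z = 60_000                    # hypothetical advance count of dwellings in all tracts
unadjusted = Z * 3 / 12       # Z x (# hh members)/12, as if 200 had cancelled
adjusted = person_weight(Z, advance_size=200, actual_size=210,
                         n_dwellings=12, hh_members=3)
print(unadjusted, adjusted, adjusted / unadjusted)
```

Replacing `n_dwellings=12` with the attained n (10 or 11) gives the variant suggested when the quota was reached early.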

>
> On Apr 1, 2008, at 3:57 AM, Angel Rodriguez Laso wrote:
>
>> Steven,
>>
>> 1. Because only one person was interviewed in each dwelling, I don't see the need to include a third stage in the design (there is no clustering of individuals by dwelling, only by census tract).
>>
>> 2. I agree with dropping the age stratum.
>>
>> 3. I appreciate your advice on oversampling of the elderly. When listing and selecting younger and elderly people separately in each dwelling, I see the need to include the dwelling variable, because then you can have two participants living in the same dwelling.
>>
>> 4. and 5. Census tracts were randomly selected with probabilities proportional to the number of dwellings in them:
>>
>> (#PSUs x #dwellings in PSUi)/ dwellings in all PSUs.
>>
>> As the probability of selection of each dwelling is:
>>
>> 12/#dwellings in PSUi,
>>
>> #dwellings in PSUi cancels out and the result of these two components of the weight is constant for all individuals in the stratum and can be dropped.
>> The only weights used were then: a) #people in the dwelling; b) post-stratification weights to make age proportions match those of the census.
>>
>> Many thanks for your help.
>>
>> Ángel Rodríguez Laso
>> Institute of Public Health of the Region of Madrid

>>
>> -----Original message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steven Samuels
>> Sent: Monday, March 31, 2008 19:30
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Definition of strata and PSUs when svysetting
>>
>> Angel
>>
>> "Gender" in point 2 should have been "age"; fixed below. I apologize for the confusion.
>>
>> -Steven

>> On Mar 31, 2008, at 9:32 AM, Steven Samuels wrote:
>>
>>> Angel, you had a three-stage, not a two-stage, design.
>>>
>>> 1. The proper -svyset- should include the stage of selecting dwellings.
>>>
>>> -svyset censustract [pweight=???], strata(area) || dwelling || _n
>>>
>>> For the proper pweight, see point 4 below.
>>>
>>> 2. You did not really stratify on AGE, so drop all reference to an AGE stratum.
>>>
>>> 3. Your design, selecting one person at random, and hoping to get enough elderly people, is not one I recommend. There are standard approaches for oversampling sub-populations in household surveys. At the least, one can list older and younger people in each dwelling and select separately from each list.
>>>
>>> 4. The design makes it very difficult to calculate the sampling weights. You appear to be saying that you stopped interviewing when you had enough elderly and younger people (or when you ran out of dwellings). This is a version of 'sequential sampling' (Sharon Lohr, Sampling: Design and Analysis, Duxbury, p. 403).
>>>
>>> Here are my best guesses at sample weights.
>>>
>>> 4a. person weight =
>>> 1/(prob sel tract) x (no. dwellings in tract)/(no. of dwellings where you obtained interviews) x (no. of people in the person's dwelling)
>>>
>>> 4b. If you listed the ages of all people in the 12 selected dwellings, not just those where you interviewed, you can do more:
>>>
>>> weight for younger person =
>>> 1/(prob sel tract) x (no. dwellings in tract)/12 x (no. younger people in the 12 sampled dwellings)/(no. of younger people interviewed)
>>>
>>> weight for older person =
>>> 1/(prob sel tract) x (no. dwellings in tract)/12 x (no. older people in the 12 sampled dwellings)/(no. of older people interviewed)
>>>
>>> 4c. If you have ages of all people in the sampled dwellings, substitute 'no. of dwellings where you obtained interviews' for '12 sampled dwellings' in the formulas in 4b. These weights may slightly over-estimate the proportion of elderly people.
>>>
>>> 5. If there are census figures available for your target population, apply a post-stratification weighting to make the ratio of 'elderly' and 'younger' people match that in the census. See Lohr, Chapter 8.
>>>
>>> -Steven
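The age-group weights in point 4b can be written as a small Python function. This is a sketch only: the tract selection probability, the dwelling counts, and the numbers of people listed and interviewed are hypothetical, as are the function and argument names.

```python
def age_group_weight(p_tract, dwellings_in_tract, n_sampled_dwellings,
                     group_listed, group_interviewed):
    """Weight from point 4b for a person in one age group of one tract.

    p_tract: selection probability of the tract
    group_listed: people of that age group listed in the sampled dwellings
    group_interviewed: people of that age group actually interviewed
    """
    return ((1.0 / p_tract)
            * (dwellings_in_tract / n_sampled_dwellings)
            * (group_listed / group_interviewed))


# Hypothetical tract: 210 dwellings, 12 sampled; 21 younger people listed
# (7 interviewed) and 5 older people listed (3 interviewed).
w_younger = age_group_weight(0.004, 210, 12, group_listed=21, group_interviewed=7)
w_older = age_group_weight(0.004, 210, 12, group_listed=5, group_interviewed=3)
print(w_younger, w_older)
```

With these weights, the interviewed people in each age group together represent everyone of that age group listed in the sampled dwellings, scaled up to the tract and then to the stratum.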

>>>
>>> On Mar 31, 2008, at 6:27 AM, Angel Rodriguez Laso wrote:
>>>
>>>> Thank you, Steven, for your interest.
>>>>
>>>> In answer to your questions, I didn't go into more detail on the sampling procedure because I didn't think it was needed for the definition of strata and PSUs. There was intermediate sampling of dwellings. There was a list of all dwellings in census tracts and from this list 12 dwellings in each selected census tract were chosen at random. From each dwelling one person was taken at random (and his/her weight calculated from the number of people living in the dwelling). People were interviewed until a sample of 7 below 65 and 3 over 65 was obtained in each census tract. The reason why 12 dwellings were selected initially is that it was expected that taking only 10 would not yield the final 7/3 proportion desired. Nevertheless, 7 and 3 individuals could not be selected in every census tract, and that's the reason (more than the existence of missing items) why there are census tracts with only one individual over 65.
>>>>
>>>> I'm trying to check whether following your advice (merging strata in single-PSU-per-stratum census tracts) or just dropping the second stage specification would give very different results, but when I run a svy: prop under the first specification:
>>>>
>>>> svyset censustract [pweight=pondef], strata(area) fpc(#censustractsinarea) || identificationvariable, strata(agegroupscorrected)
>>>>
>>>> I get the message: 'Missing standard error due to stratum with single sampling unit; see help svydes.', but when I
>>>>
>>>> svydes variable, single stage(2)
>>>>
>>>> no single PSUs are displayed. Do you know why?
>>>>
>>>> Ángel Rodríguez Laso
>>>> Institute of Public Health of the Region of Madrid

>>>>
>>>> -----Original message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steven Samuels
>>>> Sent: Friday, March 28, 2008 22:25
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: Re: st: Definition of strata and PSUs when svysetting
>>>>
>>>> Angel-
>>>> I'm sorry that I missed your initial post; I was on vacation and canceled my Statalist subscription. I agree with Stas's suggestion for the first specification.
>>>>
>>>> I have some questions.
>>>>
>>>> 1. Your description implies that you created a list of ALL people in each selected tract, stratified by age, and then selected by simple random sampling: 7 from the below-65 list; 3 from the over-65 list. Is that a correct description? Or was there intermediate sampling of dwellings?
>>>>
>>>> 2. Your PSUs are census tracts, not people. ("Primary" refers only to the first stage.) You are saying that in some of the census tracts, you had only one person either under or over 65. Is that correct?
>>>>
>>>> For those tracts, I suggest that you go with option 1 and ignore the stratification, but keep the sampling probabilities. That is, create a single stratum for those tracts by recoding.
>>>>
>>>> You may still analyze your outcomes by age. The analysis age groups need not match the stratum age groups.
>>>>
>>>> -Steven

>>>>
>>>> On Mar 28, 2008, at 10:40 AM, Angel Rodriguez Laso wrote:
>>>>
>>>>> Thank you for your answer, Stas.
>>>>>
>>>>> I've tried both specifications and the first surprise was that Stata 9 ignores further stages when stage 1 is sampled with replacement. It was good to come across this warning because in our survey sampling was without replacement and the sampling fraction of the census tracts was quite high (more than one third in some strata), which precludes assuming that selection was with replacement.
>>>>>
>>>>> The problem with using age groups as second-stage strata is that, 3 being the number of people over 65 selected per census tract, whenever there are missing values in the variables some strata become single-PSU (person) strata, which prevents Stata from calculating standard errors. So, the two specifications I've tried are:
>>>>>
>>>>> svyset censustract [pweight=pondef], strata(area) fpc(#censustractsinarea)
>>>>> svyset censustract [pweight=pondef], strata(area-by-age) fpc(#censustractsinarea)
>>>>>
>>>>> Not surprisingly, standard errors with both specifications differ only by a few hundredths. I believe this is mainly due to the fact that in both cases degrees of freedom are very large. This is something I want to check with you: from reading Korn and Graubard, "Analysis of Health Surveys", I've understood that in complex surveys degrees of freedom are calculated as #PSUs - #strata (624 for the first specification and 1244 for the second, because Stata doubles the number of census tracts, as each of them belongs to two different strata). I do not follow you very well when you recommend doing a small simulation with census or simulated data to ascertain degrees of freedom or when you state that Taylor series expansion standard errors might be badly off with small samples. It's usual practice to work with such low numbers of individuals per PSU (10 in my case) and I've never heard that there was a problem of a small sample size then.
>>>>>
>>>>> Unfortunately, I don't have enough knowledge to go for option 3.
>>>>>
>>>>> To conclude, although both specifications yield similar results, I agree with you that the second one implies linked selection of PSUs while the first one is conceptually sounder.
>>>>>
>>>>> Ángel Rodríguez Laso
>>>>> Institute of Public Health of the Region of Madrid

>>>>>
>>>>> -----Original message-----
>>>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Stas Kolenikov
>>>>> Sent: Thursday, March 27, 2008 20:06
>>>>> To: statalist@hsphsun2.harvard.edu
>>>>> Subject: Re: st: Definition of strata and PSUs when svysetting
>>>>>
>>>>> I would say your first specification makes better sense, even though the design it produces is quite weird, and the degrees of freedom in that design are strange (and 7 initial strata won't get you very far, anyway). In Stata 10, that's doable with
>>>>>
>>>>> svyset tract, strata(area) || person, strata(age_group)
>>>>>
>>>>> if I am getting your design right.
>>>>>
>>>>> In the second specification with region by age strata, you have some sort of coupled sampling when selecting a PSU in one stratum implies selecting a certain PSU in another stratum linked by geography. You could still analyze that, but you would need to get accurate pairwise probabilities of selection to compute the Horvitz-Thompson estimator, and the Grundy-Yates-Sen estimator of its variance (which I don't think is implemented anywhere commercially as those higher order probabilities of selection are rarely known; Jeff P, that might produce a cutting edge addition to Stata's set of -svy- tools, although I've no idea how to input and parse those :)). Any reasonably high level book would have it (Kish, Cochran, Mary Thompson's books spring to mind). For special cases, I think that can be programmed in Mata. Let's call that option 3. Note that the naive implementation as
>>>>>
>>>>> svyset tract, strata(area X age) || person
>>>>>
>>>>> produces wrong probabilities of selection, and the variances are likely to be understated, as there is more variability in this specification than in your actual design.
>>>>>
>>>>> If I were in your shoes, I would try both specifications you described and see whether they are producing comparable substantive results. Keep in mind that either way you are getting asymptotic Taylor series expansion standard errors, and they might be badly off with small samples like those you have. And I think you need to worry about your degrees of freedom, not your number of PSUs; I would do a small simulation to determine the approximate d.f.s for your main variables -- from census data if you have it, or from simulated data resembling the actual population. If I had infinite time to work on that project (meaning, a week or two of devoted programming), I would implement option 3 as the most proper.

>>>>>
>>>>> On 3/25/08, Angel Rodriguez Laso <angel.rodriguez@salud.madrid.org> wrote:
>>>>>
>>>>>> Greetings to all members of the list,
>>>>>>
>>>>>> I have the following questions on svysetting for an analysis of a complex survey:
>>>>>>
>>>>>> We have carried out a regional health population survey. We defined strata initially as geographic areas in the region (n=7) and allocated to each of them a sample proportional to their population. But because we wanted to over-represent the elderly, we set that the number of people over 65 years sampled in all areas had to reach a minimum number. We didn't change the sample size of people below 65 obtained through the proportional allocation. Therefore the sampling fractions (and consequently the weights) are different for each area by age group (below/over 65) category.
>>>>>>
>>>>>> Then we selected census tracts in each geographic area with probabilities proportional to their total population, and randomly sampled 10 individuals in those selected, always keeping the proportion 7 below 65 years/3 over 65 years, which was the regional overall age distribution after the oversampling explained above. My first question is whether strata should be defined as geographic regions alone or as geographic area by age groups (below/over 65 years) (n=14) when svysetting. The first possibility looks more reasonable, because census tracts were selected within geographic areas, not within geographic-age groups areas. If this is correct, then probably the way to svyset would be declaring geographic areas as first stage strata, census tracts as first stage PSUs and age groups as second stage strata.
>>>>>>
>>>>>> Alternatively, if the answer is that strata should be defined as region by two age-groups categories, then the same census tract can belong to two different strata (for example area A below 65/ area A over 65) depending on the age of the individual considered. If I svyset: strata(region by age group categories) and PSU= census tracts, STATA interprets that there are twice as many PSUs as there are real census tracts. Is that correct?
>>>>>>
>>>>>> Many thanks.
>>>>>>
>>>>>> Ángel Rodríguez Laso
>>>>>> Institute of Public Health of the Region of Madrid

>>>>>
>>>>> --
>>>>> Stas Kolenikov, also found at http://stas.kolenikov.name
>>>>> Small print: Please do not reply to my Gmail address as I don't check it regularly.
>>>>>
>>>>> *
>>>>> * For searches and help try:
>>>>> * http://www.stata.com/support/faqs/res/findit.html
>>>>> * http://www.stata.com/support/statalist/faq
>>>>> * http://www.ats.ucla.edu/stat/stata/

>> Steven Samuels
>> 845-246-0774
>> 18 Cantine's Island
>> Saugerties, NY 12477
>> EFax: 208-498-7441
