From: "R.E. De Hoyos"
Subject: st: Re: Dealing with strata with singleton PSU's
Date: Thu, 19 May 2005 14:47:14 +0300

Trish,

If you are using Stata 8 or earlier, here is what I would do:

1. Identify the singletons in each of your age/gender cells using Jeff Pitblado's -singleton-
(code below):

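program singleton, sort
    version 8
    syntax [varlist(numeric default=none)] [if] [in], ///
        STRata(varname) gen(name) [ PSU(varname) ]
    confirm new var `gen'
    marksample touse
    * with no psu() option, treat every observation as its own PSU
    if "`psu'" == "" {
        tempvar psu
        gen `psu' = _n
    }
    tempvar u
    sort `touse' `strata' `psu'
    * flag the first observation of each PSU within a stratum
    quietly by `touse' `strata' `psu': gen `u' = _n == 1
    * running count of PSUs within each stratum
    quietly by `touse' `strata': replace `u' = sum(`u')
    * 1 if the stratum contains exactly one PSU, 0 otherwise
    quietly by `touse' `strata': replace `u' = cond(`u'[_N] == 1, 1, 0)

    * set the flag to missing outside the selected sample
    quietly replace `u' = . if !`touse'
    rename `u' `gen'
end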
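
As a rough sketch of how the program might be called once per cell: the variable names stratum, psuid, grade and male are assumptions for illustration and are not taken from Trish's data.

```stata
* Hypothetical variable names: stratum, psuid, grade, male
* One call per age/gender cell; each call creates a new 0/1 flag
singleton if grade == 9  & male == 1, strata(stratum) psu(psuid) gen(singleton_cell1)
singleton if grade == 10 & male == 1, strata(stratum) psu(psuid) gen(singleton_cell2)

* The flag is 1 in single-PSU strata, 0 elsewhere, and missing outside the cell
tabulate singleton_cell1, missing
```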

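2. Create a general-singleton identifier. Compare each flag with 1, because -singleton- sets the flag to missing outside the selected sample and Stata treats missing as true in logical expressions:

gen gsingleton = (singleton_cell1 == 1) | (singleton_cell2 == 1) | ...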

3. Collapse the strata to remove the singletons identified by "gsingleton". This lets you use the same stratification for all of your statistics. The pweights stay the same, since the expansion factors are associated with the PSUs and these won't change.
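
A minimal sketch of step 3 follows. The variable names (stratum, psuid, wt, grade, male) and the choice to merge stratum 12 into stratum 11 are purely illustrative assumptions. Which strata to merge depends on what makes sense for your design.

```stata
* Hypothetical names: stratum, psuid, wt, grade, male
* See which strata are flagged in at least one cell
tabulate stratum if gsingleton == 1

* Illustrative only: merge flagged stratum 12 into neighbouring stratum 11
gen newstra = stratum
replace newstra = 11 if stratum == 12

* The pweights are unchanged; only the stratum identifier differs
svyset [pweight = wt], strata(newstra) psu(psuid)

* Re-check one cell: no observation should be flagged any more
singleton if grade == 11 & male == 1, strata(newstra) psu(psuid) gen(check11)
tabulate check11
```

If check11 still shows any 1s, further merging is needed before svymean and svyprop will run for that cell.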

I hope this helps,

Rafa
________________________
R.E. De Hoyos
Faculty of Economics
University of Cambridge
CB3 9DE, UK
www.econ.cam.ac.uk/phd/red29/

----- Original Message ----- From: "Trish Gorely" <P.J.Gorely@lboro.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Sent: Thursday, May 19, 2005 2:30 PM
Subject: st: Dealing with strata with singleton PSU's

I have a stratified data set that I want to calculate means and proportions
for using svymean and svyprop. Unfortunately I have some
strata with single PSU's and svymean and svyprop don't like this. The
manual and help service recommend 2 ways of dealing with the singleton
PSU's:
1. collapse across strata to effectively remove them (the advice being to
collapse in the way that makes most sense for your data)
2. drop the singleton PSU's

The preferred option for me is to collapse across strata and I can do this
easily enough. However I'm still not clear on the following:

1. do you need to recalculate probability weights?
2. Do you need to use the same collapsed strata for everyone? For example,
when I do svymean for Grade 9 boys I have 3 singleton PSUs, but when I do the
same analysis for Grade 10 boys there are 4 singleton PSUs, and at grade 11
there are 7! The problem is much smaller for girls (1 at grade 9, 1 at grade 10
and 3 at grade 11). Should I collapse to remove the singletons for year 11 boys
(which would, by chance, have the net effect of removing all the singletons
in all year/gender groups), call the new strata NEWSTRA, and then use
NEWSTRA to define the data for all analyses, or should I be doing the
relevant collapse for each age/gender group?

Thanks for any help anyone can offer
Trish

