# Re: st: How to svyset when strata are used in some groups and not others

From: "Louise Linsell"
Subject: Re: st: How to svyset when strata are used in some groups and not others
Date: Mon, 05 Jul 2010 16:17:47 +0100

```
This is the complete design for the partially stratified dataset:

There are 4 types of hospital and (for example) we are testing the hypothesis that mean age is equal across hospital type.

For the first type of hospital, we divided the 180 national units into 6 strata (North/South x large/medium/small size) and selected 37 units (with the probability of selection proportional to size within strata).

For the other 3 types of hospital we selected all national units.

We then sampled patients for d consecutive days, where d varied by unit.

The commands we have used so far are:

svyset hospid [pweight=weight], strata(strata) fpc(strata_frac)
svy: mean age, over(hosptype)

Where:

hospid = hospital identifier 1...435
weight = probability sampling weight (number of days recruited in unit/number of days recruited in units of same hospital type)
strata = stratum number 1...9 (1-6 for strata within 1st hospital type, 7 for 2nd hospital type, 8 for 3rd hospital type and 9 for 4th hospital type)
strata_frac = n/N - number of units selected in stratum/total number of units in stratum (=1 for last 3 types of hospital)
age = patient age in years
hosptype = type of hospital 1...4

When this model is fitted we get zero estimates for the standard errors in the last 3 types of hospital.
I think this is because strata_frac=1 for these hospitals, so the model thinks we have sampled the whole population,
when in fact we have just sampled a number of consecutive days. I was thinking about specifying a second level of
sampling - number of days sampled out of one whole year and setting fpc's for the secondary sampling units (days).

LL

>>> Steve Samuels <sjsamuels@gmail.com> 05/07/2010 12:42 >>>

1. the complete design, including subsequent stages of sampling
2.  the purposes of the analyses--descriptive?  estimating regression
coefficients?  testing hypotheses?

What -svyset- commands have you tried to issue so far?

Steve

On Mon, Jul 5, 2010 at 5:06 AM, Louise Linsell
<Louise.Linsell@npeu.ox.ac.uk> wrote:
> Thank you for the suggestions. We have already tried defining 9 strata: 6 for the common type of hospital, for which we used stratified random sampling, and 1 stratum each for the other 3 types of hospital, for which we took all units.
>
> However, in the model we had to specify a finite population correction (FPC=sqrt(1-n/N)) as we sampled 28 out of 87 units for the most common type of hospital.
>
> Because we sampled ALL the units from the other 3 types of hospital we had to set the FPC to zero since n=N (which is specified as 1 in Stata as it requires you to specify n/N). This means that there are no variance estimates when we summarise any outcomes in the 3 less common types of hospital, because it thinks we have sampled the whole population within these hospitals (when in fact we took a consecutive number of patients over a period of 3 months).
>
> LL
>
>>>> Stas Kolenikov <skolenik@gmail.com> 02/07/2010 20:36 >>>
> If Louise sampled other 3 types lumping them together, then Steve's
> recommendation is appropriate. If sampling was performed within each
> of those remaining types, then the strata variable will have 6 (strata
> in the most common type of hospitals) + 3 (other types of hospitals) =
> 9 levels.
>
> On Fri, Jul 2, 2010 at 11:18 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> Louise-- create a stratum variable with 7 values: 1-6 for the
>> hospitals of the first type, and 7 for the other three types, and use
>> that in the strata() option of -svyset-
>>
>> Steve
>>
>> On Fri, Jul 2, 2010 at 12:00 PM, Louise Linsell
>> <Louise.Linsell@npeu.ox.ac.uk> wrote:
>>> I have a dataset with 4 different types of hospital, and would like to compare binary outcomes between them using logistic regression. However, for the first type (the most common), hospitals were divided into 6 strata (based on size and SES) and a random sample was taken from each stratum. For the other 3 types of hospital we sampled all hospitals. My question is: how should the svyset command be used when a different sampling strategy was used in one group?
>>>
>>> LL
>>>
>>>
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>
>>
>>
>> --
>> Steven Samuels
>> sjsamuels@gmail.com
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax:    206-202-4783
>
>
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.

```
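The zero standard errors described in the thread can be read off the usual linearized variance formula for a stratified design with a finite population correction. What follows is a sketch, not a statement of what Stata computed for this dataset. The notation is assumed for illustration and does not come from the original posts: $h$ indexes strata, $n_h$ and $N_h$ are the sampled and total numbers of hospitals in stratum $h$, $f_h = n_h/N_h$ is the value stored in `strata_frac`, and $z_{hi}$ is the weighted hospital-level total of the linearized score for the estimate of interest. The two-stage version further assumes that days could be treated as a simple random sample within a hospital, with $m_{hi}$ sampled days and sampling fraction $f_{hi}$. That assumption is only an approximation when the days are consecutive.

```latex
% Single-stage stratified variance with FPC (hospitals as PSUs)
\[
\widehat{V}_1
  = \sum_{h} (1 - f_h)\,\frac{n_h}{n_h - 1}
    \sum_{i=1}^{n_h} \bigl(z_{hi} - \bar{z}_{h}\bigr)^2 ,
\qquad f_h = \frac{n_h}{N_h}.
\]

% For a hospital type in which every unit was taken, f_h = 1, so
\[
(1 - f_h) = 0
\quad\Longrightarrow\quad
\text{that stratum contributes } 0 \text{ to } \widehat{V}_1 .
\]

% Two-stage version: days (index j) sampled within hospital i of stratum h
\[
\widehat{V}_2
  = \widehat{V}_1
  + \sum_{h} f_h \sum_{i=1}^{n_h} (1 - f_{hi})\,\frac{m_{hi}}{m_{hi} - 1}
    \sum_{j=1}^{m_{hi}} \bigl(z_{hij} - \bar{z}_{hi}\bigr)^2 .
\]
```

With $f_h = 1$ the first-stage term vanishes but the second-stage term stays positive whenever $f_{hi} < 1$. This is why declaring days as a second sampling stage with their own FPC, as proposed in the top message, would be expected to give nonzero standard errors for the three hospital types in which every unit was taken.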