Ben M. Gramig <gramigbe@msu.edu> has a follow-up question regarding -svyset-
for his dataset's survey design:
> In my data I have strata (states) and sub-strata (farm size classes, i.e.
> number livestock). There are FPC data at the sub-strata level but not for
> the initial stratification (I think I can construct these from the FPCs for
> the sub-strata-- simply sum all the FPCs and divide by the number of
> observations in the strata?). Based on the definition of PSUs in Stata it
> seems to me that it is incorrect to call these sub-strata PSUs (clusters)
> because this is not a cluster sample, but sampling within the sub-strata is
> w/o replacement.
>
> As a result of this survey design/structure, Stata 8 gives an error when I
> try to run a regression using either svy or other commands followed by
> -suest- as you've so helpfully pointed out to me. I believe the Stata SVY
> manual indicates that FPC is not intended for use where there is sub
> sampling within the PSU (which I interpret as about the same thing as what I
> have where there is sampling of individual population elements within the
> sub-strata).
>
> The error reads: "fpc for all observations within a stratum must be the
> same".
>
> Is there a way to overcome this error in Stata 8? I notice that Stata 9
> seems to allow for more complex designs with multiple layers of
> stratification but it seems to require FPCs for each "layer".

Ben's survey design isn't fully specified here, but I'll work with what was
supplied. If states are strata, then the PSUs were sampled independently
within each state; we are not told what the PSUs are.

1. If the PSUs are farms, sampled directly within the sub-strata, then for
all intents and purposes the state/sub-strata combinations are the actual
strata, and the sub-strata FPC values can be used as-is. Suppose Ben's
dataset has a variable -state- that identifies the strata and a variable
-farm_size- that identifies the sub-strata. Ben can then use -egen- to
generate a strata variable for -svyset-:

. egen strata = group(state farm_size)

This is the way to go in both Stata 8 and Stata 9.
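
As a rough sketch of how the combined strata variable might then be passed
to -svyset-, assuming farms are the PSUs: the variable names -wt- (sampling
weight) and -fpc_sub- (the sub-strata FPC) are hypothetical and should be
replaced with whatever Ben's dataset actually contains. The -assert- line
checks the condition behind the "fpc for all observations within a stratum
must be the same" error before -svyset- is called.

```stata
* Assumes -strata- was created with the -egen- command above.
* Hypothetical names: wt (sampling weight), fpc_sub (sub-strata FPC).

* Verify the FPC is constant within each combined stratum; the assert
* fails if any stratum holds more than one FPC value.
bysort strata (fpc_sub): assert fpc_sub[1] == fpc_sub[_N]

* Use only one of the two -svyset- lines below.

* Stata 8 syntax: with no psu() option, each observation is its own PSU.
svyset [pweight=wt], strata(strata) fpc(fpc_sub)

* Stata 9 syntax: _n declares each observation to be a PSU.
svyset _n [pweight=wt], strata(strata) fpc(fpc_sub)
```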

2. If the PSUs are something larger than farms, say counties, then Ben will
need a variable that identifies the PSUs.

In Stata 8, only the first-stage information can be specified. If the units
sampled within the sub-strata are smaller than the PSUs (farms seem most
likely), Ben should not specify an FPC. Specifying an FPC in this case will
result in variance estimates that understate the sample-to-sample
variability for this design.

In Stata 9, multistage designs can be specified. Ben will need an FPC for
the first stage in order to specify the second stage. However, I don't
think it is valid to sum the FPC values from the sub-strata to get the
first-stage FPC. -svyset- accepts either sampling fractions or population
counts in the -fpc()- option. If the FPC values are sampling fractions,
summing them yields nothing useful. If they are counts, each count is the
number of sampling units for the given stage that are present in the
population for the corresponding stratum. We are assuming the PSUs are
collections of farms (otherwise see 1 above), so the sub-strata FPC values
presumably count farms, and summing them within a stratum would not yield
the population count of PSUs in that stratum.

If the PSUs are sampled with replacement, or the first-stage sampling
fractions are very small (say less than 5%), then Ben doesn't need an FPC
in the first stage.
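
The alternatives under 2 might be declared roughly as follows. This is only
a sketch under assumptions not given in the question: -county- (PSU
identifier), -farm_id- (farm identifier), -wt- (sampling weight), -fpc1-
(number of PSUs in the population for each state, or the corresponding
sampling fraction) and -fpc2- (the analogous quantity for farms within each
PSU and farm-size class) are all hypothetical variable names. Treating
-farm_size- as second-stage strata is likewise an assumption about the
design, and -fpc2- is not necessarily the sub-strata FPC Ben already has.

```stata
* Use only one of the -svyset- lines below, depending on the Stata
* version and on which FPC information is actually available.

* Stata 8: first stage only, no FPC.
svyset [pweight=wt], strata(state) psu(county)

* Stata 9: first stage only, no FPC (PSUs sampled with replacement,
* or a very small first-stage sampling fraction).
svyset county [pweight=wt], strata(state)

* Stata 9: two stages, with an FPC supplied at each stage.
svyset county [pweight=wt], strata(state) fpc(fpc1) || farm_id, strata(farm_size) fpc(fpc2)
```
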
--Jeff
jpitblado@stata.com