
Re: st: Survey statistics, sampling methods

From   "Stas Kolenikov" <>
Subject   Re: st: Survey statistics, sampling methods
Date   Thu, 30 Aug 2007 23:02:35 -0500


The PSU is the university. With one university drawn per stratum, the
selection probability is 1/(# of universities in that stratum), so the
weight is its inverse, i.e. the number of universities in the stratum,
and the fpc is again the number of universities in the stratum. The
results can only be generalized to those research 1 / NIH top
universities. You would think of your university as selected with
certainty, so it will have a weight of one, and a stratum of its own.
Departments are secondary units, and as long as you put those
departments together for sampling the individuals, it probably does
not matter that you tried to sample them from two strata --
stratification does not matter much beyond the first stage, anyway.
The fpc for the second stage is the total number of departments in
each of those particular schools. The position is the stratification
variable at the third stage of sampling, but again it is not terribly
important.
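
As a rough sketch only, the stages described above might be declared
along the following lines. All variable names here (univ, ustrat,
n_univ, dept, n_dept, position, pw) are hypothetical placeholders, not
names from the original data, and pw is assumed to hold the final
sampling weight.

```stata
* Hypothetical variable names; none of them come from the original post.
*   univ      university identifier (first-stage unit, the PSU)
*   ustrat    first-stage stratum the university was drawn from
*   n_univ    number of universities in that stratum (first-stage fpc)
*   dept      department identifier (second-stage unit)
*   n_dept    total number of departments in the school (second-stage fpc)
*   position  third-stage stratification variable
*   pw        final sampling weight (assumed already computed)
svyset univ [pweight = pw], strata(ustrat) fpc(n_univ) ///
    || dept, fpc(n_dept)                               ///
    || _n, strata(position)

* List the strata and the number of sampled units in each
svydescribe
```

-svydescribe- should make it visible when a stratum contains only a
single sampled PSU.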

Now the really funny thing about your design is that you cannot
estimate the variances the way you set your sampling process up:
designs with 1 PSU per stratum are the ones that benefit the most from
stratification, but their variances cannot be estimated unbiasedly,
since there is no between-PSU variation within a stratum to work with.
Any survey statistician would have told you that upfront... that's the
reason statisticians hate you when you bring your data when it's too
late to do anything about it! The easiest solution is to disregard the
stratification (keeping the weights though) -- that would give you 6
design d.f. (number of PSUs minus number of strata), so you can
estimate models with at most 6 explanatory variables... If you want to
go any further, you would need to sacrifice the adequacy of your
-svyset- even further, say by setting your departments as PSUs,
however wrong that would seem. Korn & Graubard (1995 JRSS A,
doi:10.2307/2983292) discussed some strategies for designs with
deficient d.f.
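
A minimal sketch of the two fallbacks just mentioned, again with
hypothetical variable names (univ, dept, pw) that are not from the
original post. Leaving the fpc out once the strata are dropped is my
assumption here, not something stated above.

```stata
* Fallback 1: ignore the first-stage strata but keep the weights.
* Universities stay as PSUs; design d.f. = (# of PSUs) - 1.
svyset univ [pweight = pw]

* Fallback 2: treat departments as PSUs to gain more d.f.
* This misstates the actual design, so use it with caution.
svyset dept [pweight = pw]
```

Only one of the two declarations would be in effect at a time, since
each -svyset- call replaces the previous settings.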

I am not really sure whether Stanford has any survey statisticians to
help you out much with your stuff, unfortunately.

On 8/30/07, Jen McCormick <> wrote:
> I have another set of questions (probably pretty simple) with regards to
> analyzing survey data and the use of the svyset command. I am largely
> concerned that I am not "naming" the steps we took in our sampling
> correctly in terms used in the svyset command.

Stas Kolenikov, also found at
Small print: Please do not reply to my Gmail address as I don't check
it regularly.