Re: st: Weights in survey design
0. Your resetting of the weights, as you describe, caused the
discrepancy you first asked about. You should leave the original
weights alone (see 5 below).
1. RDD describes a lot of different designs, so your description is
incomplete. Often RDD designs are based on geographic area or on likelihood
that a number is a household. Sometimes there is a list which has
been purified so that only HH are likely to be on the list.
Alternatively, some form of Mitofsky-Waksberg adaptive sampling is
used, in which case PSU's are banks of phone numbers. Your phrase
"without geographical weighting" is not descriptive. The question is
whether separate samples were drawn within strata; if they were,
first-stage weights can still be identical, that is, no special
stratum weighting is necessary.
2. The PSU is not "_n" or the final stage person. It will be
household ID or something else like bank of telephone numbers.
3. Your use of the term "PPS" is incorrect: this refers to methods
of sampling with probability proportional to "size" of a unit. In
your case, your sampling within HH was inversely proportional to the
number of eligible men and women.
4. In telephone surveys it is standard procedure to adjust the
weights for the number of telephones in the HH. HH with more
telephones have greater probability of being selected. With K
telephones, divide the original sampling weight by K.
5. Resetting the weights, as you did, is incorrect. If you
restrict respondents to a particular age range, the original sampling
probabilities (adjusted for no. of men and women) still apply.
6. In Stata you can specify that there was stratification by gender
at the second stage of sampling (assuming that HH are the PSU's). See
page 251 of the "Survey Data" manual for Release 9. But you should
NOT do this here: you have only one observation per stratum. You gain
nothing and Stata will complain. Use the original weight adjusted
for men and women to account for the sampling scheme.
7. It is highly unusual in a national survey of this size NOT to
post-stratify or "rake" so as to more closely match sample results to
census distributions of age, gender, household size.
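
As a rough illustration of points 3 to 5 (not the poster's actual data),
the weight adjustments can be sketched in Python. The function name, the
base weight and the household values are all hypothetical. The only
assumptions taken from the discussion are that one eligible person is
selected at random within the household and that a household with K
telephones is K times as likely to be dialled.

```python
def person_weight(base_weight, n_eligible, n_phones):
    """Hypothetical helper: final weight for the one selected respondent.

    base_weight: household weight from the RDD stage (assumed given)
    n_eligible:  eligible persons of the selected sex in the household
    n_phones:    telephone lines reaching the household (K)
    """
    if n_eligible < 1 or n_phones < 1:
        raise ValueError("need at least one eligible person and one phone")
    # Selecting 1 of n_eligible at random multiplies the weight by n_eligible;
    # K lines multiply the selection chance by K, so divide by K.
    return base_weight * n_eligible / n_phones


# Made-up households: (base weight, eligible persons, phone lines, age)
households = [(1000.0, 1, 1, 22), (1000.0, 3, 1, 40), (1000.0, 2, 2, 19)]
weights = [person_weight(b, m, k) for b, m, k, _ in households]

# Restricting to ages 16-24 only flags a subpopulation;
# the weights themselves are not recomputed.
in_subpop = [16 <= age <= 24 for *_, age in households]
```

The age restriction is kept as a flag rather than used to rebuild the
weights, which is the point of item 5.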
I think that you need help here. There are many fine survey
statisticians in Australia. If your local stats department can't
help you, you might look up Professor Ken Brewer at ANU and ask him
for a recommendation.
On Mar 20, 2007, at 7:14 PM, Jason Ferris wrote:
Hi Stas and Steven,
I do see your point that the original weights are wrong. This data is
the first panel of a longitudinal study and as such, I set the weights
to match the population that we are going to follow (i.e., 8664 -
probability proportional to size PPS). I guess what you are saying is
that for any cross-sectional analysis - the population size would be
those between 16-64 in Australia (where I am)? Would that be correct?
As for our sampling method (In response to Steven Samuels) we had two
questionnaires one for males and one for females. We RDD households
across Australia (without geographical weighting). Once we got a
household we determined the number of men/women 16 to 64 in the
household and randomly selected one. In this manner the weighting of
respondent was determined by the number of men/women also in the
household, and the stratified selection of the household was based on
sex (given that an interviewer with a male questionnaire would only
ask a household about the males in the house).
As for post-stratification of weights and ranking?
This was not done, as I believed that setting the survey design
(pre-analysis) to match the sampling method would be appropriate.
In term of resetting the weights? (and the data provided in the
As I had the household size of men and women to create the weights (of
8664) I would then drop the desired cases (those I didn't want - in
example those 25 and over). Then based on the number of men/women in
the households left (i.e., those 16-24) I would re-calculate the
weights using the PPS method only for the households left.
Cheers for your input
First of all, your original weights are wrong, anyway: they should add
up to the population size, maybe up to some sampling variability. If
you have 8664 units and your population size is 8664, it means that
you have a census! If you are dealing with ratios and regression
models, the issue isn't of great importance, but you still would want
to have everything implemented properly.
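
A quick check along these lines, sketched in Python: when weights are
inverse selection probabilities, they should sum to roughly the
population size, not the sample size. The population figure is invented
purely for illustration, and a simple random sample is assumed to keep
the arithmetic short.

```python
N = 13_000_000   # assumed population size, for illustration only
n = 8664         # sample size mentioned in the thread

# Inverse of the selection probability n/N, one weight per sampled unit.
weights = [N / n] * n
estimated_population = sum(weights)

# Rescaling so the weights sum to n leaves weighted means and ratios
# unchanged, but the weights no longer estimate population totals.
rescaled = [w * n / estimated_population for w in weights]
```

Here `estimated_population` recovers N up to rounding, while the
rescaled weights sum to n.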
Then by dropping units, you are misleading the software about
how the data were collected. In particular, the pairwise
probabilities of selection (leading to the variances of the estimates)
will be way off (and so will your variance). If you had some nice
design (with some sort of proportional allocations, etc.), then those
properties will be lost, the cluster sizes will get wrong, etc. DON'T
DO THAT, as the bottom line.
I am pretty sure there are other, and better, explanations in the
[SVY] manual. Also, the FAQ
http://www.stata.com/support/faqs/stat/zerowgt.html might be helpful,
as essentially with the -subpop- analysis, you are zeroing out the
weights for non-subpopulation units.
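
The difference between zeroing weights and dropping units can be
sketched in Python with made-up records (the PSU ids, weights and
outcome values are hypothetical). Both routes give the same point
estimate, but dropping removes whole PSUs from the data, so the design
seen by the variance estimator changes.

```python
# Made-up records: (psu_id, weight, in_subpop, y)
records = [(1, 1000.0, True, 4.0), (1, 3000.0, False, 9.0),
           (2, 1000.0, True, 6.0), (3, 2000.0, False, 5.0)]

# Subpopulation approach: keep every record, zero the weight outside the group.
sub_w = [w if s else 0.0 for _, w, s, _ in records]
mean_subpop = sum(w * y for w, (_, _, _, y) in zip(sub_w, records)) / sum(sub_w)

# Dropping approach: the point estimate is the same ...
kept = [r for r in records if r[2]]
mean_dropped = sum(w * y for _, w, _, y in kept) / sum(w for _, w, _, _ in kept)

# ... but the design information differs: PSU 3 vanishes from the data.
psus_full = {psu for psu, *_ in records}
psus_kept = {psu for psu, *_ in kept}
```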
population, the survey design and what were the primary and (possibly)
second stage and later sampling units?
In a HH or telephone survey, ordinarily the PSU's would be some
geographic areas, and the sampling strata for PSU's cannot be sex, as
your setup implies.
Other questions: were the weights post-stratified or raked in any
way to reflect the population totals? How did you "reset" the weights?