Statalist The Stata Listserver

Re: st: Weights in survey design

From   Steven Samuels
Subject   Re: st: Weights in survey design
Date   Wed, 21 Mar 2007 17:03:33 -0400


0. Your resetting of the weights, as you describe, caused the discrepancy you first asked about. You should leave the original weights alone (see 5 below).

1. RDD describes a lot of different designs, so your description is incomplete. Often RDD samples are based on geographic area or on the likelihood that a number belongs to a household. Sometimes there is a list that has been purified so that only HH are likely to be on it. Alternatively, some form of Mitofsky-Waksberg adaptive sampling is used, in which case the PSUs are banks of phone numbers. Your phrase "without geographical weighting" is not descriptive. The question is: were separate samples drawn within strata? If they were, first-stage weights can still be identical; that is, no special stratum weighting is necessary.

2. The PSU is not "_n" or the final-stage person. It will be the household ID or something else, such as a bank of telephone numbers.

3. Your use of the term "PPS" is incorrect: it refers to methods of sampling with probability proportional to the "size" of a unit. In your case, the probability of selection within a HH was inversely proportional to the number of eligible men or women.

4. In telephone surveys it is standard procedure to adjust the weights for the number of telephones in the HH, because HH with more telephones have a greater probability of being selected. With K telephones, divide the original sampling weight by K.

5. Resetting the weights, as you did, is incorrect. If you restrict respondents to a particular age range, the original sampling probabilities (adjusted for the number of men and women) still apply.

6. In Stata you can specify that there was stratification by gender at the second stage of sampling (assuming that HH are the PSUs). See page 251 of the "Survey Data" manual for Release 9. But you should NOT do this here: you have only one observation per stratum. You gain nothing and Stata will complain. Use the original weight adjusted for men and women to account for the sampling scheme.

7. It is highly unusual in a national survey of this size NOT to post-stratify or "rake" so as to more closely match sample results to census distributions of age, gender, household size.
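
Points 3, 4, 5 and 7 can be illustrated with a little arithmetic. What follows is only a sketch, written in Python rather than Stata, with made-up households; the names (first_stage_weight, n_eligible, n_phones, ps_weight) and the population totals are hypothetical. It assumes that one eligible person of the questionnaire's sex was chosen at random within the household, so the within-household selection probability is 1/n_eligible, and that a household with K telephone lines is K times as likely to be dialled. The post-stratification step is the simplest one-variable version, not full raking.

```python
from collections import defaultdict
from fractions import Fraction


def respondent_weight(first_stage_weight, n_eligible, n_phones):
    """Weight for the one person selected in a household.

    n_eligible: eligible people of the questionnaire's sex in the household
                (within-household selection probability is 1/n_eligible).
    n_phones:   telephone lines reaching the household (K lines make the
                household K times as likely to be dialled).
    """
    if n_eligible < 1 or n_phones < 1:
        raise ValueError("n_eligible and n_phones must be at least 1")
    return Fraction(first_stage_weight) * n_eligible / n_phones


def post_stratify(rows, cell_key, population_totals):
    """Scale weights within each cell so they sum to a known population total."""
    weight_sums = defaultdict(Fraction)
    for row in rows:
        weight_sums[row[cell_key]] += row["weight"]
    for row in rows:
        cell = row[cell_key]
        row["ps_weight"] = row["weight"] * population_totals[cell] / weight_sums[cell]
    return rows


# Hypothetical respondents, one per household.
rows = [
    {"sex": "F", "age": 22, "n_eligible": 1, "n_phones": 1},
    {"sex": "F", "age": 47, "n_eligible": 3, "n_phones": 1},
    {"sex": "M", "age": 19, "n_eligible": 2, "n_phones": 2},
    {"sex": "M", "age": 58, "n_eligible": 2, "n_phones": 1},
]

# A common first-stage weight of 1 gives relative weights.
for row in rows:
    row["weight"] = respondent_weight(1, row["n_eligible"], row["n_phones"])

# Hypothetical census totals by sex.
post_stratify(rows, "sex", {"F": 400, "M": 300})

# Restricting attention to ages 16-24 selects rows; it does not change weights.
young = [row for row in rows if 16 <= row["age"] <= 24]

for row in rows:
    print(row["sex"], row["age"], row["weight"], row["ps_weight"])
```

Note that the weights of the 16-24 year olds are the same whether or not the older respondents are in the data; nothing is recomputed from the households that remain.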

I think that you need help here. There are many fine survey statisticians in Australia. If your local stats department can't help you, you might look up Professor Ken Brewer at ANU and ask him for a recommendation.


On Mar 20, 2007, at 7:14 PM, Jason Ferris wrote:

Hi Stas and Steven,
I do see your point that the original weights are wrong. This data is
the first panel of a longitudinal study and as such, I set the weights
to match the population that we are going to follow (i.e., 8664 - using
probability proportional to size PPS). I guess what you are saying is
that for any cross-sectional analysis - the population size would be all
those between 16-64 in Australia (where I am)? Would that be correct?

As for our sampling method (In response to Steven Samuels) we had two
questionnaires one for males and one for females. We RDD households
across Australia (without geographical weighting). Once we got a
household we determined the number of men/women 16 to 64 in the
household and randomly selected one. In this way the weighting of the
respondent was determined by the number of men/women also in the
household, and the stratified selection of the household was based on
sex (given that an interviewer with a male questionnaire would only ask
a household about the males in the house).

As for post-stratification of weights and raking?
This was not done, as I believed that setting the survey design
(pre-analysis) to match the sampling method would be appropriate.

In terms of resetting the weights? (and the data provided in the original

As I had the household size of men and women to create the weights (of
8664) I would then drop the desired cases (those I didn't want - in the
example those 25 and over). Then based on the number of men/women in
the households left (i.e., those 16-24) I would re-calculate the weights
using the PPS method only for the households left.

Cheers for your input

Stas wrote:

First of all, your original weights are wrong, anyway: they should add
up to the population size, maybe up to some sampling variability. If
you have 8664 units and your population size is 8664, it means that
you have a census! If you are dealing with ratios and regression
models, the issue isn't of great importance, but you still would want
to have everything implemented properly.

Then by dropping units, you are confusing the software in terms of
thinking how the data were collected. In particular, the pairwise
probabilities of selection (leading to the variances of the estimates)
will be way off (and so will your variance). If you had some nice
design (with some sort of proportional allocations, etc.), then those
properties will be lost, the cluster sizes will get wrong, etc. DON'T
DO THAT, as the bottom line.

I am pretty sure there are other, and better, explanations in the
[SVY] manual. Also, the FAQ might be helpful,
as essentially with the -subpop- analysis, you are zeroing out the
weights for non-subpopulation units.
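
A small numerical sketch of that last sentence, in Python with hypothetical values: setting the weights of non-subpopulation units to zero gives the same point estimate as dropping those units and keeping the original weights. The difference is that with zeroed weights every unit stays in the data set, so the software can still see the full design when it computes variances (in Stata this is what the subpop() option of svy is for). The sketch does not compute variances.

```python
def weighted_mean(values, weights):
    """Weighted mean; raises if the weights sum to zero."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical respondents: age, outcome, original sampling weight.
ages = [22, 47, 19, 58]
outcome = [4, 7, 10, 2]
weight = [1, 3, 2, 2]

in_subpop = [16 <= age <= 24 for age in ages]

# Subpopulation analysis: keep every row, zero the weights outside 16-24.
zeroed = [w if keep else 0 for w, keep in zip(weight, in_subpop)]
mean_subpop = weighted_mean(outcome, zeroed)

# Dropping rows but keeping the ORIGINAL weights gives the same point estimate.
kept_outcome = [y for y, keep in zip(outcome, in_subpop) if keep]
kept_weight = [w for w, keep in zip(weight, in_subpop) if keep]
mean_dropped = weighted_mean(kept_outcome, kept_weight)

print(mean_subpop, mean_dropped)
```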

Steven wrote:
population, the survey design and what were the primary and (possibly)
second stage and later sampling units?
In a HH or telephone survey, ordinarily the PSU's would be some kind of
geographic areas, and the sampling strata for PSU's cannot be sex, as
your setup implies.

Other questions: were the weights post-stratified or raked in any way to
reflect the population totals? How did you "reset" the weights?

