Hi Stas and Steven,
I do see your point that the original weights are wrong. These data are
the first panel of a longitudinal study and, as such, I set the weights
to match the population that we are going to follow (i.e., 8664, using
probability proportional to size, PPS). I guess what you are saying is
that for any cross-sectional analysis the population size would be all
those aged 16 to 64 in Australia (where I am)? Would that be correct?
As for our sampling method (in response to Steven Samuels): we had two
questionnaires, one for males and one for females. We contacted
households across Australia by random digit dialling (RDD), without
geographical weighting. Once we reached a household, we determined the
number of men/women aged 16 to 64 in the household and randomly selected
one. In this way the weighting of the respondent was determined by the
number of men/women in the household, and the stratified selection of
the household was based on sex (given that an interviewer with a male
questionnaire would only ask a household about the males in the house).
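To make the within-household step concrete, here is a minimal sketch in
plain Python (not Stata), using made-up household counts. The names
eligible_counts, selection_probs and relative_weights are hypothetical,
and the sketch ignores any adjustment for households with more than one
phone line.

```python
from fractions import Fraction

# Hypothetical data: number of eligible people (aged 16 to 64, of the
# questionnaire's sex) in each contacted household.
eligible_counts = [1, 2, 3, 1, 4]

# One eligible person is picked at random, so the within-household
# selection probability is 1/k for a household with k eligible people.
selection_probs = [Fraction(1, k) for k in eligible_counts]

# The relative design weight is the inverse of that probability, i.e. k.
relative_weights = [1 / p for p in selection_probs]

for k, p, w in zip(eligible_counts, selection_probs, relative_weights):
    print(f"eligible={k}  P(selected)={p}  relative weight={w}")
```

These are relative weights only; they say nothing yet about what total
the weights should add up to.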
As for post-stratification of weights and raking:
This was not done, as I believed that setting the survey design
(pre-analysis) to match the sampling method would be appropriate.
In terms of resetting the weights (and the data provided in the original
post):
As I had the number of men and women in each household to create the
weights (of 8664), I would then drop the unwanted cases (in the example,
those 25 and over). Then, based on the number of men/women in the
households left (i.e., those 16-24), I would re-calculate the weights
using the PPS method for the remaining households only.
Cheers for your input
Jason
Stas wrote:
First of all, your original weights are wrong, anyway: they should add
up to the population size, maybe up to some sampling variability. If
you have 8664 units and your population size is 8664, it means that
you have a census! If you are dealing with ratios and regression
models, the issue isn't of great importance, but you still would want
to have everything implemented properly.
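A small numerical sketch of that last point, in plain Python rather than
Stata and with entirely made-up numbers: rescaling the weights so that
they add up to an assumed population size leaves a weighted mean
unchanged but changes an estimated total. The population_size value and
all variable names here are hypothetical.

```python
# Hypothetical weights that add up to the sample size (5) instead of
# the population size.
weights = [0.5, 1.0, 1.5, 0.5, 1.5]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
population_size = 1000  # assumed figure, not taken from the survey

# Rescale so that the weights add up to the population size.
scale = population_size / sum(weights)
scaled = [w * scale for w in weights]


def weighted_total(values, wts):
    """Estimated population total: sum of weight * value."""
    return sum(v * w for v, w in zip(values, wts))


def weighted_mean(values, wts):
    """Ratio of the weighted total to the sum of the weights."""
    return weighted_total(values, wts) / sum(wts)


print("mean, original weights:", weighted_mean(y, weights))
print("mean, rescaled weights:", weighted_mean(y, scaled))
print("total, original weights:", weighted_total(y, weights))
print("total, rescaled weights:", weighted_total(y, scaled))
```

The mean is a ratio, so the scale factor cancels; the total is not, so
it only makes sense once the weights add up to the population size.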
Then, by dropping units, you are misleading the software about how the
data were collected. In particular, the pairwise probabilities of
selection (which determine the variances of the estimates) will be way
off (and so will your variance). If you had some nice design (with some
sort of proportional allocation, etc.), then those properties will be
lost, the cluster sizes will be wrong, etc. DON'T DO THAT, as the
bottom line.
I am pretty sure there are other, and better, explanations in the
[SVY] manual. Also, the FAQ
http://www.stata.com/support/faqs/stat/zerowgt.html might be helpful,
as essentially with the -subpop- analysis, you are zeroing out the
weights for non-subpopulation units.
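The zero-weight idea can be sketched with a toy example (plain Python,
not Stata; the ages, outcomes and weights are invented and the names are
hypothetical). It only shows that the point estimate for the 16-24 group
is the same whether the other units are dropped or kept with zero
weight. The difference between the two approaches lies in the variance
calculation, which this sketch does not attempt: the zero-weight version
keeps every sampled unit in the data, so the design information is still
available.

```python
# Hypothetical sample: age, outcome and weight for six respondents.
ages = [18, 22, 30, 45, 19, 60]
y = [2.0, 4.0, 6.0, 8.0, 6.0, 10.0]
weights = [1.0, 2.0, 1.0, 3.0, 1.0, 2.0]

in_subpop = [16 <= a <= 24 for a in ages]


def weighted_mean(values, wts):
    return sum(v * w for v, w in zip(values, wts)) / sum(wts)


# Approach A: drop everyone outside the subpopulation.
y_dropped = [v for v, keep in zip(y, in_subpop) if keep]
w_dropped = [w for w, keep in zip(weights, in_subpop) if keep]
mean_dropped = weighted_mean(y_dropped, w_dropped)

# Approach B: keep all rows, but zero the weights outside the subpopulation.
w_zeroed = [w if keep else 0.0 for w, keep in zip(weights, in_subpop)]
mean_zeroed = weighted_mean(y, w_zeroed)

print("dropped:", mean_dropped, "rows kept:", len(y_dropped))
print("zeroed: ", mean_zeroed, "rows kept:", len(y))
```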
Steven wrote:
population, the survey design and what were the primary and (possibly)
second stage and later sampling units?
In a HH or telephone survey, ordinarily the PSU's would be some kind of
geographic areas, and the sampling strata for PSU's cannot be sex, as
your setup implies.
Other questions: were the weights post-stratified or raked in any way to
reflect the population totals? How did you "reset" the weights?
Steven