First of all, your original weights are wrong anyway: they should add up to the population size, possibly up to some sampling variability. If you have 8664 units and your population size is also 8664, the design says that you have a census! If you are dealing with ratios and regression models, the issue isn't of great importance, but you would still want to have everything implemented properly.
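Here is a minimal Python sketch of why the overall scale of the weights matters for totals but not for ratios. The weights, the female indicator and the population size N are made-up numbers, not the data in this thread. Multiplying every weight by the same constant leaves a weighted proportion unchanged, but it multiplies an estimated total by that constant.

```python
# Toy illustration with hypothetical numbers (not the data in this thread).

def weighted_total(weights, y):
    """Estimated population total of y: sum of w_i * y_i."""
    return sum(w * v for w, v in zip(weights, y))


def weighted_proportion(weights, y):
    """Estimated proportion: a ratio of two weighted totals."""
    return weighted_total(weights, y) / sum(weights)


def rescale_to_population(weights, population_size):
    """Scale all weights by one common factor so that they sum to population_size."""
    factor = population_size / sum(weights)
    return [w * factor for w in weights]


weights = [1.0, 1.5, 0.5, 1.0]  # normalized so that they sum to n = 4
female = [1, 0, 1, 1]           # 1 = female, 0 = male
N = 1000.0                      # hypothetical known population size

pop_weights = rescale_to_population(weights, N)

print(sum(weights), sum(pop_weights))            # 4.0 1000.0
print(weighted_proportion(weights, female),
      weighted_proportion(pop_weights, female))  # same ratio either way
print(weighted_total(weights, female),
      weighted_total(pop_weights, female))       # totals differ by the factor N / n
```

Uniform rescaling only illustrates the point. Properly constructed weights come from the selection probabilities of the design, not from a blanket multiplier.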
Then, by dropping units, you mislead the software about how the data were collected. In particular, the pairwise probabilities of selection, which enter the variances of the estimates, will be way off, and so will your variance estimates. If you had some nice design (with some sort of proportional allocation, etc.), those properties will be lost, the cluster sizes will be wrong, and so on. The bottom line: DON'T DO THAT.
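To make the variance point concrete, here is a rough Python sketch of the standard with-replacement Taylor-linearized variance of a domain proportion. It assumes a stratified design where each observation is its own PSU and there is no FPC, mirroring the svyset in the question. It is not Stata's code, and the data are hypothetical. With the original weights, keeping all rows (out-of-domain units get zero scores, as in a subpop analysis) and dropping rows give the same point estimate but different standard errors. The difference arises because dropping changes the number of PSUs per stratum and the stratum means of the scores.

```python
import math
from collections import defaultdict


def domain_proportion_se(rows):
    """Weighted proportion of y within a domain and its linearized standard error.

    rows: (stratum, pweight, in_domain, y) tuples, one per PSU (here each
    observation is a PSU). Variance: sum_h n_h/(n_h - 1) * sum_i (z_hi - zbar_h)^2,
    with-replacement approximation, no FPC.
    """
    dw = [(h, w * d, y) for h, w, d, y in rows]  # weight becomes zero outside the domain
    n_hat = sum(w for _, w, _ in dw)              # estimated domain size
    p_hat = sum(w * y for _, w, y in dw) / n_hat

    # Linearized scores of the ratio estimator; exactly zero outside the domain.
    scores = defaultdict(list)
    for h, w, y in dw:
        scores[h].append(w * (y - p_hat) / n_hat)

    var = 0.0
    for h, z in scores.items():
        n_h = len(z)
        if n_h < 2:
            raise ValueError(f"stratum {h} has a single PSU")
        zbar = sum(z) / n_h
        var += n_h / (n_h - 1) * sum((zi - zbar) ** 2 for zi in z)
    return p_hat, math.sqrt(var)


# Hypothetical data: (stratum, pweight, in_domain, y)
rows = [
    (1, 2.0, 1, 1), (1, 2.0, 1, 0), (1, 3.0, 0, 1), (1, 1.0, 0, 0),
    (2, 1.5, 1, 1), (2, 2.5, 1, 1), (2, 1.0, 1, 0), (2, 2.0, 0, 1),
]

p_sub, se_sub = domain_proportion_se(rows)                              # keep every row
p_drop, se_drop = domain_proportion_se([r for r in rows if r[2] == 1])  # drop out-of-domain rows

print(p_sub, p_drop)    # identical point estimates
print(se_sub, se_drop)  # different standard errors
```

In this toy example the dropped-data standard error happens to be larger. In general the direction is not fixed. The real problem is that the variance no longer reflects how the sample was drawn.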
I am pretty sure there are other, better explanations in the [SVY] manual. Also, the FAQ http://www.stata.com/support/faqs/stat/zerowgt.html might be helpful: in essence, a subpop analysis zeroes out the weights of the units outside the subpopulation.
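The "zeroing out" view can be checked directly with a minimal Python sketch. The ages, sexes and weights are hypothetical, and sex plays the role of the stratum as in the question's svyset. Giving non-subpopulation units a weight of zero reproduces exactly the proportion you get by dropping them while keeping their original weights, and it keeps every PSU in the design. Design degrees of freedom are taken as PSUs minus strata, which matches the 8662 = 8664 - 2 and 997 = 999 - 2 in the output quoted below. It also follows that the .4599 vs .4655 gap there comes from recomputing the weights, not from dropping the cases as such.

```python
def weighted_share(weights, flags):
    """Weighted proportion of units with flag == 1."""
    return sum(w * f for w, f in zip(weights, flags)) / sum(weights)


def design_df(strata):
    """Design degrees of freedom: number of PSUs minus number of strata
    (one PSU per observation, as in the question's svyset)."""
    return len(strata) - len(set(strata))


# Hypothetical sample: (sex, pweight, age); sex is the stratum
sample = [
    ("female", 2.0, 18), ("female", 1.0, 40), ("female", 1.5, 22), ("female", 3.0, 55),
    ("male", 2.5, 19), ("male", 1.0, 30), ("male", 0.5, 24), ("male", 2.0, 61),
]
in_subpop = [1 if 16 <= age <= 24 else 0 for _, _, age in sample]
female = [1 if sex == "female" else 0 for sex, _, _ in sample]
strata = [sex for sex, _, _ in sample]

# subpop-style: keep all rows, zero the weights outside the subpopulation
zeroed = [w if d else 0.0 for (_, w, _), d in zip(sample, in_subpop)]
p_zeroed = weighted_share(zeroed, female)
df_zeroed = design_df(strata)

# drop-style: keep only subpopulation rows, with their ORIGINAL weights
kept = [(sex, w) for (sex, w, _), d in zip(sample, in_subpop) if d]
p_dropped = weighted_share([w for _, w in kept],
                           [1 if s == "female" else 0 for s, _ in kept])
df_dropped = design_df([s for s, _ in kept])

print(p_zeroed, p_dropped)    # same proportion
print(df_zeroed, df_dropped)  # 6 2
```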
On 3/18/07, Jason Ferris <J.Ferris@latrobe.edu.au> wrote:
I have a large dataset with weights calculated as PPS based on household size, stratified by sex. Respondents are aged 16-64. I am interested in looking at data only from those aged 16-24. I can use the subpop option "subpop(if age>=16 & age<=24)" with all the commands. But I am wondering if I can drop all other cases (keep if age>=16 & age<=24) and then 'reset' my weights based only on those aged 16-24.
In the original form (with all data) I have the following summary data (note that the survey design is quite a simple one):
. svyset

      pweight: pps
          VCE: linearized
     Strata 1: sex
         SU 1: <observations>
        FPC 1: <zero>
. svy: tab sex
(running tabulate on estimation sample)

Number of strata   =         2       Number of obs      =      8664
Number of PSUs     =      8664       Population size    =      8664
                                     Design df          =      8662

----------------------
      sex | proportions
----------+-----------
   female |       .5046
     male |       .4954
          |
    Total |           1
----------------------
  Key:  proportions  =  cell proportions
If I select the subgroup (aged 16-24):
. svy,subpop(if age<=24): tab sex
(running tabulate on estimation sample)

Number of strata   =         2       Number of obs      =      8664
Number of PSUs     =      8664       Population size    =      8664
                                     Subpop. no. of obs =       999
                                     Subpop. size       = 1438.7586
                                     Design df          =      8662

----------------------
      sex | proportions
----------+-----------
   female |       .4599
     male |       .5401
          |
    Total |           1
----------------------
  Key:  proportions  =  cell proportions
When I reset my weights with data representing only those aged 16-24 (i.e., as if this was how I had originally designed my study), I get the following results:
. svy: tab sex
(running tabulate on estimation sample)

Number of strata   =         2       Number of obs      =       999
Number of PSUs     =       999       Population size    =       999
                                     Design df          =       997

----------------------
      sex | proportions
----------+-----------
   female |       .4655
     male |       .5345
          |
    Total |           1
----------------------
  Key:  proportions  =  cell proportions
As can be seen, there is now a difference in the proportions between using subpop and resetting my weights. Is this a problem?
Jason
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

Stas Kolenikov
http://stas.kolenikov.name