Re: st: Weights in survey design

From: "Stas Kolenikov"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Weights in survey design
Date: Sun, 18 Mar 2007 21:34:58 -0500

```First of all, your original weights are wrong anyway: they should add
up to the population size, possibly up to some sampling variability. If
you have 8664 units and your weights sum to a population size of 8664,
it implies that you have a census! If you are dealing with ratios and
regression models, the issue isn't of great importance, but you would
still want to have everything implemented properly.

Then, by dropping units, you mislead the software about how the data
were collected. In particular, the pairwise probabilities of selection
(which determine the variances of the estimates) will be way off, and
so will your variance estimates. If you had some nice design (with some
sort of proportional allocation, etc.), then those properties will be
lost, the cluster sizes will be wrong, etc. The bottom line: DON'T DO
THAT.

I am pretty sure there are other, and better, explanations in the
[SVY] manual. Also, the FAQ
http://www.stata.com/support/faqs/stat/zerowgt.html might be helpful,
as essentially with the -subpop- analysis, you are zeroing out the
weights for non-subpopulation units.

On 3/18/07, Jason Ferris <J.Ferris@latrobe.edu.au> wrote:
```
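The subpopulation approach described above can be sketched as follows. This is only an illustration under stated assumptions: the variable names pps, sex and age are taken from the quoted post, while the indicator young and the zeroed weight pps0 are hypothetical names introduced here. The second half mimics the zero-weight idea from the FAQ; it is expected to give the same proportions as the subpop() run, but the two outputs should be compared rather than taken on trust.

```stata
* Sketch only: pps, sex and age come from the quoted post;
* young and pps0 are made-up names for illustration.

* Declare the design once, on the full data set
svyset [pweight=pps], strata(sex)

* Flag the subpopulation instead of dropping the other respondents
generate byte young = (age >= 16 & age <= 24)

* Subpopulation estimate: every observation stays in the data,
* so the design information used for the variance is preserved
svy, subpop(young): tabulate sex

* Zero-weight version: keep every observation, but give
* non-subpopulation units a weight of zero
generate double pps0 = cond(young == 1, pps, 0)
svyset [pweight=pps0], strata(sex)
svy: tabulate sex
```

Neither version uses keep or drop, so the strata and sampling units remain as they were when the data were collected.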
```I have a large dataset with weights calculated as PPS based on household
size, stratified by sex.  Respondents are aged 16-64.

I am interested in looking at data only from those aged 16-24.  I can
use the subpop option "subpop(if age>=16 & age<=24)" for all the
commands.  But I am wondering if I can drop all other cases (keep if
age>=16 & age<=24) and then 'reset' my weights based only on those aged
16-24.

In the original form (with all data) I have the following summary data:
(note the survey design is quite a simple one)

svyset

      pweight: pps
          VCE: linearized
     Strata 1: sex
         SU 1: <observations>
        FPC 1: <zero>

. svy: tab sex
(running tabulate on estimation sample)

Number of strata   =         2        Number of obs      =      8664
Number of PSUs     =      8664        Population size    =      8664
Design df          =      8662

-----------------------
      sex | proportions
----------+------------
   female |       .5046
     male |       .4954
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions

If I select the subgroup (age 16-24):

. svy,subpop(if age<=24): tab sex
(running tabulate on estimation sample)

Number of strata   =         2        Number of obs      =      8664
Number of PSUs     =      8664        Population size    =      8664
Subpop. no. of obs =       999
Subpop. size       = 1438.7586
Design df          =      8662

-----------------------
      sex | proportions
----------+------------
   female |       .4599
     male |       .5401
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions

When I reset my weights with data only representing those 16-24 years of
age (i.e., as if this was the way I originally designed my study) I get the
following results:

. svy: tab sex
(running tabulate on estimation sample)

Number of strata   =         2        Number of obs      =       999
Number of PSUs     =       999        Population size    =       999
Design df          =       997

-----------------------
      sex | proportions
----------+------------
   female |       .4655
     male |       .5345
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions

As can be seen, there is now a difference in the proportions between
using subpop and resetting my weights.  Is this a problem?

Jason

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
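The remark about the weights can be checked directly. In the output quoted above, the estimated population size equals the number of observations (8664), which suggests the weights average to 1 instead of summing to the size of the target population. A minimal sketch, assuming the weight variable is named pps as in the quoted svyset output:

```stata
* Sketch only: pps is the weight variable name from the quoted svyset output
summarize pps

* For properly scaled sampling weights the sum should be close to the size
* of the target population, not to the number of respondents
display "Number of observations: " r(N)
display "Sum of weights:         " r(sum)
display "Mean weight:            " r(mean)
```

If the sum comes out equal to the number of observations, the weights have been normalized; proportions and regression coefficients are not affected by that scaling, but estimated totals are.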
```
--
Stas Kolenikov
http://stas.kolenikov.name
```