# st: RE: Weights in survey design

From: "Carter Rees"

Subject: st: RE: Weights in survey design

Date: Sun, 18 Mar 2007 22:40:50 -0400

```
Jason,

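There is a very important difference between using the subpop() option of
-svy- and using an -if- qualifier to drop cases: subpop() keeps every case
in the variance calculation, so the standard errors reflect the full
design, whereas dropping cases discards that information.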

A good link to get you started:
http://www.cpc.unc.edu/services/computer/presentations/statatutorial/example33.html

I also recommend Stata's Survey Data manual.

HTH,

Carter

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jason Ferris
Sent: Sunday, March 18, 2007 6:47 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Weights in survey design

I have a large dataset with weights calculated as PPS based on household
size, stratified by sex.  Respondents are aged 16-64.

I am interested in looking at data only from those aged 16-24.  I can
use the subpop option, "subpop(if age>=16 & age<=24)", for all the
commands.  But I am wondering whether I can instead drop all other cases
(keep if age>=16 & age<=24) and then 'reset' my weights based only on
those aged 16-24.

In the original form (with all data) I have the following summary data
(note the survey design is quite a simple one):

. svyset

      pweight: pps
          VCE: linearized
     Strata 1: sex
         SU 1: <observations>
        FPC 1: <zero>

. svy: tab sex
(running tabulate on estimation sample)

Number of strata   =         2        Number of obs      =      8664
Number of PSUs     =      8664        Population size    =      8664
                                      Design df          =      8662

-----------------------
      sex | proportions
----------+------------
   female |       .5046
     male |       .4954
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions

If I select the subgroup (age 16-24):

. svy, subpop(if age<=24): tab sex
(running tabulate on estimation sample)

Number of strata   =         2        Number of obs      =      8664
Number of PSUs     =      8664        Population size    =      8664
                                      Subpop. no. of obs =       999
                                      Subpop. size       = 1438.7586
                                      Design df          =      8662

-----------------------
      sex | proportions
----------+------------
   female |       .4599
     male |       .5401
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions

When I reset my weights using only the data for those 16-24 years of
age (i.e., as if this were the way I had originally designed my study),
I get the following results:

. svy: tab sex
(running tabulate on estimation sample)

Number of strata   =         2        Number of obs      =       999
Number of PSUs     =       999        Population size    =       999
                                      Design df          =       997

-----------------------
      sex | proportions
----------+------------
   female |       .4655
     male |       .5345
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions

As can be seen, the proportions now differ between using subpop and
resetting my weights.  Is this a problem?

Jason

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
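
The difference between the subpop() option and an -if- qualifier can be seen on a small example. The following is only a sketch: the twelve observations are invented for illustration and are not Jason's data, although the variable names (sex, age, pps) and the design (pweights, stratified by sex) follow his post. It runs the same estimate both ways and stores the results so they can be compared.

```stata
* Toy data, made up purely for illustration (NOT the poster's data)
clear
input byte sex byte age double pps
0 18 1.5
0 22 0.8
0 35 1.2
0 47 0.9
0 60 1.1
0 19 1.3
1 17 0.7
1 24 1.6
1 31 1.0
1 52 1.4
1 63 0.6
1 21 1.1
end

* Same design as in the post: probability weights, stratified by sex
svyset [pweight=pps], strata(sex)

* Approach 1: keep every case and mark the subpopulation
svy, subpop(if age<=24): proportion sex
matrix b_subpop = e(b)
matrix V_subpop = e(V)

* Approach 2: restrict the estimation sample with -if-
svy: proportion sex if age<=24
matrix b_if = e(b)
matrix V_if = e(V)

* Compare point estimates and variance estimates side by side
matrix list b_subpop
matrix list b_if
matrix list V_subpop
matrix list V_if
```

With the original weights left untouched, both approaches should give the same point estimates; what generally changes is the variance estimate, because subpop() still accounts for the cases outside the subpopulation. A difference in the proportions themselves, as in the post, would therefore come from recalculating the weights rather than from the choice between subpop() and -if-.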