Statalist The Stata Listserver



Re: st: Weights in survey design


From   "Stas Kolenikov" <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Weights in survey design
Date   Sun, 18 Mar 2007 21:34:58 -0500

First of all, your original weights are wrong anyway: they should add
up to the population size, possibly up to some sampling variability. If
you have 8664 units and your population size is 8664, it means that
you have a census! If you are dealing with ratios and regression
models, the issue isn't of great importance, but you would still want
to have everything implemented properly.
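To make this concrete, here is a minimal Python sketch with made-up numbers (not Jason's data). It assumes pweights are inverse selection probabilities, so the estimated population size is simply their sum. Weights rescaled to sum to the sample size therefore make the survey look like a census, while a ratio such as a proportion is unchanged by a common rescaling. The function and variable names are illustrative only.

```python
def population_size(weights):
    """Estimated population size: the sum of the sampling weights."""
    return sum(weights)


def weighted_proportion(y, weights):
    """Ratio estimator sum(w*y) / sum(w) for a 0/1 indicator y."""
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Hypothetical sample of 4 units; each weight is 1 / (selection probability).
design_w = [120.0, 300.0, 180.0, 400.0]
female = [1, 0, 1, 0]

# Weights rescaled so that they sum to the sample size n (mean weight of 1).
n = len(design_w)
scale = n / population_size(design_w)
normalized_w = [w * scale for w in design_w]

print(population_size(design_w))                  # 1000.0
print(population_size(normalized_w))              # about 4.0: looks like a census of n units
print(weighted_proportion(female, design_w))      # 0.3
print(weighted_proportion(female, normalized_w))  # about 0.3: the ratio is unaffected
```

Totals (and the reported population size) depend on the scale of the weights; proportions and regression coefficients do not, which is why the scaling problem matters less for ratio-type estimates.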

Second, by dropping units you mislead the software about how the data
were collected. In particular, the pairwise probabilities of selection
(which determine the variances of the estimates) will be way off, and
so will your variance estimates. If you had some nice design (with
some sort of proportional allocation, etc.), those properties will be
lost, the cluster sizes will be wrong, etc. Bottom line: DON'T DO THAT.
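A simplified Python sketch of one consequence of dropping units. It assumes a stratified design with each observation as its own PSU and the usual with-replacement (no FPC) linearized variance, roughly matching the svyset shown in the question; it is not Stata's actual code, and the data and the function name subpop_proportion_linearized are hypothetical. Keeping the original weights, the point estimate survives the dropping, but the variance and the design degrees of freedom (PSUs minus strata) do not.

```python
from collections import defaultdict


def subpop_proportion_linearized(y, sub, w, strata):
    """Weighted proportion of y in the subpopulation and its linearized variance.

    Simplified sketch: stratified design, every observation its own PSU,
    with-replacement variance (no FPC). Units outside the subpopulation stay
    in the data and contribute a linearized score of zero.
    """
    x_hat = sum(wi * si for wi, si in zip(w, sub))  # subpopulation weight total
    p = sum(wi * si * yi for wi, si, yi in zip(w, sub, y)) / x_hat
    z = [wi * si * (yi - p) / x_hat for wi, si, yi in zip(w, sub, y)]

    by_stratum = defaultdict(list)
    for h, zi in zip(strata, z):
        by_stratum[h].append(zi)

    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)  # PSUs in stratum h (needs n_h >= 2)
        z_bar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in zs)

    design_df = len(y) - len(by_stratum)  # PSUs minus strata
    return p, var, design_df


# Hypothetical data (not from the thread)
y = [1, 0, 1, 1, 0, 0, 1, 0]
sub = [1, 1, 0, 1, 1, 1, 0, 1]  # 1 = in the subpopulation
w = [2.0, 3.0, 1.0, 4.0, 2.0, 5.0, 3.0, 1.0]
strata = [1, 1, 1, 1, 2, 2, 2, 2]

# Subpopulation analysis on the full file
p_full, v_full, df_full = subpop_proportion_linearized(y, sub, w, strata)

# Same weights, but the non-subpopulation units have been dropped
keep = [i for i, si in enumerate(sub) if si]
p_drop, v_drop, df_drop = subpop_proportion_linearized(
    [y[i] for i in keep], [1] * len(keep), [w[i] for i in keep], [strata[i] for i in keep]
)

print(p_full, p_drop)    # identical point estimates
print(v_full, v_drop)    # different variances
print(df_full, df_drop)  # 6 4
```

In Stata itself the safe route is to keep the full file and restrict the analysis with the subpop() option, e.g. svy, subpop(if age>=16 & age<=24): tab sex.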

I am pretty sure there are other, and better, explanations in the
[SVY] manual. Also, the FAQ
http://www.stata.com/support/faqs/stat/zerowgt.html might be helpful,
as a -subpop- analysis essentially zeroes out the weights for
non-subpopulation units.
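The zeroing-out equivalence can be sketched in a few lines of Python (hypothetical data and helper names, not the thread's data). Zeroing the weights outside the subpopulation, or dropping those units while keeping the original weights, gives the same proportion, and so does multiplying all remaining weights by one constant. Recomputing weights with a different factor in each stratum, one hypothetical form of "resetting", changes the relative weights and hence the estimate. By the same arithmetic, if re-derived weights were a constant multiple of the original ones, the proportions would match the -subpop- results exactly.

```python
from collections import defaultdict


def weighted_mean(y, w):
    """Ratio estimator sum(w*y) / sum(w); for a 0/1 y this is a proportion."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def rescale_within_strata(w, strata):
    """Hypothetical 'reset': make each stratum's weights sum to its sample count."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for wi, h in zip(w, strata):
        totals[h] += wi
        counts[h] += 1
    return [wi * counts[h] / totals[h] for wi, h in zip(w, strata)]


# Hypothetical data (not from the thread)
y = [1, 0, 1, 1, 0, 0, 1, 0]    # some 0/1 outcome
sub = [1, 1, 0, 1, 1, 1, 0, 1]  # 1 = in the subpopulation (e.g. age 16-24)
w = [2.0, 3.0, 1.0, 4.0, 2.0, 5.0, 3.0, 1.0]
strata = [1, 1, 1, 1, 2, 2, 2, 2]

# -subpop- style: keep every unit, zero the weights outside the subpopulation
w_zeroed = [wi if si else 0.0 for wi, si in zip(w, sub)]
p_subpop = weighted_mean(y, w_zeroed)

# Dropping units but keeping the original weights
keep = [i for i, si in enumerate(sub) if si]
y_k = [y[i] for i in keep]
w_k = [w[i] for i in keep]
s_k = [strata[i] for i in keep]
p_dropped = weighted_mean(y_k, w_k)

# One common multiplier for all weights leaves the ratio unchanged
c = len(w_k) / sum(w_k)
p_constant = weighted_mean(y_k, [wi * c for wi in w_k])

# Stratum-specific multipliers change the relative weights, hence the estimate
p_reset = weighted_mean(y_k, rescale_within_strata(w_k, s_k))

print(p_subpop, p_dropped, p_constant, p_reset)  # first three agree; p_reset differs
```

Note that agreeing point estimates after dropping do not make the dropped file safe: the variance side still changes, as discussed above.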

On 3/18/07, Jason Ferris <J.Ferris@latrobe.edu.au> wrote:
I have a large dataset with weights calculated as PPS based on household
size, stratified by sex.  Respondents are aged 16-64.

I am interested in looking at data only from those aged 16-24.  I can
use the subpop() option, "subpop(if age>=16 & age<=24)", for all the
commands.  But I am wondering if I can drop all other cases (keep if
age>=16 & age<=24) and then 'reset' my weights based only on those aged
16-24.

In the original form (with all data) I have the following summary data
(note that the survey design is quite a simple one):

. svyset

      pweight: pps
          VCE: linearized
     Strata 1: sex
         SU 1: <observations>
        FPC 1: <zero>

. svy: tab sex
(running tabulate on estimation sample)

Number of strata   =         2        Number of obs      =      8664
Number of PSUs     =      8664        Population size    =      8664
                                      Design df          =      8662

-----------------------
      sex | proportions
----------+------------
   female |       .5046
     male |       .4954
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions



If I select the subgroup (age 16-24):

. svy,subpop(if age<=24): tab sex
(running tabulate on estimation sample)

Number of strata   =         2        Number of obs      =      8664
Number of PSUs     =      8664        Population size    =      8664
                                      Subpop. no. of obs =       999
                                      Subpop. size       = 1438.7586
                                      Design df          =      8662

-----------------------
      sex | proportions
----------+------------
   female |       .4599
     male |       .5401
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions





When I reset my weights with data only representing those 16-24 years of
age (i.e., as if this was the way I originally designed my study), I get
the following results:



. svy: tab sex
(running tabulate on estimation sample)

Number of strata   =         2        Number of obs      =       999
Number of PSUs     =       999        Population size    =       999
                                      Design df          =       997

-----------------------
      sex | proportions
----------+------------
   female |       .4655
     male |       .5345
          |
    Total |           1
-----------------------
  Key:  proportions  =  cell proportions



As can be seen, the proportions now differ between using subpop and
resetting my weights.  Is this a problem?



 Jason


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


--
Stas Kolenikov
http://stas.kolenikov.name