
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Poststratification weighting, subpop, and missing values


From   <[email protected]>
To   <[email protected]>
Subject   st: Poststratification weighting, subpop, and missing values
Date   Wed, 26 Sep 2012 09:25:55 -0400

Hi everyone,
I'm analyzing the results of a survey and have run into some strange results when using poststratification weights together with the subpop() option.  An example is shown below, where we simply total 2011 sales.  The flag variable indicates the subpopulation we're interested in.  When we limit the population by flag alone, the command calculates the total over 2,624 PSUs; when we further limit it to observations with flag equal to one and nonmissing total sales, it calculates over 2,639 PSUs.  In the second command, Stata seems to be including the 15 observations with missing values in its calculations.  Also, the total for the more limited subpopulation is lower, which does not match what we expected from removing the missing values and from the resulting change in the background calculation of the adjusted weight.

Could someone shed some light on why this is happening?

Thank you,
Ricky Ubee




. svyset uniqueID [pweight=weight_prop], strata(strata2) singleunit(scaled) poststrata(type2) postweight(postwt4) fpc(N)

      pweight: weight_prop
          VCE: linearized
   Poststrata: type2
   Postweight: postwt4
  Single unit: scaled
     Strata 1: strata2
         SU 1: uniqueID
        FPC 1: N


. svy, subpop(if flag==1): total TOT_SALES_11
(running total on estimation sample)

Survey: Total estimation

Number of strata =      26          Number of obs    =    2624
Number of PSUs   =    2624          Population size  =   23794
N. of poststrata =      16          Subpop. no. obs  =     652
                                    Subpop. size     = 5245.94
                                    Design df        =    2598

--------------------------------------------------------------
                               |             Linearized
                               |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
TOT_SALES_11 |   2.20e+12   2.77e+11      1.65e+12    2.74e+12
--------------------------------------------------------------
Note: 2 strata omitted because they contain no subpopulation
      members.

. svy, subpop(if flag==1 & TOT_SALES_11~=.): total TOT_SALES_11
(running total on estimation sample)

Survey: Total estimation

Number of strata =      26          Number of obs    =    2639
Number of PSUs   =    2639          Population size  =   23794
N. of poststrata =      16          Subpop. no. obs  =     652
                                    Subpop. size     = 5222.38
                                    Design df        =    2613

--------------------------------------------------------------
                               |             Linearized
                               |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
TOT_SALES_11 |   2.18e+12   2.76e+11      1.64e+12    2.72e+12
--------------------------------------------------------------
Note: 2 strata omitted because they contain no subpopulation
      members.

	  
. count if flag==1 & TOT_SALES_11==.
   15
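
One way to see which observations each call keeps is to save the estimation-sample indicator after each run and cross-tabulate the two. The following is only a sketch: it reruns the two commands above unchanged, and the variable names insample1 and insample2 are hypothetical, introduced here for illustration.

```stata
* Sketch only: insample1 and insample2 are hypothetical variable names.
* e(sample) is 1 for observations used by the most recent estimation command.

* First call: subpopulation defined by flag alone
svy, subpop(if flag==1): total TOT_SALES_11
generate byte insample1 = e(sample)

* Second call: subpopulation also requires nonmissing sales
svy, subpop(if flag==1 & TOT_SALES_11~=.): total TOT_SALES_11
generate byte insample2 = e(sample)

* Cross-tabulate to see which observations differ between the two samples
tabulate insample1 insample2

* Count observations kept only by the second call that have missing sales
count if insample2 & !insample1 & missing(TOT_SALES_11)
```

If the last count matches the 15 reported above, the difference between 2,624 and 2,639 observations is accounted for by the records with missing TOT_SALES_11.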


