Statalist



Re: st: Svy mean using subpop and incorrect number of observations


From   jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Svy mean using subpop and incorrect number of observations
Date   Thu, 17 Dec 2009 15:23:53 -0600

Heather E. Ridolfo <evd7@CDC.GOV> asks about a Statalist exchange we had in
July of last year:

> I posted a message in July 2008 about problems I was encountering when
> using svy mean:
> "Using svy: mean- with option -subpop() I noticed that it is reporting a
> smaller estimation sample than the number of observations in my
> dataset."
> 
> The reply I got back said:
> "We have verified that -svy: mean- is incorrectly dropping out-of-subpop
> observations that contain missing values in the variables of the
> varlist. The only other affected commands are -svy: proportion-, -svy:
> ratio-, and -svy: total-. We hope to have this fixed in the next Stata
> update (within the next few weeks)" 
> 
> However, I continue to experience this problem a year and half later
> when trying to run the following command:
> Svyset PSU [pweight = nweight], strata(STRATUM) singleunit(centered)
> Svy, subpop(allsp): mean RA sevimpft ADLS IADLS help UseAD
> 
> The number of observations I get back is smaller than the number of
> actual observation in the dataset. I am using Stata 10 and as far as I
> can tell it's up-to-date. 
> 
> Does anyone have any suggestions on how I can fix this problem? 

In the Stata 10 whatsnew, the update on 18aug2009 contains the following item:

48.  svy: mean, svy: proportion, svy: ratio, and svy: total would
     mark out observations with missing values in the summary
     variables even when the sampling weight was zero, which is a
     surrogate for identifying out-of-subpopulation observations.
     This has been fixed.
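
A minimal way to check whether this update is installed, assuming the
machine can reach www.stata.com (these commands are offered as a sketch and
were not part of the original exchange):

	. update query
	. update all
	. help whatsnew

-update query- reports whether the installed files are older than the latest
available ones, -update all- installs them, and -help whatsnew- lists the
changes so the 18aug2009 item can be looked up.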

Given Heather's example, -svy- will drop observations containing missing
values in any of the following variables:

	PSU
	nweight
	STRATUM

-svy- will then check the following variables for missing values, but only
within the subpopulation observations:

	RA
	sevimpft
	ADLS
	IADLS
	help
	UseAD
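
As a rough sketch of how these two kinds of missing values could be counted
by hand (the variable name -nmiss- is made up for illustration, and this
assumes -allsp- is a 0/1 indicator with no missing values of its own):

	. count if missing(PSU, nweight, STRATUM)
	. egen nmiss = rowmiss(RA sevimpft ADLS IADLS help UseAD)
	. count if allsp != 0 & nmiss > 0

The first count should be the number of observations dropped because of the
design variables; the second should be the number dropped within the
subpopulation because of the summary variables.  The two counts can overlap,
so their sum is an upper bound on the number of dropped observations.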

The following simple example illustrates that -svy- is only dropping
observations with missing values within the subpopulation.

	. sysuse auto
	. tabulate rep78 foreign, missing nolabel
	. svyset _n
	. svy, subpop(if for==0): mean rep78
	. svy, subpop(if for==1): mean rep78

In the following output from Stata 10, -tabulate- shows that -rep78- is
missing in 5 observations: 4 where foreign==0 and 1 where foreign==1.  The
two calls to -svy: mean- report sample sizes of 70 (= 74 - 4) and 73
(= 74 - 1), respectively, so only the missing values within each
subpopulation are dropped.

***** BEGIN:
. sysuse auto
(1978 Automobile Data)

. tabulate rep78 foreign, missing nolabel

    Repair |
    Record |       Car type
      1978 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
         2 |         8          0 |         8 
         3 |        27          3 |        30 
         4 |         9          9 |        18 
         5 |         2          9 |        11 
         . |         4          1 |         5 
-----------+----------------------+----------
     Total |        52         22 |        74 


. svyset _n

      pweight: <none>
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

. svy, subpop(if for==0): mean rep78
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      70
Number of PSUs   =      70          Population size  =      70
                                    Subpop. no. obs  =      48
                                    Subpop. size     =      48
                                    Design df        =      69

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       rep78 |   3.020833   .1205044      2.780434    3.261233
--------------------------------------------------------------

. svy, subpop(if for==1): mean rep78
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      73
Number of PSUs   =      73          Population size  =      73
                                    Subpop. no. obs  =      21
                                    Subpop. size     =      21
                                    Design df        =      72

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       rep78 |   4.285714   .1537776      3.979164    4.592264
--------------------------------------------------------------
***** END:
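
The sample sizes in this output can also be checked by hand.  As a quick
sketch, -count- can exclude only the observations that are both in the
subpopulation and missing -rep78-:

	. sysuse auto, clear
	. count if !(foreign == 0 & missing(rep78))
	. count if !(foreign == 1 & missing(rep78))

These should report 70 and 73, matching "Number of obs" in the two -svy:
mean- results.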

--Jeff
jpitblado@stata.com


