Statalist



Re: st: Negative deff values in survey analysis


From   jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Negative deff values in survey analysis
Date   Wed, 03 Jun 2009 23:15:59 -0500

Angel Rodriguez-Laso <angelrlaso@gmail.com>

> I'm confused with the following results:
> 
> 
> 
> . svyset psu [pweight=weight2007], strata(healtharea)fpc(psusperhealtharea)
> 
>       pweight: weight2007
>           VCE: linearized
>      Strata 1: healtharea
>          SU 1: psu
>         FPC 1: psusperhealtharea
> 
> .
> end of do-file
> 
> . svy: tab p29, deff deft
> (running tabulate on estimation sample)
> 
> Number of strata   =        11                  Number of obs      =     12140
> Number of PSUs     =      1266                  Population size    = 12134,139
>                                                 Design df          =      1255
> 
> -------------------------------------------------
> Any permanent
> disability | proportions         deff         deft
> ----------+--------------------------------------
>     0, no |       ,8887        -1981        ,9783
>     1, yes |       ,1113        -1981        ,9783
>           |
>     Total |           1
> -------------------------------------------------
>   Key:  proportions  =  cell proportions
>         deff         =  deff for variances of cell proportions
>         deft         =  deft for variances of cell proportions
> 
> 
> 
> 
> Why do I get large negative deff values? Deft resembles more what I
> was expecting, but it should be the square root of deff and obviously
> this is not the case. Do you have any explanation for these results?

Stas Kolenikov <skolenik@gmail.com> already pointed out that the sampling
weights appear to be normalized by the sample size.  In fact, the sum of the
weights (12134.139) is slightly less than the sample size (12140).  When the
first stage is sampled without replacement (as specified by the 'fpc()'
option in the above -svyset-), the 'deff' calculation is

	deff = V_db / ( (1-n/W) * V_srswr )

where 'V_db' is the design-based variance estimate, 'V_srswr' is the simple
random sample with replacement variance estimate, 'n' is the sample size, and
'W' is an estimate of the population size.  Here 'W' is the sum of the
sampling weights.  Since Angel's sampling weights are normalized, they cannot
be used to estimate the population size: 'W' falls below 'n', the factor
(1-n/W) becomes negative, and the above 'deff' calculation is not valid.
Without knowing the population size, we can't compute a valid 'deff'
statistic.
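
As a rough check of the 'deff' formula, the Python sketch below plugs in the
figures from Angel's output: n = 12140, W = 12134.139 (reading the comma as a
decimal mark), and deft = 0.9783.  The function name is made up for
illustration, and V_db / V_srswr is recovered as deft squared, so the rounding
in the printed 'deft' limits the precision of the result.

```python
def deff_with_fpc(variance_ratio, n, W):
    """deff = V_db / ((1 - n/W) * V_srswr), given variance_ratio = V_db / V_srswr."""
    return variance_ratio / (1.0 - n / W)

n = 12140          # number of observations, from the -svy: tab- header
W = 12134.139      # sum of the (normalized) weights, reported as population size
deft = 0.9783      # reported deft, rounded to four decimals

ratio = deft ** 2              # V_db / V_srswr, recovered from deft
fpc_factor = 1.0 - n / W       # negative here because W < n
deff = deff_with_fpc(ratio, n, W)

print(f"1 - n/W = {fpc_factor:.6f}")
print(f"deff    = {deff:.0f}")
```

Because W is only slightly smaller than n, the factor (1-n/W) is a tiny
negative number, and dividing by it yields a large negative value close to the
-1981 shown in the table.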

On the other hand, the 'deft' calculation is

	deft = sqrt( V_db / V_srswr )

which does not need an estimate of the population size, and thus will always
produce a valid value.

We will look into changing -svy: tabulate- and -estat effects- to report
missing values for 'deff' in the case where the 'W' calculation is less than
or equal to 'n'.
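
A minimal sketch of that rule, written in Python rather than Stata and not
meant to represent the eventual implementation: the function names are
hypothetical, and None stands in for a Stata missing value.

```python
import math

def deff_or_missing(v_db, v_srswr, n, W):
    """Return deff with the FPC adjustment, or None when W <= n.

    When W <= n the factor (1 - n/W) is zero or negative, so deff is
    undefined or negative and is reported as missing (None) instead.
    """
    if W <= n:
        return None
    return v_db / ((1.0 - n / W) * v_srswr)

def deft(v_db, v_srswr):
    """deft needs no population-size estimate, so it is always computed."""
    return math.sqrt(v_db / v_srswr)

# Normalized weights: W (sum of weights) is below n, so deff is missing.
print(deff_or_missing(0.957, 1.0, n=12140, W=12134.139))

# Hypothetical weights that sum to a real population size: deff is computed.
print(deff_or_missing(2.0, 1.0, n=100, W=1000))
```

The first call returns None because the weights sum to less than the sample
size; the second uses made-up values with W well above n, where the formula
gives a positive 'deff'.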

--Jeff
jpitblado@stata.com